From patchwork Tue May 26 09:38:34 2020
X-Patchwork-Submitter: Tuong Lien
X-Patchwork-Id: 218554
From: Tuong Lien
To: davem@davemloft.net, jmaloy@redhat.com, maloy@donjonn.com, ying.xue@windriver.com, netdev@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Subject: [net-next 1/5] tipc: introduce Gap ACK blocks for broadcast link
Date: Tue, 26 May 2020 16:38:34 +0700
Message-Id: <20200526093838.17421-2-tuong.t.lien@dektech.com.au>
In-Reply-To: <20200526093838.17421-1-tuong.t.lien@dektech.com.au>
References: <20200526093838.17421-1-tuong.t.lien@dektech.com.au>
List-ID:
X-Mailing-List: netdev@vger.kernel.org

As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput by Gap
ACK blocks"), we apply the same mechanism to the broadcast link as well. The
'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will consist of two
parts, built for both the broadcast and unicast types:

         31                       16 15                        0
        +-------------+-------------+-------------+-------------+
        |  bgack_cnt  |  ugack_cnt  |            len            |
        +-------------+-------------+-------------+-------------+  -
        |            gap            |            ack            |   |
        +-------------+-------------+-------------+-------------+    > bc gacks
        :                           :                           :   |
        +-------------+-------------+-------------+-------------+  -
        |            gap            |            ack            |   |
        +-------------+-------------+-------------+-------------+    > uc gacks
        :                           :                           :   |
        +-------------+-------------+-------------+-------------+  -

which is "automatically" backward-compatible. We also increase the max number
of Gap ACK blocks to 128, allowing up to 64 blocks per type (total buffer size
= 516 bytes).

Besides, the 'tipc_link_advance_transmq()' function is refactored so that it
now applies to both the unicast and broadcast cases; some old functions can
therefore be removed and the code is optimized.

With the patch, TIPC broadcast is more robust against packet loss, reordering,
latency, etc. in the underlying network, and its performance is boosted
significantly. For example, an experiment with a 5% packet loss rate gives
these results:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    0m 42.46s
user    0m 1.16s
sys     0m 17.67s

Without the patch:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    8m 27.94s
user    0m 0.55s
sys     0m 2.38s

Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Tuong Lien
---
 net/tipc/bcast.c |   9 +-
 net/tipc/link.c  | 425 ++++++++++++++++++++++++++++++++-----------------------
 net/tipc/link.h  |   7 +-
 net/tipc/msg.h   |  27 +++-
 net/tipc/node.c  |  10 +-
 5 files changed, 293 insertions(+), 185 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 4c20be08b9c4..3ce690a96ee9 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
 
         __skb_queue_head_init(&xmitq);
         tipc_bcast_lock(net);
-        tipc_link_bc_ack_rcv(l, acked, &xmitq);
+        tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq);
         tipc_bcast_unlock(net);
 
         tipc_bcbase_xmit(net, &xmitq);
@@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
                         struct tipc_msg *hdr)
 {
         struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
+        struct tipc_gap_ack_blks *ga;
         struct sk_buff_head xmitq;
         int rc = 0;
 
@@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
         if (msg_type(hdr) != STATE_MSG) {
                 tipc_link_bc_init_rcv(l, hdr);
         } else if (!msg_bc_ack_invalid(hdr)) {
-                tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq);
-                rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq);
+                tipc_get_gap_ack_blks(&ga, l, hdr, false);
+                rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
+                                          msg_bc_gap(hdr), ga, &xmitq);
+                rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
         }
         tipc_bcast_unlock(net);
 
diff --git a/net/tipc/link.c b/net/tipc/link.c
index d4675e922a8f..d29b9c531171 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -188,6 +188,8 @@ struct tipc_link {
         /* Broadcast */
         u16 ackers;
         u16 acked;
+        u16 last_gap;
+        struct tipc_gap_ack_blks *last_ga;
         struct tipc_link *bc_rcvlink;
         struct tipc_link *bc_sndlink;
         u8 nack_state;
@@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct tipc_link *l,
                                     struct sk_buff_head *xmitq);
 static void tipc_link_build_bc_init_msg(struct
tipc_link *l, struct sk_buff_head *xmitq); -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap); -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, + struct tipc_link *l, u8 start_index); +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr); +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, + u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq); + struct sk_buff_head *xmitq, + bool *retransmitted, int *rc); static void tipc_link_update_cwin(struct tipc_link *l, int released, bool retransmitted); /* @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l, snd_l->ackers--; rcv_l->bc_peer_is_up = true; rcv_l->state = LINK_ESTABLISHED; - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); tipc_link_reset(rcv_l); rcv_l->state = LINK_RESET; @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) return (l->silent_intv_cnt + 2 > l->abort_limit); } -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, - u16 from, u16 to, struct sk_buff_head *xmitq); /* tipc_link_timeout - perform periodic task as instructed from node timeout */ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq) @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) l->snd_nxt_state = 1; l->rcv_nxt_state = 1; l->acked = 0; + l->last_gap = 0; + kfree(l->last_ga); + l->last_ga = NULL; l->silent_intv_cnt = 0; l->rst_cnt = 0; l->bc_peer_is_up = false; @@ -1183,68 +1189,14 @@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, if (link_is_bc_sndlink(l)) { r->state = LINK_RESET; - *rc = TIPC_LINK_DOWN_EVT; + *rc |= TIPC_LINK_DOWN_EVT; } else { - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); } return true; } -/* tipc_link_bc_retrans() - retransmit zero or more packets - * @l: the link to transmit on - * @r: the receiving link ordering the retransmit. 
Same as l if unicast - * @from: retransmit from (inclusive) this sequence number - * @to: retransmit to (inclusive) this sequence number - * xmitq: queue for accumulating the retransmitted packets - */ -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, - u16 from, u16 to, struct sk_buff_head *xmitq) -{ - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; - u16 ack = l->rcv_nxt - 1; - int retransmitted = 0; - struct tipc_msg *hdr; - int rc = 0; - - if (!skb) - return 0; - if (less(to, from)) - return 0; - - trace_tipc_link_retrans(r, from, to, &l->transmq); - - if (link_retransmit_failure(l, r, &rc)) - return rc; - - skb_queue_walk(&l->transmq, skb) { - hdr = buf_msg(skb); - if (less(msg_seqno(hdr), from)) - continue; - if (more(msg_seqno(hdr), to)) - break; - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) - continue; - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; - _skb = pskb_copy(skb, GFP_ATOMIC); - if (!_skb) - return 0; - hdr = buf_msg(_skb); - msg_set_ack(hdr, ack); - msg_set_bcast_ack(hdr, bc_ack); - _skb->priority = TC_PRIO_CONTROL; - __skb_queue_tail(xmitq, _skb); - l->stats.retransmitted++; - retransmitted++; - /* Increase actual retrans counter & mark first time */ - if (!TIPC_SKB_CB(skb)->retr_cnt++) - TIPC_SKB_CB(skb)->retr_stamp = jiffies; - } - tipc_link_update_cwin(l, 0, retransmitted); - return 0; -} - /* tipc_data_input - deliver data and name distr msgs to upper layer * * Consumes buffer if message is of right type @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb, return rc; } -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) -{ - int released = 0; - struct sk_buff *skb, *tmp; - - skb_queue_walk_safe(&l->transmq, skb, tmp) { - if (more(buf_seqno(skb), acked)) - break; - __skb_unlink(skb, &l->transmq); - kfree_skb(skb); - released++; +/** + * tipc_get_gap_ack_blks - get Gap ACK blocks from PROTOCOL/STATE_MSG + * @ga: returned pointer to the Gap ACK blocks if any + * @l: the tipc link + * @hdr: the PROTOCOL/STATE_MSG header + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or broadcast (= 0) + * + * Return: the total Gap ACK blocks size + */ +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, + struct tipc_msg *hdr, bool uc) +{ + struct tipc_gap_ack_blks *p; + u16 sz = 0; + + /* Does peer support the Gap ACK blocks feature? */ + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { + p = (struct tipc_gap_ack_blks *)msg_data(hdr); + sz = ntohs(p->len); + /* Sanity check */ + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + p->bgack_cnt)) { + /* Good, check if the desired type exists */ + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) + goto ok; + /* Backward compatible: peer might not support bc, but uc? */ + } else if (uc && sz == tipc_gap_ack_blks_sz(p->ugack_cnt)) { + if (p->ugack_cnt) { + p->bgack_cnt = 0; + goto ok; + } + } } - return released; + /* Other cases: ignore! 
*/ + p = NULL; + +ok: + *ga = p; + return sz; } -/* tipc_build_gap_ack_blks - build Gap ACK blocks - * @l: tipc link that data have come with gaps in sequence if any - * @data: data buffer to store the Gap ACK blocks after built - * - * returns the actual allocated memory size - */ -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, + struct tipc_link *l, u8 start_index) { + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; struct sk_buff *skb = skb_peek(&l->deferdq); - struct tipc_gap_ack_blks *ga = data; - u16 len, expect, seqno = 0; + u16 expect, seqno = 0; u8 n = 0; - if (!skb || !gap) - goto exit; + if (!skb) + return 0; expect = buf_seqno(skb); skb_queue_walk(&l->deferdq, skb) { seqno = buf_seqno(skb); if (unlikely(more(seqno, expect))) { - ga->gacks[n].ack = htons(expect - 1); - ga->gacks[n].gap = htons(seqno - expect); - if (++n >= MAX_GAP_ACK_BLKS) { - pr_info_ratelimited("Too few Gap ACK blocks!\n"); - goto exit; + gacks[n].ack = htons(expect - 1); + gacks[n].gap = htons(seqno - expect); + if (++n >= MAX_GAP_ACK_BLKS / 2) { + char buf[TIPC_MAX_LINK_NAME]; + + pr_info_ratelimited("Gacks on %s: %d, ql: %d!\n", + tipc_link_name_ext(l, buf), + n, + skb_queue_len(&l->deferdq)); + return n; } } else if (unlikely(less(seqno, expect))) { pr_warn("Unexpected skb in deferdq!\n"); @@ -1451,14 +1428,44 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) } /* last block */ - ga->gacks[n].ack = htons(seqno); - ga->gacks[n].gap = 0; + gacks[n].ack = htons(seqno); + gacks[n].gap = 0; n++; + return n; +} -exit: - len = tipc_gap_ack_blks_sz(n); +/* tipc_build_gap_ack_blks - build Gap ACK blocks + * @l: tipc unicast link + * @hdr: the tipc message buffer to store the Gap ACK blocks after built + * + * The function builds Gap ACK blocks for both the unicast & broadcast receiver + * links of a certain peer, the buffer after built has the network data format + * as found at the struct tipc_gap_ack_blks definition. + * + * returns the actual allocated memory size + */ +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr) +{ + struct tipc_link *bcl = l->bc_rcvlink; + struct tipc_gap_ack_blks *ga; + u16 len; + + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); + + /* Start with broadcast link first */ + tipc_bcast_lock(bcl->net); + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); + tipc_bcast_unlock(bcl->net); + + /* Now for unicast link, but an explicit NACK only (???) */ + ga->ugack_cnt = (msg_seq_gap(hdr)) ? + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) : 0; + + /* Total len */ + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); ga->len = htons(len); - ga->gack_cnt = n; return len; } @@ -1466,47 +1473,109 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) * acked packets, also doing retransmissions if * gaps found * @l: tipc link with transmq queue to be advanced + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if unicast) * @acked: seqno of last packet acked by peer without any gaps before * @gap: # of gap packets * @ga: buffer pointer to Gap ACK blocks from peer * @xmitq: queue for accumulating the retransmitted packets if any + * @retransmitted: returned boolean value if a retransmission is really issued + * @rc: returned code e.g. 
TIPC_LINK_DOWN_EVT if a repeated retransmit failures + * happens (- unlikely case) * - * In case of a repeated retransmit failures, the call will return shortly - * with a returned code (e.g. TIPC_LINK_DOWN_EVT) + * Return: the number of packets released from the link transmq */ -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, + u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq) + struct sk_buff_head *xmitq, + bool *retransmitted, int *rc) { + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; + struct tipc_gap_ack *gacks = NULL; struct sk_buff *skb, *_skb, *tmp; struct tipc_msg *hdr; + u32 qlen = skb_queue_len(&l->transmq); + u16 nacked = acked, ngap = gap, gack_cnt = 0; u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; - bool retransmitted = false; u16 ack = l->rcv_nxt - 1; - bool passed = false; - u16 released = 0; u16 seqno, n = 0; - int rc = 0; + u16 end = r->acked, start = end, offset = r->last_gap; + u16 si = (last_ga) ? last_ga->start_index : 0; + bool is_uc = !link_is_bc_sndlink(l); + bool bc_has_acked = false; + + /* Determine Gap ACK blocks if any for the particular link */ + if (ga && is_uc) { + /* Get the Gap ACKs, uc part */ + gack_cnt = ga->ugack_cnt; + gacks = &ga->gacks[ga->bgack_cnt]; + } else if (ga) { + /* Copy the Gap ACKs, bc part, for later renewal if needed */ + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), + GFP_ATOMIC); + if (likely(this_ga)) { + this_ga->start_index = 0; + /* Start with the bc Gap ACKs */ + gack_cnt = this_ga->bgack_cnt; + gacks = &this_ga->gacks[0]; + } else { + /* Hmm, we can get in trouble..., simply ignore it */ + pr_warn_ratelimited("Ignoring bc Gap ACKs, no memory\n"); + } + } + /* Advance the link transmq */ skb_queue_walk_safe(&l->transmq, skb, tmp) { seqno = buf_seqno(skb); next_gap_ack: - if (less_eq(seqno, acked)) { + if (less_eq(seqno, nacked)) { + if (is_uc) + goto release; + /* Skip packets peer has already acked */ + if (!more(seqno, r->acked)) + continue; + /* Get the next of last Gap ACK blocks */ + while (more(seqno, end)) { + if (!last_ga || si >= last_ga->bgack_cnt) + break; + start = end + offset + 1; + end = ntohs(last_ga->gacks[si].ack); + offset = ntohs(last_ga->gacks[si].gap); + si++; + WARN_ONCE(more(start, end) || + (!offset && + si < last_ga->bgack_cnt) || + si > MAX_GAP_ACK_BLKS, + "Corrupted Gap ACK: %d %d %d %d %d\n", + start, end, offset, si, + last_ga->bgack_cnt); + } + /* Check against the last Gap ACK block */ + if (in_range(seqno, start, end)) + continue; + /* Update/release the packet peer is acking */ + bc_has_acked = true; + if (--TIPC_SKB_CB(skb)->ackers) + continue; +release: /* release skb */ __skb_unlink(skb, &l->transmq); kfree_skb(skb); - released++; - } else if (less_eq(seqno, acked + gap)) { - /* First, check if repeated retrans failures occurs? */ - if (!passed && link_retransmit_failure(l, l, &rc)) - return rc; - passed = true; - + } else if (less_eq(seqno, nacked + ngap)) { + /* First gap: check if repeated retrans failures? */ + if (unlikely(seqno == acked + 1 && + link_retransmit_failure(l, r, rc))) { + /* Ignore this bc Gap ACKs if any */ + kfree(this_ga); + this_ga = NULL; + break; + } /* retransmit skb if unrestricted*/ if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) continue; - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? 
+ TIPC_UC_RETR_TIME : TIPC_BC_RETR_LIM; _skb = pskb_copy(skb, GFP_ATOMIC); if (!_skb) continue; @@ -1516,25 +1585,51 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, _skb->priority = TC_PRIO_CONTROL; __skb_queue_tail(xmitq, _skb); l->stats.retransmitted++; - retransmitted = true; + *retransmitted = true; /* Increase actual retrans counter & mark first time */ if (!TIPC_SKB_CB(skb)->retr_cnt++) TIPC_SKB_CB(skb)->retr_stamp = jiffies; } else { /* retry with Gap ACK blocks if any */ - if (!ga || n >= ga->gack_cnt) + if (n >= gack_cnt) break; - acked = ntohs(ga->gacks[n].ack); - gap = ntohs(ga->gacks[n].gap); + nacked = ntohs(gacks[n].ack); + ngap = ntohs(gacks[n].gap); n++; goto next_gap_ack; } } - if (released || retransmitted) - tipc_link_update_cwin(l, released, retransmitted); - if (released) - tipc_link_advance_backlog(l, xmitq); - return 0; + + /* Renew last Gap ACK blocks for bc if needed */ + if (bc_has_acked) { + if (this_ga) { + kfree(last_ga); + r->last_ga = this_ga; + r->last_gap = gap; + } else if (last_ga) { + if (less(acked, start)) { + si--; + offset = start - acked - 1; + } else if (less(acked, end)) { + acked = end; + } + if (si < last_ga->bgack_cnt) { + last_ga->start_index = si; + r->last_gap = offset; + } else { + kfree(last_ga); + r->last_ga = NULL; + r->last_gap = 0; + } + } else { + r->last_gap = 0; + } + r->acked = acked; + } else { + kfree(this_ga); + } + + return qlen - skb_queue_len(&l->transmq); } /* tipc_link_build_state_msg: prepare link state message for transmission @@ -1651,7 +1746,8 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff *skb, kfree_skb(skb); break; } - released += tipc_link_release_pkts(l, msg_ack(hdr)); + released += tipc_link_advance_transmq(l, l, msg_ack(hdr), 0, + NULL, NULL, NULL, NULL); /* Defer delivery if sequence gap */ if (unlikely(seqno != rcv_nxt)) { @@ -1739,7 +1835,7 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, msg_set_probe(hdr, probe); msg_set_is_keepalive(hdr, probe || probe_reply); if (l->peer_caps & TIPC_GAP_ACK_BLOCK) - glen = tipc_build_gap_ack_blks(l, data, rcvgap); + glen = tipc_build_gap_ack_blks(l, hdr); tipc_mon_prep(l->net, data + glen, &dlen, mstate, l->bearer_id); msg_set_size(hdr, INT_H_SIZE + glen + dlen); skb_trim(skb, INT_H_SIZE + glen + dlen); @@ -2027,20 +2123,19 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, { struct tipc_msg *hdr = buf_msg(skb); struct tipc_gap_ack_blks *ga = NULL; - u16 rcvgap = 0; - u16 ack = msg_ack(hdr); - u16 gap = msg_seq_gap(hdr); + bool reply = msg_probe(hdr), retransmitted = false; + u16 dlen = msg_data_sz(hdr), glen = 0; u16 peers_snd_nxt = msg_next_sent(hdr); u16 peers_tol = msg_link_tolerance(hdr); u16 peers_prio = msg_linkprio(hdr); + u16 gap = msg_seq_gap(hdr); + u16 ack = msg_ack(hdr); u16 rcv_nxt = l->rcv_nxt; - u16 dlen = msg_data_sz(hdr); + u16 rcvgap = 0; int mtyp = msg_type(hdr); - bool reply = msg_probe(hdr); - u16 glen = 0; - void *data; + int rc = 0, released; char *if_name; - int rc = 0; + void *data; trace_tipc_proto_rcv(skb, false, l->name); if (tipc_link_is_blocked(l) || !xmitq) @@ -2137,13 +2232,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, } /* Receive Gap ACK blocks from peer if any */ - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { - ga = (struct tipc_gap_ack_blks *)data; - glen = ntohs(ga->len); - /* sanity check: if failed, ignore Gap ACK blocks */ - if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt)) - ga = NULL; - } + glen = 
tipc_get_gap_ack_blks(&ga, l, hdr, true); tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, &l->mon_state, l->bearer_id); @@ -2158,9 +2247,14 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, rcvgap, 0, 0, xmitq); - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); + released = tipc_link_advance_transmq(l, l, ack, gap, ga, xmitq, + &retransmitted, &rc); if (gap) l->stats.recv_nacks++; + if (released || retransmitted) + tipc_link_update_cwin(l, released, retransmitted); + if (released) + tipc_link_advance_backlog(l, xmitq); if (unlikely(!skb_queue_empty(&l->wakeupq))) link_prepare_wakeup(l); } @@ -2246,10 +2340,7 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr) int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, struct sk_buff_head *xmitq) { - struct tipc_link *snd_l = l->bc_sndlink; u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); - u16 from = msg_bcast_ack(hdr) + 1; - u16 to = from + msg_bc_gap(hdr) - 1; int rc = 0; if (!link_is_up(l)) @@ -2271,8 +2362,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, if (more(peers_snd_nxt, l->rcv_nxt + l->window)) return rc; - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); - l->snd_nxt = peers_snd_nxt; if (link_bc_rcv_gap(l)) rc |= TIPC_LINK_SND_STATE; @@ -2307,38 +2396,27 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, return 0; } -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, - struct sk_buff_head *xmitq) +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, + struct tipc_gap_ack_blks *ga, + struct sk_buff_head *xmitq) { - struct sk_buff *skb, *tmp; - struct tipc_link *snd_l = l->bc_sndlink; + struct tipc_link *l = r->bc_sndlink; + bool unused = false; + int rc = 0; - if (!link_is_up(l) || !l->bc_peer_is_up) - return; + if (!link_is_up(r) || !r->bc_peer_is_up) + return 0; - if (!more(acked, l->acked)) - return; + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) + return 0; - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); - /* Skip over packets peer has already acked */ - skb_queue_walk(&snd_l->transmq, skb) { - if (more(buf_seqno(skb), l->acked)) - break; - } + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, &rc); - /* Update/release the packets peer is acking now */ - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { - if (more(buf_seqno(skb), acked)) - break; - if (!--TIPC_SKB_CB(skb)->ackers) { - __skb_unlink(skb, &snd_l->transmq); - kfree_skb(skb); - } - } - l->acked = acked; - tipc_link_advance_backlog(snd_l, xmitq); - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) - link_prepare_wakeup(snd_l); + tipc_link_advance_backlog(l, xmitq); + if (unlikely(!skb_queue_empty(&l->wakeupq))) + link_prepare_wakeup(l); + + return rc; } /* tipc_link_bc_nack_rcv(): receive broadcast nack message @@ -2366,8 +2444,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, return 0; if (dnode == tipc_own_addr(l->net)) { - tipc_link_bc_ack_rcv(l, acked, xmitq); - rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, xmitq); + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq); l->stats.recv_nacks++; return rc; } diff --git a/net/tipc/link.h b/net/tipc/link.h index d3c1c3fc1659..0a0fa7350722 100644 --- a/net/tipc/link.h +++ b/net/tipc/link.h @@ -143,8 +143,11 @@ int tipc_link_bc_peers(struct tipc_link *l); void tipc_link_set_mtu(struct tipc_link *l, int mtu); int tipc_link_mtu(struct tipc_link 
*l); int tipc_link_mss(struct tipc_link *l); -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, - struct sk_buff_head *xmitq); +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, + struct tipc_msg *hdr, bool uc); +int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, + struct tipc_gap_ack_blks *ga, + struct sk_buff_head *xmitq); void tipc_link_build_bc_sync_msg(struct tipc_link *l, struct sk_buff_head *xmitq); void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr); diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 871feadbbc19..ca5f8689a33b 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -160,20 +160,39 @@ struct tipc_gap_ack { /* struct tipc_gap_ack_blks * @len: actual length of the record - * @gack_cnt: number of Gap ACK blocks in the record + * @ugack_cnt: number of Gap ACK blocks for unicast (following the broadcast + * ones) + * @start_index: starting index for "valid" broadcast Gap ACK blocks + * @bgack_cnt: number of Gap ACK blocks for broadcast in the record * @gacks: array of Gap ACK blocks + * + * 31 16 15 0 + * +-------------+-------------+-------------+-------------+ + * | bgack_cnt | ugack_cnt | len | + * +-------------+-------------+-------------+-------------+ - + * | gap | ack | | + * +-------------+-------------+-------------+-------------+ > bc gacks + * : : : | + * +-------------+-------------+-------------+-------------+ - + * | gap | ack | | + * +-------------+-------------+-------------+-------------+ > uc gacks + * : : : | + * +-------------+-------------+-------------+-------------+ - */ struct tipc_gap_ack_blks { __be16 len; - u8 gack_cnt; - u8 reserved; + union { + u8 ugack_cnt; + u8 start_index; + }; + u8 bgack_cnt; struct tipc_gap_ack gacks[]; }; #define tipc_gap_ack_blks_sz(n) (sizeof(struct tipc_gap_ack_blks) + \ sizeof(struct tipc_gap_ack) * (n)) -#define MAX_GAP_ACK_BLKS 32 +#define MAX_GAP_ACK_BLKS 128 #define MAX_GAP_ACK_BLKS_SZ tipc_gap_ack_blks_sz(MAX_GAP_ACK_BLKS) static inline struct tipc_msg *buf_msg(struct sk_buff *skb) diff --git a/net/tipc/node.c b/net/tipc/node.c index 803a3a6d0f50..6a49b3eeaae9 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -2071,10 +2071,16 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) le = &n->links[bearer_id]; /* Ensure broadcast reception is in synch with peer's send state */ - if (unlikely(usr == LINK_PROTOCOL)) + if (unlikely(usr == LINK_PROTOCOL)) { + if (unlikely(skb_linearize(skb))) { + tipc_node_put(n); + goto discard; + } + hdr = buf_msg(skb); tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq); - else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) + } else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) { tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr); + } /* Receive packet directly if conditions permit */ tipc_node_read_lock(n); From patchwork Tue May 26 09:38:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tuong Lien X-Patchwork-Id: 218553 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, MSGID_FROM_MTA_HEADER, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org 
From: Tuong Lien
To: davem@davemloft.net, jmaloy@redhat.com, maloy@donjonn.com, ying.xue@windriver.com, netdev@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Subject: [net-next 3/5] tipc: enable broadcast retrans via unicast
Date: Tue, 26 May 2020 16:38:36 +0700
Message-Id: <20200526093838.17421-4-tuong.t.lien@dektech.com.au>
In-Reply-To: <20200526093838.17421-1-tuong.t.lien@dektech.com.au>
References: <20200526093838.17421-1-tuong.t.lien@dektech.com.au>
List-ID:
X-Mailing-List: netdev@vger.kernel.org

In some environments, broadcast traffic is suppressed at a high rate (i.e. a kind of bandwidth limit setting). When this is applied, TIPC broadcast can still run successfully. However, under high load some packets will be dropped first, and TIPC tries to retransmit them, but the retransmission is intentionally broadcast too, which makes things worse rather than helping. This commit enables broadcast retransmission via unicast, which only retransmits packets to the specific peer that has actually reported a gap, i.e.
not broadcasting to all nodes in the cluster, so will prevent from being suppressed, and also reduce some overheads on the other peers due to duplicates, finally improve the overall TIPC broadcast performance. Note: the functionality can be turned on/off via the sysctl file: echo 1 > /proc/sys/net/tipc/bc_retruni echo 0 > /proc/sys/net/tipc/bc_retruni Default is '0', i.e. the broadcast retransmission still works as usual. Acked-by: Ying Xue Acked-by: Jon Maloy Signed-off-by: Tuong Lien --- net/tipc/bcast.c | 11 ++++++++--- net/tipc/bcast.h | 4 +++- net/tipc/link.c | 10 ++++++---- net/tipc/link.h | 3 ++- net/tipc/node.c | 2 +- net/tipc/sysctl.c | 9 ++++++++- 6 files changed, 28 insertions(+), 11 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 3ce690a96ee9..50a16f8bebd9 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -46,6 +46,7 @@ #define BCLINK_WIN_MIN 32 /* bcast minimum link window size */ const char tipc_bclink_name[] = "broadcast-link"; +unsigned long sysctl_tipc_bc_retruni __read_mostly; /** * struct tipc_bc_base - base structure for keeping broadcast send state @@ -474,7 +475,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, __skb_queue_head_init(&xmitq); tipc_bcast_lock(net); - tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq, NULL); tipc_bcast_unlock(net); tipc_bcbase_xmit(net, &xmitq); @@ -489,7 +490,8 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, * RCU is locked, no other locks set */ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, - struct tipc_msg *hdr) + struct tipc_msg *hdr, + struct sk_buff_head *retrq) { struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; struct tipc_gap_ack_blks *ga; @@ -503,8 +505,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, tipc_link_bc_init_rcv(l, hdr); } else if (!msg_bc_ack_invalid(hdr)) { tipc_get_gap_ack_blks(&ga, l, hdr, false); + if (!sysctl_tipc_bc_retruni) + retrq = &xmitq; rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), - msg_bc_gap(hdr), ga, &xmitq); + msg_bc_gap(hdr), ga, &xmitq, + retrq); rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); } tipc_bcast_unlock(net); diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h index 9e847d9617d3..97d3cf9d3e4d 100644 --- a/net/tipc/bcast.h +++ b/net/tipc/bcast.h @@ -45,6 +45,7 @@ struct tipc_nl_msg; struct tipc_nlist; struct tipc_nitem; extern const char tipc_bclink_name[]; +extern unsigned long sysctl_tipc_bc_retruni; #define TIPC_METHOD_EXPIRE msecs_to_jiffies(5000) @@ -93,7 +94,8 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb); void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, struct tipc_msg *hdr); int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, - struct tipc_msg *hdr); + struct tipc_msg *hdr, + struct sk_buff_head *retrq); int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg); int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]); int tipc_bclink_reset_stats(struct net *net); diff --git a/net/tipc/link.c b/net/tipc/link.c index 288c5670cfa5..af352391e2ab 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -375,7 +375,7 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l, snd_l->ackers--; rcv_l->bc_peer_is_up = true; rcv_l->state = LINK_ESTABLISHED; - tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq, NULL); trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); tipc_link_reset(rcv_l); rcv_l->state = 
LINK_RESET; @@ -2400,7 +2400,8 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq) + struct sk_buff_head *xmitq, + struct sk_buff_head *retrq) { struct tipc_link *l = r->bc_sndlink; bool unused = false; @@ -2413,7 +2414,7 @@ int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, return 0; trace_tipc_link_bc_ack(r, acked, gap, &l->transmq); - tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, &rc); + tipc_link_advance_transmq(l, r, acked, gap, ga, retrq, &unused, &rc); tipc_link_advance_backlog(l, xmitq); if (unlikely(!skb_queue_empty(&l->wakeupq))) @@ -2447,7 +2448,8 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, return 0; if (dnode == tipc_own_addr(l->net)) { - rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq); + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq, + xmitq); l->stats.recv_nacks++; return rc; } diff --git a/net/tipc/link.h b/net/tipc/link.h index 0a0fa7350722..4d0768cf91d5 100644 --- a/net/tipc/link.h +++ b/net/tipc/link.h @@ -147,7 +147,8 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, struct tipc_msg *hdr, bool uc); int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq); + struct sk_buff_head *xmitq, + struct sk_buff_head *retrq); void tipc_link_build_bc_sync_msg(struct tipc_link *l, struct sk_buff_head *xmitq); void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr); diff --git a/net/tipc/node.c b/net/tipc/node.c index 6a49b3eeaae9..548207fdec15 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1772,7 +1772,7 @@ static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr, struct tipc_link *ucl; int rc; - rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr); + rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq); if (rc & TIPC_LINK_DOWN_EVT) { tipc_node_reset_links(n); diff --git a/net/tipc/sysctl.c b/net/tipc/sysctl.c index 58ab3d6dcdce..97a6264a2993 100644 --- a/net/tipc/sysctl.c +++ b/net/tipc/sysctl.c @@ -36,7 +36,7 @@ #include "core.h" #include "trace.h" #include "crypto.h" - +#include "bcast.h" #include static struct ctl_table_header *tipc_ctl_hdr; @@ -75,6 +75,13 @@ static struct ctl_table tipc_table[] = { .extra1 = SYSCTL_ONE, }, #endif + { + .procname = "bc_retruni", + .data = &sysctl_tipc_bc_retruni, + .maxlen = sizeof(sysctl_tipc_bc_retruni), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + }, {} }; From patchwork Tue May 26 09:38:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tuong Lien X-Patchwork-Id: 218552 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, MSGID_FROM_MTA_HEADER, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FC4EC433E0 for ; Tue, 26 May 2020 09:39:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6DAEB205CB for ; 
From: Tuong Lien
To: davem@davemloft.net, jmaloy@redhat.com, maloy@donjonn.com, ying.xue@windriver.com, netdev@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Subject: [net-next 5/5] tipc: add test for Nagle algorithm effectiveness
Date: Tue, 26 May 2020 16:38:38 +0700
Message-Id: <20200526093838.17421-6-tuong.t.lien@dektech.com.au>
In-Reply-To: <20200526093838.17421-1-tuong.t.lien@dektech.com.au>
References: <20200526093838.17421-1-tuong.t.lien@dektech.com.au>
List-ID:
X-Mailing-List: netdev@vger.kernel.org

When streaming in Nagle mode, we try to bundle as many small messages from the user as possible if there is one outstanding buffer, i.e. one not yet ACK-ed by the receiving side, which helps boost the overall throughput. So the algorithm's effectiveness really depends on when the Nagle ACK comes and on the specific network latency (RTT) compared to the user's message sending rate. In a bad case, where the user's sending rate is low or the network latency is small, there will not be many bundles, so making a Nagle ACK or waiting for it is not meaningful. For example: if a user sends a message every 100ms while the RTT is 50ms, then each message requires one Nagle ACK, yet only a single user message is sent without any bundling. In a better case, even if we have a few bundles (e.g.
the RTT = 300ms), but now the user sends messages in medium size, then there will not be any difference at all, that says 3 x 1000-byte data messages if bundled will still result in 3 bundles with MTU = 1500. When Nagle is ineffective, the delay in user message sending is clearly wasted instead of sending directly. Besides, adding Nagle ACKs will consume some processor load on both the sending and receiving sides. This commit adds a test on the effectiveness of the Nagle algorithm for an individual connection in the network on which it actually runs. Particularly, upon receipt of a Nagle ACK we will compare the number of bundles in the backlog queue to the number of user messages which would be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle mode will be kept for further message sending. Otherwise, we will leave Nagle and put a 'penalty' on the connection, so it will have to spend more 'one-way' messages before being able to re-enter Nagle. In addition, the 'ack-required' bit is only set when really needed that the number of Nagle ACKs will be reduced during Nagle mode. Testing with benchmark showed that with the patch, there was not much difference in throughput for small messages since the tool continuously sends messages without a break, so Nagle would still take in effect. Acked-by: Ying Xue Acked-by: Jon Maloy Signed-off-by: Tuong Lien --- net/tipc/msg.c | 3 --- net/tipc/msg.h | 14 ++++++++++-- net/tipc/socket.c | 64 ++++++++++++++++++++++++++++++++++++++++++++----------- 3 files changed, 64 insertions(+), 17 deletions(-) diff --git a/net/tipc/msg.c b/net/tipc/msg.c index c69fb99163fc..23809039dda1 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -235,9 +235,6 @@ int tipc_msg_append(struct tipc_msg *_hdr, struct msghdr *m, int dlen, msg_set_size(hdr, MIN_H_SIZE); __skb_queue_tail(txq, skb); total += 1; - if (prev) - msg_set_ack_required(buf_msg(prev), 0); - msg_set_ack_required(hdr, 1); } hdr = buf_msg(skb); curr = msg_blocks(hdr); diff --git a/net/tipc/msg.h b/net/tipc/msg.h index cd4281779468..58660d56bc83 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -340,9 +340,19 @@ static inline int msg_ack_required(struct tipc_msg *m) return msg_bits(m, 0, 18, 1); } -static inline void msg_set_ack_required(struct tipc_msg *m, u32 d) +static inline void msg_set_ack_required(struct tipc_msg *m) { - msg_set_bits(m, 0, 18, 1, d); + msg_set_bits(m, 0, 18, 1, 1); +} + +static inline int msg_nagle_ack(struct tipc_msg *m) +{ + return msg_bits(m, 0, 18, 1); +} + +static inline void msg_set_nagle_ack(struct tipc_msg *m) +{ + msg_set_bits(m, 0, 18, 1, 1); } static inline bool msg_is_rcast(struct tipc_msg *m) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index e370ad0edd76..d6b67d07d22e 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -48,6 +48,8 @@ #include "group.h" #include "trace.h" +#define NAGLE_START_INIT 4 +#define NAGLE_START_MAX 1024 #define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */ #define CONN_PROBING_INTV msecs_to_jiffies(3600000) /* [ms] => 1 h */ #define TIPC_FWD_MSG 1 @@ -119,7 +121,10 @@ struct tipc_sock { struct rcu_head rcu; struct tipc_group *group; u32 oneway; + u32 nagle_start; u16 snd_backlog; + u16 msg_acc; + u16 pkt_cnt; bool expect_ack; bool nodelay; bool group_is_open; @@ -143,7 +148,7 @@ static int tipc_sk_insert(struct tipc_sock *tsk); static void tipc_sk_remove(struct tipc_sock *tsk); static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dsz); static int __tipc_sendmsg(struct socket *sock, 
struct msghdr *m, size_t dsz); -static void tipc_sk_push_backlog(struct tipc_sock *tsk); +static void tipc_sk_push_backlog(struct tipc_sock *tsk, bool nagle_ack); static const struct proto_ops packet_ops; static const struct proto_ops stream_ops; @@ -474,6 +479,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock, tsk = tipc_sk(sk); tsk->max_pkt = MAX_PKT_DEFAULT; tsk->maxnagle = 0; + tsk->nagle_start = NAGLE_START_INIT; INIT_LIST_HEAD(&tsk->publications); INIT_LIST_HEAD(&tsk->cong_links); msg = &tsk->phdr; @@ -541,7 +547,7 @@ static void __tipc_shutdown(struct socket *sock, int error) !tsk_conn_cong(tsk))); /* Push out delayed messages if in Nagle mode */ - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, false); /* Remove pending SYN */ __skb_queue_purge(&sk->sk_write_queue); @@ -1252,14 +1258,37 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq, /* tipc_sk_push_backlog(): send accumulated buffers in socket write queue * when socket is in Nagle mode */ -static void tipc_sk_push_backlog(struct tipc_sock *tsk) +static void tipc_sk_push_backlog(struct tipc_sock *tsk, bool nagle_ack) { struct sk_buff_head *txq = &tsk->sk.sk_write_queue; + struct sk_buff *skb = skb_peek_tail(txq); struct net *net = sock_net(&tsk->sk); u32 dnode = tsk_peer_node(tsk); - struct sk_buff *skb = skb_peek(txq); int rc; + if (nagle_ack) { + tsk->pkt_cnt += skb_queue_len(txq); + if (!tsk->pkt_cnt || tsk->msg_acc / tsk->pkt_cnt < 2) { + tsk->oneway = 0; + if (tsk->nagle_start < NAGLE_START_MAX) + tsk->nagle_start *= 2; + tsk->expect_ack = false; + pr_debug("tsk %10u: bad nagle %u -> %u, next start %u!\n", + tsk->portid, tsk->msg_acc, tsk->pkt_cnt, + tsk->nagle_start); + } else { + tsk->nagle_start = NAGLE_START_INIT; + if (skb) { + msg_set_ack_required(buf_msg(skb)); + tsk->expect_ack = true; + } else { + tsk->expect_ack = false; + } + } + tsk->msg_acc = 0; + tsk->pkt_cnt = 0; + } + if (!skb || tsk->cong_link_cnt) return; @@ -1267,9 +1296,10 @@ static void tipc_sk_push_backlog(struct tipc_sock *tsk) if (msg_is_syn(buf_msg(skb))) return; + if (tsk->msg_acc) + tsk->pkt_cnt += skb_queue_len(txq); tsk->snt_unacked += tsk->snd_backlog; tsk->snd_backlog = 0; - tsk->expect_ack = true; rc = tipc_node_xmit(net, txq, dnode, tsk->portid); if (rc == -ELINKCONG) tsk->cong_link_cnt = 1; @@ -1322,8 +1352,7 @@ static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb, return; } else if (mtyp == CONN_ACK) { was_cong = tsk_conn_cong(tsk); - tsk->expect_ack = false; - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, msg_nagle_ack(hdr)); tsk->snt_unacked -= msg_conn_ack(hdr); if (tsk->peer_caps & TIPC_BLOCK_FLOWCTL) tsk->snd_win = msg_adv_win(hdr); @@ -1516,6 +1545,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen) struct tipc_sock *tsk = tipc_sk(sk); struct tipc_msg *hdr = &tsk->phdr; struct net *net = sock_net(sk); + struct sk_buff *skb; u32 dnode = tsk_peer_node(tsk); int maxnagle = tsk->maxnagle; int maxpkt = tsk->max_pkt; @@ -1544,17 +1574,25 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen) break; send = min_t(size_t, dlen - sent, TIPC_MAX_USER_MSG_SIZE); blocks = tsk->snd_backlog; - if (tsk->oneway++ >= 4 && send <= maxnagle) { + if (tsk->oneway++ >= tsk->nagle_start && send <= maxnagle) { rc = tipc_msg_append(hdr, m, send, maxnagle, txq); if (unlikely(rc < 0)) break; blocks += rc; + tsk->msg_acc++; if (blocks <= 64 && tsk->expect_ack) { tsk->snd_backlog = blocks; sent += send; break; + } else if 
(blocks > 64) { + tsk->pkt_cnt += skb_queue_len(txq); + } else { + skb = skb_peek_tail(txq); + msg_set_ack_required(buf_msg(skb)); + tsk->expect_ack = true; + tsk->msg_acc = 0; + tsk->pkt_cnt = 0; } - tsk->expect_ack = true; } else { rc = tipc_msg_build(hdr, m, sent, send, maxpkt, txq); if (unlikely(rc != send)) @@ -2091,7 +2129,7 @@ static void tipc_sk_proto_rcv(struct sock *sk, smp_wmb(); tsk->cong_link_cnt--; wakeup = true; - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, false); break; case GROUP_PROTOCOL: tipc_group_proto_rcv(grp, &wakeup, hdr, inputq, xmitq); @@ -2180,7 +2218,7 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb, return false; case TIPC_ESTABLISHED: if (!skb_queue_empty(&sk->sk_write_queue)) - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, false); /* Accept only connection-based messages sent by peer */ if (likely(con_msg && !err && pport == oport && pnode == onode)) { @@ -2188,8 +2226,10 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb, struct sk_buff *skb; skb = tipc_sk_build_ack(tsk); - if (skb) + if (skb) { + msg_set_nagle_ack(buf_msg(skb)); __skb_queue_tail(xmitq, skb); + } } return true; }
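
To illustrate the Nagle effectiveness test described in patch 5/5 above, here is a minimal, self-contained user-space sketch of the decision taken on each Nagle ACK: if the ratio of accumulated user messages (msg_acc) to packets actually sent (pkt_cnt) is at least 2, the socket stays in Nagle mode; otherwise it leaves Nagle and doubles its re-entry threshold, capped at NAGLE_START_MAX. Only the ratio check and the NAGLE_START_INIT/NAGLE_START_MAX back-off mirror the patch; struct nagle_state and the nagle_ack_rcv() helper are hypothetical stand-ins, not the kernel's tipc_sock fields or tipc_sk_push_backlog().

/*
 * Standalone model of the Nagle effectiveness test, not the kernel code
 * itself. nagle_ack_rcv() is a hypothetical stand-in for the bookkeeping
 * done on CONN_ACK in tipc_sk_push_backlog().
 */
#include <stdbool.h>
#include <stdio.h>

#define NAGLE_START_INIT 4
#define NAGLE_START_MAX  1024

struct nagle_state {
        unsigned int oneway;      /* one-way messages sent so far */
        unsigned int nagle_start; /* messages required before (re-)entering Nagle */
        unsigned int msg_acc;     /* user messages accumulated while bundling */
        unsigned int pkt_cnt;     /* packets actually pushed to the link */
        bool expect_ack;          /* a Nagle ACK has been requested */
};

/* Evaluate a received Nagle ACK: keep Nagle only if bundling paid off. */
static void nagle_ack_rcv(struct nagle_state *s, unsigned int queued_pkts)
{
        s->pkt_cnt += queued_pkts;

        if (!s->pkt_cnt || s->msg_acc / s->pkt_cnt < 2) {
                /* Poor ratio: leave Nagle and double the re-entry threshold */
                s->oneway = 0;
                if (s->nagle_start < NAGLE_START_MAX)
                        s->nagle_start *= 2;
                s->expect_ack = false;
        } else {
                /* Good ratio: stay in Nagle mode, reset the penalty */
                s->nagle_start = NAGLE_START_INIT;
                s->expect_ack = true; /* next bundle will request an ACK again */
        }
        s->msg_acc = 0;
        s->pkt_cnt = 0;
}

int main(void)
{
        struct nagle_state s = { .nagle_start = NAGLE_START_INIT };

        /* Case 1: 8 user messages ended up in 2 packets -> ratio 4, keep Nagle */
        s.msg_acc = 8;
        nagle_ack_rcv(&s, 2);
        printf("good bundling: nagle_start=%u expect_ack=%d\n",
               s.nagle_start, s.expect_ack);

        /* Case 2: 3 user messages still needed 3 packets -> ratio 1, back off */
        s.msg_acc = 3;
        nagle_ack_rcv(&s, 3);
        printf("poor bundling: nagle_start=%u expect_ack=%d\n",
               s.nagle_start, s.expect_ack);
        return 0;
}

Compiled with any C compiler, the first call keeps Nagle with the threshold at its initial value of 4, while the second leaves Nagle and doubles the threshold to 8, matching the behaviour the commit message describes.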