From patchwork Wed Mar 18 09:37:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222307 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2507C10DCE for ; Wed, 18 Mar 2020 09:46:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AAA522076E for ; Wed, 18 Mar 2020 09:46:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727654AbgCRJqx (ORCPT ); Wed, 18 Mar 2020 05:46:53 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726550AbgCRJqw (ORCPT ); Wed, 18 Mar 2020 05:46:52 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEJq006367; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 2652B36032A; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 02/28] tcp: fast path functions later Date: Wed, 18 Mar 2020 11:37:43 +0200 Message-Id: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
netdev@vger.kernel.org From: Ilpo Järvinen No functional changes Signed-off-by: Ilpo Järvinen --- include/net/tcp.h | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 07f947cc80e6..b97af0ff118f 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -673,29 +673,6 @@ static inline u32 __tcp_set_rto(const struct tcp_sock *tp) return usecs_to_jiffies((tp->srtt_us >> 3) + tp->rttvar_us); } -static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd) -{ - tp->pred_flags = htonl((tp->tcp_header_len << 26) | - ntohl(TCP_FLAG_ACK) | - snd_wnd); -} - -static inline void tcp_fast_path_on(struct tcp_sock *tp) -{ - __tcp_fast_path_on(tp, tp->snd_wnd >> tp->rx_opt.snd_wscale); -} - -static inline void tcp_fast_path_check(struct sock *sk) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (RB_EMPTY_ROOT(&tp->out_of_order_queue) && - tp->rcv_wnd && - atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf && - !tp->urg_data) - tcp_fast_path_on(tp); -} - /* Compute the actual rto_min value */ static inline u32 tcp_rto_min(struct sock *sk) { @@ -1510,6 +1487,29 @@ static inline bool tcp_paws_reject(const struct tcp_options_received *rx_opt, return true; } +static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd) +{ + tp->pred_flags = htonl((tp->tcp_header_len << 26) | + ntohl(TCP_FLAG_ACK) | + snd_wnd); +} + +static inline void tcp_fast_path_on(struct tcp_sock *tp) +{ + __tcp_fast_path_on(tp, tp->snd_wnd >> tp->rx_opt.snd_wscale); +} + +static inline void tcp_fast_path_check(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (RB_EMPTY_ROOT(&tp->out_of_order_queue) && + tp->rcv_wnd && + atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf && + !tp->urg_data) + tcp_fast_path_on(tp); +} + bool tcp_oow_rate_limited(struct net *net, const struct sk_buff *skb, int mib_idx, u32 *last_oow_ack_time); From patchwork Wed Mar 18 09:37:45 2020 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222302 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97A6EC5ACD7 for ; Wed, 18 Mar 2020 09:47:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7471220674 for ; Wed, 18 Mar 2020 09:47:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727706AbgCRJrS (ORCPT ); Wed, 18 Mar 2020 05:47:18 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727673AbgCRJrR (ORCPT ); Wed, 18 Mar 2020 05:47:17 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEbS006371; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 2E937360F45; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 04/28] tcp: create FLAG_TS_PROGRESS Date: Wed, 18 Mar 2020 11:37:45 +0200 Message-Id: <1584524289-24187-4-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 
1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen Whenever the timestamp advances, it declares progress, which can be used by other parts of the stack to decide that the ACK is the most recent one seen so far. AccECN will use this flag when deciding whether to update CEP from the ACE field or not. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_input.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 860938e0f1b6..7c444541cefd 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -100,6 +100,7 @@ int sysctl_tcp_max_orphans __read_mostly = NR_FILE; #define FLAG_UPDATE_TS_RECENT 0x4000 /* tcp_replace_ts_recent() */ #define FLAG_NO_CHALLENGE_ACK 0x8000 /* do not call tcp_send_challenge_ack() */ #define FLAG_ACK_MAYBE_DELAYED 0x10000 /* Likely a delayed ACK */ +#define FLAG_TS_PROGRESS 0x20000 /* Positive timestamp delta */ #define FLAG_ACKED (FLAG_DATA_ACKED|FLAG_SYN_ACKED) #define FLAG_NOT_DUP (FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED) @@ -3492,8 +3493,16 @@ static void tcp_store_ts_recent(struct tcp_sock *tp) tp->rx_opt.ts_recent_stamp = ktime_get_seconds(); } -static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq) +static int __tcp_replace_ts_recent(struct tcp_sock *tp, s32 tstamp_delta) { + tcp_store_ts_recent(tp); + return tstamp_delta > 0 ? FLAG_TS_PROGRESS : 0; +} + +static int tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq) +{ + s32 delta; + if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) { /* PAWS bug workaround wrt. ACK frames, the PAWS discard * extra check below makes sure this can only happen @@ -3502,9 +3511,13 @@ static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq) * Not only, also it occurs for expired timestamps.
*/ - if (tcp_paws_check(&tp->rx_opt, 0)) - tcp_store_ts_recent(tp); + if (tcp_paws_check(&tp->rx_opt, 0)) { + delta = tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent; + return __tcp_replace_ts_recent(tp, delta); + } } + + return 0; } /* This routine deals with acks during a TLP episode. @@ -3656,7 +3669,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) * is in window. */ if (flag & FLAG_UPDATE_TS_RECENT) - tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq); + flag |= tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq); if ((flag & (FLAG_SLOWPATH | FLAG_SND_UNA_ADVANCED)) == FLAG_SND_UNA_ADVANCED) { @@ -5608,6 +5621,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) TCP_SKB_CB(skb)->seq == tp->rcv_nxt && !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) { int tcp_header_len = tp->tcp_header_len; + int flag = 0; + s32 tstamp_delta = 0; /* Timestamp header prediction: tcp_header_len * is automatically equal to th->doff*4 due to pred_flags @@ -5620,8 +5635,9 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) if (!tcp_parse_aligned_timestamp(tp, th)) goto slow_path; + tstamp_delta = tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent; /* If PAWS failed, check it more carefully in slow path */ - if ((s32)(tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent) < 0) + if (tstamp_delta < 0) goto slow_path; /* DO NOT update ts_recent here, if checksum fails @@ -5641,12 +5657,12 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) if (tcp_header_len == (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) && tp->rcv_nxt == tp->rcv_wup) - tcp_store_ts_recent(tp); + flag |= __tcp_replace_ts_recent(tp, tstamp_delta); /* We know that such packets are checksummed * on entry. 
*/ - tcp_ack(sk, skb, 0); + tcp_ack(sk, skb, flag); __kfree_skb(skb); tcp_data_snd_check(sk); /* When receiving pure ack in fast path, update @@ -5676,7 +5692,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) if (tcp_header_len == (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) && tp->rcv_nxt == tp->rcv_wup) - tcp_store_ts_recent(tp); + flag |= __tcp_replace_ts_recent(tp, tstamp_delta); tcp_rcv_rtt_measure_ts(sk, skb); @@ -5690,7 +5706,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) { /* Well, only one small jumplet in fast path... */ - tcp_ack(sk, skb, FLAG_DATA); + tcp_ack(sk, skb, flag | FLAG_DATA); tcp_data_snd_check(sk); if (!inet_csk_ack_scheduled(sk)) goto no_ack; From patchwork Wed Mar 18 09:43:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222311 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85BD4C10DCE for ; Wed, 18 Mar 2020 09:44:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6226F20767 for ; Wed, 18 Mar 2020 09:44:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727750AbgCRJox (ORCPT ); Wed, 18 Mar 2020 05:44:53 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:51556 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727566AbgCRJn4 (ORCPT ); Wed, 18 Mar 
2020 05:43:56 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9hoqf012843; Wed, 18 Mar 2020 11:43:50 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id C3E94360030; Wed, 18 Mar 2020 11:43:50 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 06/28] tcp: reorganize SYN ECN code Date: Wed, 18 Mar 2020 11:43:10 +0200 Message-Id: <1584524612-24470-7-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524612-24470-1-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524612-24470-1-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen No functional changes. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_output.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index dc225e616f98..116be30c1b2c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -336,10 +336,11 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb) tp->ecn_flags = 0; if (use_ecn) { - TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR; - tp->ecn_flags = TCP_ECN_OK; if (tcp_ca_needs_ecn(sk) || bpf_needs_ecn) INET_ECN_xmit(sk); + + TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR; + tp->ecn_flags = TCP_ECN_OK; } } From patchwork Wed Mar 18 09:43:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222312 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 
tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9114C10DCE for ; Wed, 18 Mar 2020 09:44:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AB85720767 for ; Wed, 18 Mar 2020 09:44:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727609AbgCRJn4 (ORCPT ); Wed, 18 Mar 2020 05:43:56 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:51584 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727586AbgCRJnz (ORCPT ); Wed, 18 Mar 2020 05:43:55 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9hoeq012845; Wed, 18 Mar 2020 11:43:50 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id C6600360F45; Wed, 18 Mar 2020 11:43:50 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 07/28] tcp: rework {__, }tcp_ecn_check_ce() -> tcp_data_ecn_check() Date: Wed, 18 Mar 2020 11:43:11 +0200 Message-Id: <1584524612-24470-8-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524612-24470-1-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524612-24470-1-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen Rename tcp_ecn_check_ce to tcp_data_ecn_check as it is called only for data segments, not for ACKs (with AccECN, also ACKs may get ECN bits). 
The extra "layer" in the tcp_ecn_check_ce() function just checks for ECN being enabled; that check can be moved into tcp_data_ecn_check() rather than having the __ variant. No functional changes. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_input.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e49a6f7ad5ce..2f0a9a2ee5c1 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -273,10 +273,13 @@ static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp) tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR; } -static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) +static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); + if (!(tcp_sk(sk)->ecn_flags & TCP_ECN_OK)) + return; + switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) { case INET_ECN_NOT_ECT: /* Funny extension: if ECT is not set on a segment, @@ -305,12 +308,6 @@ static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) } } -static void tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) -{ - if (tcp_sk(sk)->ecn_flags & TCP_ECN_OK) - __tcp_ecn_check_ce(sk, skb); -} - static void tcp_ecn_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th) { if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || th->cwr)) @@ -717,7 +714,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) } icsk->icsk_ack.lrcvtime = now; - tcp_ecn_check_ce(sk, skb); + tcp_data_ecn_check(sk, skb); if (skb->len >= 128) tcp_grow_window(sk, skb); @@ -4578,7 +4575,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) u32 seq, end_seq; bool fragstolen; - tcp_ecn_check_ce(sk, skb); + tcp_data_ecn_check(sk, skb); if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP); From patchwork Wed Mar 18 09:37:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 
8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222301 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ABB29C5ACD6 for ; Wed, 18 Mar 2020 09:47:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8BEF920767 for ; Wed, 18 Mar 2020 09:47:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727718AbgCRJrX (ORCPT ); Wed, 18 Mar 2020 05:47:23 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727673AbgCRJrV (ORCPT ); Wed, 18 Mar 2020 05:47:21 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cE9R006445; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 42764360F4D; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 08/28] tcp: helpers for ECN mode handling Date: Wed, 18 Mar 2020 11:37:49 +0200 Message-Id: <1584524289-24187-8-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: 
X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen Create helpers for TCP ECN modes. No functional changes. Signed-off-by: Ilpo Järvinen --- include/net/tcp.h | 44 ++++++++++++++++++++++++++++++++++++---- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_dctcp.c | 2 +- net/ipv4/tcp_input.c | 14 ++++++------- net/ipv4/tcp_minisocks.c | 4 +++- net/ipv4/tcp_output.c | 6 +++--- 6 files changed, 55 insertions(+), 17 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 6b6a1b8b3c6e..f4ac4c029215 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -363,10 +363,46 @@ static inline void tcp_dec_quickack_mode(struct sock *sk, } } -#define TCP_ECN_OK 1 -#define TCP_ECN_QUEUE_CWR 2 -#define TCP_ECN_DEMAND_CWR 4 -#define TCP_ECN_SEEN 8 +#define TCP_ECN_MODE_RFC3168 0x1 +#define TCP_ECN_QUEUE_CWR 0x2 +#define TCP_ECN_DEMAND_CWR 0x4 +#define TCP_ECN_SEEN 0x8 +#define TCP_ECN_MODE_ACCECN 0x10 + +#define TCP_ECN_DISABLED 0 +#define TCP_ECN_MODE_PENDING (TCP_ECN_MODE_RFC3168|TCP_ECN_MODE_ACCECN) +#define TCP_ECN_MODE_ANY (TCP_ECN_MODE_RFC3168|TCP_ECN_MODE_ACCECN) + +static inline bool tcp_ecn_mode_any(const struct tcp_sock *tp) +{ + return tp->ecn_flags & TCP_ECN_MODE_ANY; +} + +static inline bool tcp_ecn_mode_rfc3168(const struct tcp_sock *tp) +{ + return (tp->ecn_flags & TCP_ECN_MODE_ANY) == TCP_ECN_MODE_RFC3168; +} + +static inline bool tcp_ecn_mode_accecn(const struct tcp_sock *tp) +{ + return (tp->ecn_flags & TCP_ECN_MODE_ANY) == TCP_ECN_MODE_ACCECN; +} + +static inline bool tcp_ecn_disabled(const struct tcp_sock *tp) +{ + return !tcp_ecn_mode_any(tp); +} + +static inline bool tcp_ecn_mode_pending(const struct tcp_sock *tp) +{ + return (tp->ecn_flags & TCP_ECN_MODE_PENDING) == TCP_ECN_MODE_PENDING; +} + +static inline void tcp_ecn_mode_set(struct tcp_sock *tp, u8 mode) +{ + tp->ecn_flags &= ~TCP_ECN_MODE_ANY; + tp->ecn_flags |= mode; +} enum tcp_tw_status { TCP_TW_SUCCESS = 0, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 48aa457a9516..fbf365dd51e4 100644 
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3254,7 +3254,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info) info->tcpi_rcv_wscale = tp->rx_opt.rcv_wscale; } - if (tp->ecn_flags & TCP_ECN_OK) + if (tcp_ecn_mode_any(tp)) info->tcpi_options |= TCPI_OPT_ECN; if (tp->ecn_flags & TCP_ECN_SEEN) info->tcpi_options |= TCPI_OPT_ECN_SEEN; diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 79f705450c16..8cf81e942675 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -76,7 +76,7 @@ static void dctcp_init(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); - if ((tp->ecn_flags & TCP_ECN_OK) || + if (tcp_ecn_mode_any(tp) || (sk->sk_state == TCP_LISTEN || sk->sk_state == TCP_CLOSE)) { struct dctcp *ca = inet_csk_ca(sk); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2f0a9a2ee5c1..59078fa2240d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -249,7 +249,7 @@ static bool tcp_in_quickack_mode(struct sock *sk) static void tcp_ecn_queue_cwr(struct tcp_sock *tp) { - if (tp->ecn_flags & TCP_ECN_OK) + if (tcp_ecn_mode_rfc3168(tp)) tp->ecn_flags |= TCP_ECN_QUEUE_CWR; } @@ -277,7 +277,7 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); - if (!(tcp_sk(sk)->ecn_flags & TCP_ECN_OK)) + if (tcp_ecn_disabled(tp)) return; switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) { @@ -310,19 +310,19 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb) static void tcp_ecn_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th) { - if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || th->cwr)) - tp->ecn_flags &= ~TCP_ECN_OK; + if (tcp_ecn_mode_rfc3168(tp) && (!th->ece || th->cwr)) + tcp_ecn_mode_set(tp, TCP_ECN_DISABLED); } static void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th) { - if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || !th->cwr)) - tp->ecn_flags &= ~TCP_ECN_OK; + if (tcp_ecn_mode_rfc3168(tp) && (!th->ece || 
!th->cwr)) + tcp_ecn_mode_set(tp, TCP_ECN_DISABLED); } static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr *th) { - if (th->ece && !th->syn && (tp->ecn_flags & TCP_ECN_OK)) + if (th->ece && !th->syn && tcp_ecn_mode_rfc3168(tp)) return true; return false; } diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index c8274371c3d0..3b5a137e416c 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -400,7 +400,9 @@ EXPORT_SYMBOL(tcp_openreq_init_rwin); static void tcp_ecn_openreq_child(struct tcp_sock *tp, const struct request_sock *req) { - tp->ecn_flags = inet_rsk(req)->ecn_ok ? TCP_ECN_OK : 0; + tcp_ecn_mode_set(tp, inet_rsk(req)->ecn_ok ? + TCP_ECN_MODE_RFC3168 : + TCP_ECN_DISABLED); } void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 116be30c1b2c..71a96983987d 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -311,7 +311,7 @@ static void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb) const struct tcp_sock *tp = tcp_sk(sk); TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_CWR; - if (!(tp->ecn_flags & TCP_ECN_OK)) + if (tcp_ecn_disabled(tp)) TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE; else if (tcp_ca_needs_ecn(sk) || tcp_bpf_ca_needs_ecn(sk)) @@ -340,7 +340,7 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb) INET_ECN_xmit(sk); TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR; - tp->ecn_flags = TCP_ECN_OK; + tcp_ecn_mode_set(tp, TCP_ECN_MODE_RFC3168); } } @@ -368,7 +368,7 @@ static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb, { struct tcp_sock *tp = tcp_sk(sk); - if (tp->ecn_flags & TCP_ECN_OK) { + if (tcp_ecn_mode_rfc3168(tp)) { /* Not-retransmitted data segment: set ECT and inject CWR. 
*/ if (skb->len != tcp_header_len && !before(TCP_SKB_CB(skb)->seq, tp->snd_nxt)) { From patchwork Wed Mar 18 09:43:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222313 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E734C5ACD8 for ; Wed, 18 Mar 2020 09:44:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3F28F20767 for ; Wed, 18 Mar 2020 09:44:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727752AbgCRJom (ORCPT ); Wed, 18 Mar 2020 05:44:42 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:51552 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727577AbgCRJn5 (ORCPT ); Wed, 18 Mar 2020 05:43:57 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9honR012853; Wed, 18 Mar 2020 11:43:50 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id CDCDD36032A; Wed, 18 Mar 2020 11:43:50 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 09/28] gso: AccECN support Date: Wed, 18 Mar 2020 11:43:13 +0200 Message-Id: <1584524612-24470-10-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: 
<1584524612-24470-1-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524612-24470-1-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen Handling the CWR flag differs between RFC 3168 ECN and AccECN. Take it into account in GSO by not clearing the CWR bit. Signed-off-by: Ilpo Järvinen --- drivers/net/tun.c | 3 ++- include/linux/netdev_features.h | 3 +++ include/linux/skbuff.h | 2 ++ net/ethtool/common.c | 1 + net/ipv4/tcp_offload.c | 6 +++++- 5 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 228fe449dc6d..d376a7cb0d63 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2788,7 +2788,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX | - NETIF_F_HW_VLAN_STAG_TX; + NETIF_F_HW_VLAN_STAG_TX | + NETIF_F_GSO_ACCECN; dev->features = dev->hw_features | NETIF_F_LLTX; dev->vlan_features = dev->features & ~(NETIF_F_HW_VLAN_CTAG_TX | diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 34d050bb1ae6..c7065b468d21 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -83,6 +83,7 @@ enum { NETIF_F_HW_TLS_RECORD_BIT, /* Offload TLS record */ NETIF_F_GRO_FRAGLIST_BIT, /* Fraglist GRO */ + NETIF_F_GSO_ACCECN_BIT, /* ... 
TCP AccECN support */ /* * Add your fresh new feature above and remember to update * netdev_features_strings[] in net/core/ethtool.c and maybe @@ -124,6 +125,7 @@ enum { #define NETIF_F_SG __NETIF_F(SG) #define NETIF_F_TSO6 __NETIF_F(TSO6) #define NETIF_F_TSO_ECN __NETIF_F(TSO_ECN) +#define NETIF_F_GSO_ACCECN __NETIF_F(GSO_ACCECN) #define NETIF_F_TSO __NETIF_F(TSO) #define NETIF_F_VLAN_CHALLENGED __NETIF_F(VLAN_CHALLENGED) #define NETIF_F_RXFCS __NETIF_F(RXFCS) @@ -205,6 +207,7 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) /* List of features with software fallbacks. */ #define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | \ + NETIF_F_GSO_ACCECN | \ NETIF_F_GSO_SCTP) /* diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 21749b2cdc9b..fdd73dc126a2 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -594,6 +594,8 @@ enum { SKB_GSO_UDP_L4 = 1 << 17, SKB_GSO_FRAGLIST = 1 << 18, + + SKB_GSO_TCP_ACCECN = 1 << 19, }; #if BITS_PER_LONG > 32 diff --git a/net/ethtool/common.c b/net/ethtool/common.c index 7b6969af5ae7..26241b5d62a4 100644 --- a/net/ethtool/common.c +++ b/net/ethtool/common.c @@ -27,6 +27,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = { [NETIF_F_TSO_BIT] = "tx-tcp-segmentation", [NETIF_F_GSO_ROBUST_BIT] = "tx-gso-robust", [NETIF_F_TSO_ECN_BIT] = "tx-tcp-ecn-segmentation", + [NETIF_F_GSO_ACCECN_BIT] = "tx-tcp-accecn-segmentation", [NETIF_F_TSO_MANGLEID_BIT] = "tx-tcp-mangleid-segmentation", [NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation", [NETIF_F_FSO_BIT] = "tx-fcoe-segmentation", diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index e09147ac9a99..7a81cf438010 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -65,6 +65,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, struct sk_buff *gso_skb = skb; __sum16 newcheck; bool ooo_okay, copy_destructor; + bool ecn_cwr_mask; th = tcp_hdr(skb); thlen = th->doff * 4; @@ -121,6 +122,8 @@ 
struct sk_buff *tcp_gso_segment(struct sk_buff *skb, newcheck = ~csum_fold((__force __wsum)((__force u32)th->check + (__force u32)delta)); + ecn_cwr_mask = !!(skb_shinfo(gso_skb)->gso_type & SKB_GSO_TCP_ACCECN); + while (skb->next) { th->fin = th->psh = 0; th->check = newcheck; @@ -140,7 +143,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, th = tcp_hdr(skb); th->seq = htonl(seq); - th->cwr = 0; + + th->cwr &= ecn_cwr_mask; } /* Following permits TCP Small Queues to work well with GSO : From patchwork Wed Mar 18 09:37:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222309 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D18DC10DCE for ; Wed, 18 Mar 2020 09:46:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D4DDA20674 for ; Wed, 18 Mar 2020 09:46:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727619AbgCRJqq (ORCPT ); Wed, 18 Mar 2020 05:46:46 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727521AbgCRJqo (ORCPT ); Wed, 18 Mar 2020 05:46:44 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEVi006448; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 
1070048) id 4B379360F4F; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 10/28] gro: prevent ACE field corruption & better AccECN handling Date: Wed, 18 Mar 2020 11:37:51 +0200 Message-Id: <1584524289-24187-10-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen There are important differences in how the CWR field behaves in RFC3168 and AccECN. Thus, it is better to never let anything receive a mixed-CWR skb. Set the Accurate ECN GSO flag to avoid corrupting CWR bits somewhere. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_offload.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 7a81cf438010..58ce382c793e 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -242,7 +242,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb) flush = NAPI_GRO_CB(p)->flush; flush |= (__force int)(flags & TCP_FLAG_CWR); flush |= (__force int)((flags ^ tcp_flag_word(th2)) & - ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH)); + ~(TCP_FLAG_FIN | TCP_FLAG_PSH)); flush |= (__force int)(th->ack_seq ^ th2->ack_seq); for (i = sizeof(*th); i < thlen; i += 4) flush |= *(u32 *)((u8 *)th + i) ^ @@ -300,7 +300,7 @@ int tcp_gro_complete(struct sk_buff *skb) skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count; if (th->cwr) - skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; + skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ACCECN; return 0; } From patchwork Wed Mar 18 09:37:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0
Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222310 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EC74C10DCE for ; Wed, 18 Mar 2020 09:46:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 698D520674 for ; Wed, 18 Mar 2020 09:46:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727580AbgCRJql (ORCPT ); Wed, 18 Mar 2020 05:46:41 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726586AbgCRJql (ORCPT ); Wed, 18 Mar 2020 05:46:41 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cElt006449; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 4E014360F50; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 11/28] tcp: AccECN support to tcp_add_backlog Date: Wed, 18 Mar 2020 11:37:52 +0200 Message-Id: <1584524289-24187-11-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: 
netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen AE flag needs to be preserved for AccECN. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_ipv4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f18a2fbf2761..dab0c1b85e95 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1757,7 +1757,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) !((TCP_SKB_CB(tail)->tcp_flags & TCP_SKB_CB(skb)->tcp_flags) & TCPHDR_ACK) || ((TCP_SKB_CB(tail)->tcp_flags ^ - TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR)) || + TCP_SKB_CB(skb)->tcp_flags) & (TCPHDR_ECE | TCPHDR_CWR | TCPHDR_AE)) || #ifdef CONFIG_TLS_DEVICE tail->decrypted != skb->decrypted || #endif From patchwork Wed Mar 18 09:37:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222300 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 303E6C5ACD7 for ; Wed, 18 Mar 2020 09:47:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 03FC620674 for ; Wed, 18 Mar 2020 09:47:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727738AbgCRJr2 (ORCPT ); Wed, 18 Mar 2020 05:47:28 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by
vger.kernel.org with ESMTP id S1727722AbgCRJr1 (ORCPT ); Wed, 18 Mar 2020 05:47:27 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEhN006455; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 5B508360F53; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 13/28] tcp: Pass flags to tcp_send_ack Date: Wed, 18 Mar 2020 11:37:54 +0200 Message-Id: <1584524289-24187-13-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen The Accurate ECN reflector needs to send custom flags to handle IP-ECN field reflection.
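As a rough illustration of why the new argument is 16 bits wide, the user-space sketch below composes the flags of a pure ACK from a 3-bit ACE value. All names are hypothetical stand-ins, not kernel symbols; the 0x100 value for the AE bit is an assumption based on AE reusing the former NS bit, which does not fit in the low eight flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the TCP header flag bits. SKETCH_TCPHDR_AE at 0x100 is
 * an assumption (the former NS bit); it is the reason a u16 is needed.
 */
#define SKETCH_TCPHDR_ACK 0x010
#define SKETCH_TCPHDR_ECE 0x040
#define SKETCH_TCPHDR_CWR 0x080
#define SKETCH_TCPHDR_AE  0x100

/* Hypothetical helper: spread a 3-bit ACE value over AE (most
 * significant), CWR and ECE (least significant).
 */
static uint16_t sketch_ace_to_flags(uint8_t ace)
{
	uint16_t flags = 0;

	assert(ace <= 7);
	if (ace & 0x4)
		flags |= SKETCH_TCPHDR_AE;
	if (ace & 0x2)
		flags |= SKETCH_TCPHDR_CWR;
	if (ace & 0x1)
		flags |= SKETCH_TCPHDR_ECE;
	return flags;
}

/* Hypothetical helper: a pure ACK always carries ACK; the caller may
 * OR in extra bits, mirroring "TCPHDR_ACK | flags" in the patch.
 */
static uint16_t sketch_ack_flags(uint16_t extra)
{
	return SKETCH_TCPHDR_ACK | extra;
}
```

Callers with nothing to reflect pass 0 and get a plain ACK, which is what every converted call site in this patch does.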
Signed-off-by: Ilpo Järvinen --- include/net/tcp.h | 4 ++-- net/ipv4/bpf_tcp_ca.c | 2 +- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_dctcp.h | 2 +- net/ipv4/tcp_input.c | 14 +++++++------- net/ipv4/tcp_output.c | 10 +++++----- net/ipv4/tcp_timer.c | 4 ++-- 7 files changed, 19 insertions(+), 19 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index ee938066fd8c..ddeb11c01faa 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -640,8 +640,8 @@ void tcp_send_fin(struct sock *sk); void tcp_send_active_reset(struct sock *sk, gfp_t priority); int tcp_send_synack(struct sock *); void tcp_push_one(struct sock *, unsigned int mss_now); -void __tcp_send_ack(struct sock *sk, u32 rcv_nxt); -void tcp_send_ack(struct sock *sk); +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt, u16 flags); +void tcp_send_ack(struct sock *sk, u16 flags); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto); diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 574972bc7299..55b78183fbd9 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -146,7 +146,7 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log, BPF_CALL_2(bpf_tcp_send_ack, struct tcp_sock *, tp, u32, rcv_nxt) { /* bpf_tcp_ca prog cannot have NULL tp */ - __tcp_send_ack((struct sock *)tp, rcv_nxt); + __tcp_send_ack((struct sock *)tp, rcv_nxt, 0); return 0; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 2ee1e4794c7d..edc03a1bf704 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1572,7 +1572,7 @@ static void tcp_cleanup_rbuf(struct sock *sk, int copied) } } if (time_to_ack) - tcp_send_ack(sk); + tcp_send_ack(sk, 0); } static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off) diff --git a/net/ipv4/tcp_dctcp.h b/net/ipv4/tcp_dctcp.h index d69a77cbd0c7..4b0259111d81 100644 --- a/net/ipv4/tcp_dctcp.h +++ b/net/ipv4/tcp_dctcp.h @@ -28,7 +28,7 @@ static inline void 
dctcp_ece_ack_update(struct sock *sk, enum tcp_ca_event evt, */ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { dctcp_ece_ack_cwr(sk, *ce_state); - __tcp_send_ack(sk, *prior_rcv_nxt); + __tcp_send_ack(sk, *prior_rcv_nxt, 0); } inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_NOW; } diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 65bbfadbee67..dbe70a114b1d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3512,7 +3512,7 @@ static void tcp_send_challenge_ack(struct sock *sk, const struct sk_buff *skb) if (count > 0) { WRITE_ONCE(challenge_count, count - 1); NET_INC_STATS(net, LINUX_MIB_TCPCHALLENGEACK); - tcp_send_ack(sk); + tcp_send_ack(sk, 0); } } @@ -4255,12 +4255,12 @@ void tcp_fin(struct sock *sk) * happens, we must ack the received FIN and * enter the CLOSING state. */ - tcp_send_ack(sk); + tcp_send_ack(sk, 0); tcp_set_state(sk, TCP_CLOSING); break; case TCP_FIN_WAIT2: /* Received a FIN -- send ACK and enter TIME_WAIT. */ - tcp_send_ack(sk); + tcp_send_ack(sk, 0); tcp_time_wait(sk, TCP_TIME_WAIT, 0); break; default: @@ -4367,7 +4367,7 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb) } } - tcp_send_ack(sk); + tcp_send_ack(sk, 0); } /* These routines update the SACK block as out-of-order packets arrive or @@ -4427,7 +4427,7 @@ static void tcp_sack_new_ofo_skb(struct sock *sk, u32 seq, u32 end_seq) */ if (this_sack >= TCP_NUM_SACKS) { if (tp->compressed_ack > TCP_FASTRETRANS_THRESH) - tcp_send_ack(sk); + tcp_send_ack(sk, 0); this_sack--; tp->rx_opt.num_sacks--; sp--; @@ -5331,7 +5331,7 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible) /* Protocol state mandates a one-time immediate ACK */ inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOW) { send_now: - tcp_send_ack(sk); + tcp_send_ack(sk, 0); return; } @@ -6126,7 +6126,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, tcp_drop(sk, skb); return 0; } else { - tcp_send_ack(sk); + tcp_send_ack(sk, 0); } return 
-1; } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index a1414d1a8ef1..c8d0a7baf2d4 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3741,7 +3741,7 @@ void tcp_send_delayed_ack(struct sock *sk) */ if (icsk->icsk_ack.blocked || time_before_eq(icsk->icsk_ack.timeout, jiffies + (ato >> 2))) { - tcp_send_ack(sk); + tcp_send_ack(sk, 0); return; } @@ -3754,7 +3754,7 @@ void tcp_send_delayed_ack(struct sock *sk) } /* This routine sends an ack and also updates the window. */ -void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt, u16 flags) { struct sk_buff *buff; @@ -3778,7 +3778,7 @@ void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) /* Reserve space for headers and prepare control bits. */ skb_reserve(buff, MAX_TCP_HEADER); - tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK); + tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK | flags); /* We do not want pure acks influencing TCP Small Queues or fq/pacing * too much. 
@@ -3791,9 +3791,9 @@ void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) } EXPORT_SYMBOL_GPL(__tcp_send_ack); -void tcp_send_ack(struct sock *sk) +void tcp_send_ack(struct sock *sk, u16 flags) { - __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); + __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt, flags); } /* This routine sends a packet with an out of date sequence diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index c3f26dcd6704..f37289216d37 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -302,7 +302,7 @@ void tcp_delack_timer_handler(struct sock *sk) icsk->icsk_ack.ato = TCP_ATO_MIN; } tcp_mstamp_refresh(tcp_sk(sk)); - tcp_send_ack(sk); + tcp_send_ack(sk, 0); __NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS); } @@ -754,7 +754,7 @@ static enum hrtimer_restart tcp_compressed_ack_kick(struct hrtimer *timer) bh_lock_sock(sk); if (!sock_owned_by_user(sk)) { if (tp->compressed_ack > TCP_FASTRETRANS_THRESH) - tcp_send_ack(sk); + tcp_send_ack(sk, 0); } else { if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED, &sk->sk_tsq_flags)) From patchwork Wed Mar 18 09:37:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222305 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9997DC5ACD7 for ; Wed, 18 Mar 2020 09:47:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 78BC620767 for ; Wed, 18 Mar 2020 09:47:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via 
listexpand id S1727677AbgCRJrD (ORCPT ); Wed, 18 Mar 2020 05:47:03 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727041AbgCRJrD (ORCPT ); Wed, 18 Mar 2020 05:47:03 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEVk006448; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 7705F360F56; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 16/28] tcp: allow embedding leftover into option padding Date: Wed, 18 Mar 2020 11:37:57 +0200 Message-Id: <1584524289-24187-16-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen There is some wasted space in the option usage due to padding to 32-bit boundaries. The AccECN option can take advantage of those few bytes, as its tail often consumes just a few odd bytes. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_output.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index adc22d0d75fd..9ff6d14363df 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -504,6 +504,8 @@ static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts) #endif } +#define NOP_LEFTOVER ((TCPOPT_NOP << 8) | TCPOPT_NOP) + /* Write previously computed TCP options to the packet.
* * Beware: Something in the Internet is very sensitive to the ordering of @@ -521,6 +523,8 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp, struct tcp_out_options *opts) { u16 options = opts->options; /* mungable copy */ + u16 leftover_bytes = NOP_LEFTOVER; /* replace next NOPs if avail */ + int leftover_size = 2; if (unlikely(OPTION_MD5 & options)) { *ptr++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) | @@ -554,17 +558,22 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp, } if (unlikely(OPTION_SACK_ADVERTISE & options)) { - *ptr++ = htonl((TCPOPT_NOP << 24) | - (TCPOPT_NOP << 16) | + *ptr++ = htonl((leftover_bytes << 16) | (TCPOPT_SACK_PERM << 8) | TCPOLEN_SACK_PERM); + leftover_bytes = NOP_LEFTOVER; } if (unlikely(OPTION_WSCALE & options)) { - *ptr++ = htonl((TCPOPT_NOP << 24) | + u8 highbyte = TCPOPT_NOP; + + if (unlikely(leftover_size == 1)) + highbyte = leftover_bytes >> 8; + *ptr++ = htonl((highbyte << 24) | (TCPOPT_WINDOW << 16) | (TCPOLEN_WINDOW << 8) | opts->ws); + leftover_bytes = NOP_LEFTOVER; } if (unlikely(opts->num_sack_blocks)) { @@ -572,8 +581,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp, tp->duplicate_sack : tp->selective_acks; int this_sack; - *ptr++ = htonl((TCPOPT_NOP << 24) | - (TCPOPT_NOP << 16) | + *ptr++ = htonl((leftover_bytes << 16) | (TCPOPT_SACK << 8) | (TCPOLEN_SACK_BASE + (opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK))); @@ -585,6 +593,10 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp, } tp->rx_opt.dsack = 0; + } else if (unlikely(leftover_bytes != NOP_LEFTOVER)) { + *ptr++ = htonl((leftover_bytes << 16) | + (TCPOPT_NOP << 8) | + TCPOPT_NOP); } if (unlikely(OPTION_FAST_OPEN_COOKIE & options)) { From patchwork Wed Mar 18 09:37:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222303 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 
(2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B9EFC10DCE for ; Wed, 18 Mar 2020 09:47:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 26AF220674 for ; Wed, 18 Mar 2020 09:47:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727690AbgCRJrO (ORCPT ); Wed, 18 Mar 2020 05:47:14 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727673AbgCRJrN (ORCPT ); Wed, 18 Mar 2020 05:47:13 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEcK006446; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 7C24A360F54; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 17/28] tcp: AccECN needs to know delivered bytes Date: Wed, 18 Mar 2020 11:37:58 +0200 Message-Id: <1584524289-24187-17-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen AccECN byte counter estimation requires delivered bytes which can be 
calculated while processing SACK blocks and cumulative ACK. Non-SACK calculation is quite annoying, inaccurate, and likely bogus. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_input.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 3109e3199906..0a63f8a49057 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1235,6 +1235,7 @@ static bool tcp_check_dsack(struct sock *sk, const struct sk_buff *ack_skb, struct tcp_sacktag_state { u32 reord; + u32 delivered_bytes; /* Timestamps for earliest and latest never-retransmitted segment * that was SACKed. RTO needs the earliest RTT to stay conservative, * but congestion control should still get an accurate delay signal. @@ -1306,7 +1307,7 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb, static u8 tcp_sacktag_one(struct sock *sk, struct tcp_sacktag_state *state, u8 sacked, u32 start_seq, u32 end_seq, - int dup_sack, int pcount, + int dup_sack, int pcount, u32 plen, u64 xmit_time) { struct tcp_sock *tp = tcp_sk(sk); @@ -1365,6 +1366,7 @@ static u8 tcp_sacktag_one(struct sock *sk, state->flag |= FLAG_DATA_SACKED; tp->sacked_out += pcount; tp->delivered += pcount; /* Out-of-order packets delivered */ + state->delivered_bytes += plen; /* Lost marker hint past SACKed? Tweak RFC3517 cnt */ if (tp->lost_skb_hint && @@ -1406,7 +1408,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *prev, * tcp_highest_sack_seq() when skb is highest_sack. 
*/ tcp_sacktag_one(sk, state, TCP_SKB_CB(skb)->sacked, - start_seq, end_seq, dup_sack, pcount, + start_seq, end_seq, dup_sack, pcount, skb->len, tcp_skb_timestamp_us(skb)); tcp_rate_skb_delivered(sk, skb, state->rate); @@ -1696,6 +1698,7 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk, TCP_SKB_CB(skb)->end_seq, dup_sack, tcp_skb_pcount(skb), + skb->len, tcp_skb_timestamp_us(skb)); tcp_rate_skb_delivered(sk, skb, state->rate); if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) @@ -3239,6 +3242,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack, tp->sacked_out -= acked_pcount; } else if (tcp_is_sack(tp)) { tp->delivered += acked_pcount; + sack->delivered_bytes += skb->len; if (!tcp_skb_spurious_retrans(tp, skb)) tcp_rack_advance(tp, sacked, scb->end_seq, tcp_skb_timestamp_us(skb)); @@ -3325,6 +3329,10 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack, */ if (flag & FLAG_RETRANS_DATA_ACKED) flag &= ~FLAG_ORIG_SACK_ACKED; + + sack->delivered_bytes = (skb ? + TCP_SKB_CB(skb)->seq : + tp->snd_una) - prior_snd_una; } else { int delta; @@ -3742,6 +3750,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) u32 ecn_count = 0; /* Did we receive ECE/an AccECN ACE update? 
*/ u32 prior_fack; + sack_state.delivered_bytes = 0; sack_state.first_sackt = 0; sack_state.rate = &rs; From patchwork Wed Mar 18 09:37:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222308 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51961C5ACD7 for ; Wed, 18 Mar 2020 09:46:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 24D6620674 for ; Wed, 18 Mar 2020 09:46:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727631AbgCRJqu (ORCPT ); Wed, 18 Mar 2020 05:46:50 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727521AbgCRJqs (ORCPT ); Wed, 18 Mar 2020 05:46:48 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cENb006447; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 7F0EF360F58; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 18/28] tcp: don't early return when sack doesn't fit Date: Wed, 18 Mar 2020 11:37:59 +0200 Message-Id: <1584524289-24187-18-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: 
git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen AccECN code will be placed after this fragment so no early returns please. Signed-off-by: Ilpo Järvinen --- net/ipv4/tcp_output.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 9ff6d14363df..084ebd2e725f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -875,17 +875,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb eff_sacks = tp->rx_opt.num_sacks + tp->rx_opt.dsack; if (unlikely(eff_sacks)) { const unsigned int remaining = MAX_TCP_OPTION_SPACE - size; - if (unlikely(remaining < TCPOLEN_SACK_BASE_ALIGNED + - TCPOLEN_SACK_PERBLOCK)) - return size; - - opts->num_sack_blocks = - min_t(unsigned int, eff_sacks, - (remaining - TCPOLEN_SACK_BASE_ALIGNED) / - TCPOLEN_SACK_PERBLOCK); - - size += TCPOLEN_SACK_BASE_ALIGNED + - opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; + if (likely(remaining >= TCPOLEN_SACK_BASE_ALIGNED + + TCPOLEN_SACK_PERBLOCK)) { + opts->num_sack_blocks = + min_t(unsigned int, eff_sacks, + (remaining - TCPOLEN_SACK_BASE_ALIGNED) / + TCPOLEN_SACK_PERBLOCK); + + size += TCPOLEN_SACK_BASE_ALIGNED + + opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; + } } return size; From patchwork Wed Mar 18 09:38:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222298 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, 
MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53389C5ACD7 for ; Wed, 18 Mar 2020 09:47:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1D95020767 for ; Wed, 18 Mar 2020 09:47:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727743AbgCRJrj (ORCPT ); Wed, 18 Mar 2020 05:47:39 -0400 Received: from smtp-rs2-vallila1.fe.helsinki.fi ([128.214.173.73]:53282 "EHLO smtp-rs2-vallila1.fe.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727511AbgCRJri (ORCPT ); Wed, 18 Mar 2020 05:47:38 -0400 Received: from whs-18.cs.helsinki.fi (whs-18.cs.helsinki.fi [128.214.166.46]) by smtp-rs2.it.helsinki.fi (8.14.7/8.14.7) with ESMTP id 02I9cEDQ006450; Wed, 18 Mar 2020 11:38:14 +0200 Received: by whs-18.cs.helsinki.fi (Postfix, from userid 1070048) id 83816360F59; Wed, 18 Mar 2020 11:38:14 +0200 (EET) From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 19/28] tcp: AccECN option Date: Wed, 18 Mar 2020 11:38:00 +0200 Message-Id: <1584524289-24187-19-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen AccECN option tx & rx side without option send control related features that are added in a later change. 
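On the rx side, the option's 24-bit byte counters are folded into 32-bit counters in tcp_sock. A minimal user-space sketch of that arithmetic follows, assuming the 24-bit value has already been extracted from the option; the function name and the plain integer arguments are hypothetical stand-ins for the kernel helper and its unaligned option read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the 24-bit counter update.
 * wire24:      counter value taken from the option (low 24 bits)
 * init_offset: initial counter offset, if the field uses one
 * Returns the signed delta that was applied to *cnt.
 */
static int32_t sketch_update_ecn_bytes(uint32_t *cnt, uint32_t wire24,
				       uint32_t init_offset)
{
	uint32_t truncated, delta;

	assert(wire24 <= 0xFFFFFFU);

	truncated = (wire24 - init_offset) & 0xFFFFFFU;
	/* Modulo-2^24 distance from the local counter to the wire value */
	delta = (truncated - *cnt) & 0xFFFFFFU;

	/* Bit 23 set means the wire value is behind the local counter:
	 * sign extend so the delta becomes a small negative number.
	 */
	if (delta & 0x800000U)
		delta |= 0xFF000000U;

	*cnt += delta;
	return (int32_t)delta;
}
```

With a local counter of 0xFFFFF0 and a wire value of 0x000010, the modulo distance is 32, so the local counter advances past the 24-bit wrap to 0x1000010. A following wire value of 0x00000C yields a delta of -4, which is the case the option processing treats as ambiguous.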
Signed-off-by: Ilpo Järvinen --- include/linux/tcp.h | 4 ++ include/net/tcp.h | 16 ++++++ net/ipv4/tcp_input.c | 126 +++++++++++++++++++++++++++++++++++++++--- net/ipv4/tcp_output.c | 112 ++++++++++++++++++++++++++++++++++++- 4 files changed, 246 insertions(+), 12 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 6b81d7eb0117..fd232bb7fae9 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -114,6 +114,7 @@ struct tcp_options_received { snd_wscale : 4, /* Window scaling received from sender */ rcv_wscale : 4; /* Window scaling to send to receiver */ u8 num_sacks; /* Number of SACK blocks */ + s8 accecn; /* AccECN index in header, -1=no option */ u16 user_mss; /* mss requested by user in ioctl */ u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ #if IS_ENABLED(CONFIG_MPTCP) @@ -321,9 +322,12 @@ struct tcp_sock { u32 prr_out; /* Total number of pkts sent during Recovery. */ u32 delivered; /* Total data packets delivered incl. rexmits */ u32 delivered_ce; /* Like the above but only ECE marked packets */ + u32 delivered_ecn_bytes[3]; u32 received_ce; /* Like the above but for received CE marked packets */ u32 received_ce_tx; /* Like the above but max transmitted value */ u32 received_ecn_bytes[3]; + u8 accecn_minlen:2,/* Minimum length of AccECN option sent */ + estimate_ecnfield:2;/* ECN field for AccECN delivered estimates */ u32 lost; /* Total data packets lost incl. rexmits */ u32 app_limited; /* limited until "delivered" reaches this val */ u64 first_tx_mstamp; /* start of window send phase */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 5824447b1fc5..54471c2dedd5 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -189,6 +189,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); /* Magic number to be after the option value for sharing TCP * experimental options. 
See draft-ietf-tcpm-experimental-options-00.txt */ +#define TCPOPT_ACCECN_MAGIC 0xACCE #define TCPOPT_FASTOPEN_MAGIC 0xF989 #define TCPOPT_SMC_MAGIC 0xE2D4C3D9 @@ -204,6 +205,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCPOLEN_FASTOPEN_BASE 2 #define TCPOLEN_EXP_FASTOPEN_BASE 4 #define TCPOLEN_EXP_SMC_BASE 6 +#define TCPOLEN_EXP_ACCECN_BASE 4 /* But this is what stacks really send out. */ #define TCPOLEN_TSTAMP_ALIGNED 12 @@ -215,6 +217,13 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCPOLEN_MD5SIG_ALIGNED 20 #define TCPOLEN_MSS_ALIGNED 4 #define TCPOLEN_EXP_SMC_BASE_ALIGNED 8 +#define TCPOLEN_ACCECN_PERCOUNTER 3 + +/* Maximum number of byte counters in AccECN option + size */ +#define TCP_ACCECN_NUMCOUNTERS 3 +#define TCP_ACCECN_MAXSIZE (TCPOLEN_EXP_ACCECN_BASE + \ + TCPOLEN_ACCECN_PERCOUNTER * \ + TCP_ACCECN_NUMCOUNTERS) /* Flags in tp->nonagle */ #define TCP_NAGLE_OFF 1 /* Nagle's algo is disabled */ @@ -363,6 +372,10 @@ static inline void tcp_dec_quickack_mode(struct sock *sk, } } +/* sysctl_tcp_ecn value */ +#define TCP_ECN_ENABLE_MASK 0x3 +#define TCP_ACCECN_NO_OPT 0x100 + #define TCP_ECN_MODE_RFC3168 0x1 #define TCP_ECN_QUEUE_CWR 0x2 #define TCP_ECN_DEMAND_CWR 0x4 @@ -890,6 +903,9 @@ static inline void tcp_accecn_init_counters(struct tcp_sock *tp) tp->received_ce = 0; tp->received_ce_tx = 0; __tcp_accecn_init_bytes_counters(tp->received_ecn_bytes); + __tcp_accecn_init_bytes_counters(tp->delivered_ecn_bytes); + tp->accecn_minlen = 0; + tp->estimate_ecnfield = 0; } /* This is what the send packet queuing engine uses to pass diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0a63f8a49057..d34b50f73652 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -402,9 +402,92 @@ static u32 tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr * return 0; } +/* Handles AccECN option ECT and CE 24-bit byte counters update into + * the u32 value in tcp_sock. 
As we're processing TCP options, it is + * safe to access from - 1. + */ +static s32 tcp_update_ecn_bytes(u32 *cnt, const char *from, u32 init_offset) +{ + u32 truncated = (get_unaligned_be32(from - 1) - init_offset) & 0xFFFFFFU; + u32 delta = (truncated - *cnt) & 0xFFFFFFU; + /* If delta has the highest bit set (24th bit) indicating negative, + * sign extend to correct an estimation error in the ecn_bytes + */ + delta = delta & 0x800000 ? delta | 0xFF000000 : delta; + *cnt += delta; + return (s32)delta; +} + +static u8 accecn_opt_ecnfield[3] = { + INET_ECN_ECT_0, INET_ECN_CE, INET_ECN_ECT_1, +}; + +/* Returns true if the byte counters can be used */ +static bool tcp_accecn_process_option(struct tcp_sock *tp, + const struct sk_buff *skb, + u32 delivered_bytes) +{ + unsigned char *ptr; + unsigned int optlen; + int i; + bool ambiguous_ecn_bytes_incr = false; + bool first_changed = false; + bool res; + + if (tp->rx_opt.accecn < 0) { + if (tp->estimate_ecnfield) { + tp->delivered_ecn_bytes[tp->estimate_ecnfield - 1] += + delivered_bytes; + return true; + } + return false; + } + + ptr = skb_transport_header(skb) + tp->rx_opt.accecn; + optlen = ptr[1]; + if (ptr[0] == TCPOPT_EXP) { + optlen -= 2; + ptr += 2; + } + ptr += 2; + + res = !!tp->estimate_ecnfield; + for (i = 0; i < 3; i++) { + if (optlen >= TCPOLEN_ACCECN_PERCOUNTER) { + u8 ecnfield = accecn_opt_ecnfield[i]; + u32 init_offset = i ? 
0 : TCP_ACCECN_E0B_INIT_OFFSET; + s32 delta; + + delta = tcp_update_ecn_bytes(&(tp->delivered_ecn_bytes[ecnfield - 1]), + ptr, init_offset); + if (delta) { + if (delta < 0) { + res = false; + ambiguous_ecn_bytes_incr = true; + } + if (ecnfield != tp->estimate_ecnfield) { + if (!first_changed) { + tp->estimate_ecnfield = ecnfield; + first_changed = true; + } else { + res = false; + ambiguous_ecn_bytes_incr = true; + } + } + } + + optlen -= TCPOLEN_ACCECN_PERCOUNTER; + } + } + if (ambiguous_ecn_bytes_incr) + tp->estimate_ecnfield = 0; + + return res; +} + /* Returns the ECN CE delta */ static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, - u32 delivered_pkts, int flag) + u32 delivered_pkts, u32 delivered_bytes, int flag) { u32 delta, safe_delta; u32 corrected_ace; @@ -413,6 +496,8 @@ static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, if (!(flag & (FLAG_FORWARD_PROGRESS|FLAG_TS_PROGRESS))) return 0; + tcp_accecn_process_option(tp, skb, delivered_bytes); + if (!(flag & FLAG_SLOWPATH)) { /* AccECN counter might overflow on large ACKs */ if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK) @@ -3839,7 +3924,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) if (tcp_ecn_mode_accecn(tp)) { ecn_count = tcp_accecn_process(tp, skb, - tp->delivered - delivered, flag); + tp->delivered - delivered, + sack_state.delivered_bytes, flag); if (ecn_count > 0) flag |= FLAG_ECE; } @@ -3878,7 +3964,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) no_queue: if (tcp_ecn_mode_accecn(tp)) { ecn_count = tcp_accecn_process(tp, skb, - tp->delivered - delivered, flag); + tp->delivered - delivered, + sack_state.delivered_bytes, flag); if (ecn_count > 0) flag |= FLAG_ECE; } @@ -4005,6 +4092,7 @@ void tcp_parse_options(const struct net *net, ptr = (const unsigned char *)(th + 1); opt_rx->saw_tstamp = 0; + opt_rx->accecn = -1; while (length > 0) { int opcode = *ptr++; @@ -4094,12 +4182,16 @@ void 
tcp_parse_options(const struct net *net, break; case TCPOPT_EXP: + if (opsize >= TCPOLEN_EXP_ACCECN_BASE && + get_unaligned_be16(ptr) == + TCPOPT_ACCECN_MAGIC) + opt_rx->accecn = (ptr - 2) - (unsigned char *)th; /* Fast Open option shares code 254 using a * 16 bits magic number. */ - if (opsize >= TCPOLEN_EXP_FASTOPEN_BASE && - get_unaligned_be16(ptr) == - TCPOPT_FASTOPEN_MAGIC) + else if (opsize >= TCPOLEN_EXP_FASTOPEN_BASE && + get_unaligned_be16(ptr) == + TCPOPT_FASTOPEN_MAGIC) tcp_parse_fastopen_option(opsize - TCPOLEN_EXP_FASTOPEN_BASE, ptr + 2, th->syn, foc, true); @@ -5567,6 +5659,19 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, const struct tcphdr *t } } +/* Maps ECT/CE bits to minimum length of AccECN option */ +static inline unsigned int tcp_ecn_field_to_accecn_len(u8 ecnfield) +{ + unsigned int opt; + + opt = (ecnfield - 2) & INET_ECN_MASK; + /* Shift+XOR for 11 -> 10 */ + opt = (opt ^ (opt >> 1)) + 1; + + return opt; +} + + /* Updates Accurate ECN received counters from the received IP ECN field */ void tcp_ecn_received_counters(struct sock *sk, const struct sk_buff *skb, u32 payload_len) @@ -5582,7 +5687,9 @@ void tcp_ecn_received_counters(struct sock *sk, const struct sk_buff *skb, tp->received_ce += is_ce * max_t(u16, 1, skb_shinfo(skb)->gso_segs); if (payload_len > 0) { + u8 minlen = tcp_ecn_field_to_accecn_len(ecnfield); tp->received_ecn_bytes[ecnfield - 1] += payload_len; + tp->accecn_minlen = max_t(u8, tp->accecn_minlen, minlen); } } } @@ -6639,9 +6746,10 @@ static void tcp_ecn_create_request(struct request_sock *req, u32 ecn_ok_dst; if (tcp_accecn_syn_requested(th) && - (net->ipv4.sysctl_tcp_ecn || tcp_ca_needs_accecn(listen_sk))) { + ((net->ipv4.sysctl_tcp_ecn & TCP_ECN_ENABLE_MASK) || + tcp_ca_needs_accecn(listen_sk))) { inet_rsk(req)->ecn_ok = 1; - if ((net->ipv4.sysctl_tcp_ecn >= 2) || + if (((net->ipv4.sysctl_tcp_ecn & TCP_ECN_ENABLE_MASK) >= 2) || tcp_ca_needs_accecn(listen_sk)) { tcp_rsk(req)->accecn_ok = 1; 
tcp_rsk(req)->syn_ect_rcv = @@ -6655,7 +6763,7 @@ static void tcp_ecn_create_request(struct request_sock *req, ect = !INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield); ecn_ok_dst = dst_feature(dst, DST_FEATURE_ECN_MASK); - ecn_ok = net->ipv4.sysctl_tcp_ecn || ecn_ok_dst; + ecn_ok = (net->ipv4.sysctl_tcp_ecn & TCP_ECN_ENABLE_MASK) || ecn_ok_dst; if (((!ect || th->res1 || th->ae) && ecn_ok) || tcp_ca_needs_ecn(listen_sk) || diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 084ebd2e725f..7bce1a73ac8f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -330,9 +330,11 @@ static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); bool bpf_needs_ecn = tcp_bpf_ca_needs_ecn(sk); - bool use_accecn = sock_net(sk)->ipv4.sysctl_tcp_ecn == 3 || + bool use_accecn = + (sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ECN_ENABLE_MASK) == 3 || tcp_ca_needs_accecn(sk); - bool use_ecn = sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 || + bool use_ecn = + (sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ECN_ENABLE_MASK) == 1 || tcp_ca_needs_ecn(sk) || bpf_needs_ecn || use_accecn; if (!use_ecn) { @@ -468,6 +470,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp) #define OPTION_FAST_OPEN_COOKIE (1 << 8) #define OPTION_SMC (1 << 9) #define OPTION_MPTCP (1 << 10) +#define OPTION_ACCECN (1 << 11) static void smc_options_write(__be32 *ptr, u16 *options) { @@ -488,12 +491,14 @@ struct tcp_out_options { u16 options; /* bit field of OPTION_* */ u16 mss; /* 0 to disable */ u8 ws; /* window scale, 0 to disable */ - u8 num_sack_blocks; /* number of SACK blocks to include */ + u8 num_sack_blocks:3, /* number of SACK blocks to include */ + num_ecn_bytes:2; /* number of AccECN fields needed */ u8 hash_size; /* bytes in hash_location */ __u8 *hash_location; /* temporary pointer, overloaded */ __u32 tsval, tsecr; /* need to include OPTION_TS */ struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */ struct mptcp_out_options mptcp; 
+ u32 *ecn_bytes; /* AccECN ECT/CE byte counters */ }; static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts) @@ -557,6 +562,33 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp, *ptr++ = htonl(opts->tsecr); } + if (unlikely(OPTION_ACCECN & options)) { + u32 e0b = opts->ecn_bytes[INET_ECN_ECT_0 - 1] + TCP_ACCECN_E0B_INIT_OFFSET; + u32 ceb = opts->ecn_bytes[INET_ECN_CE - 1] + TCP_ACCECN_CEB_INIT_OFFSET; + u32 e1b = opts->ecn_bytes[INET_ECN_ECT_1 - 1] + TCP_ACCECN_E1B_INIT_OFFSET; + u8 len = TCPOLEN_EXP_ACCECN_BASE + + opts->num_ecn_bytes * TCPOLEN_ACCECN_PERCOUNTER; + + *ptr++ = htonl((TCPOPT_EXP << 24) | (len << 16) | + TCPOPT_ACCECN_MAGIC); + if (opts->num_ecn_bytes > 0) { + *ptr++ = htonl((e0b << 8) | + (opts->num_ecn_bytes > 1 ? + (ceb >> 16) & 0xff : + TCPOPT_NOP)); + if (opts->num_ecn_bytes == 2) { + leftover_bytes = (ceb >> 8) & 0xffff; + } else { + *ptr++ = htonl((ceb << 16) | + ((e1b >> 8) & 0xffff)); + leftover_bytes = ((e1b & 0xff) << 8) | + TCPOPT_NOP; + leftover_size = 1; + } + } + if (tp != NULL) + tp->accecn_minlen = 0; + } if (unlikely(OPTION_SACK_ADVERTISE & options)) { *ptr++ = htonl((leftover_bytes << 16) | (TCPOPT_SACK_PERM << 8) | @@ -677,6 +709,53 @@ static void mptcp_set_option_cond(const struct request_sock *req, } } +/* Initial values for AccECN option, ordered is based on ECN field bits + * similar to received_ecn_bytes. Used for SYN/ACK AccECN option. + */ +u32 synack_ecn_bytes[3] = { 0, 0, 0 }; + +static u32 tcp_synack_options_combine_saving(struct tcp_out_options *opts) +{ + /* How much there's room for combining with the alignment padding? 
*/ + if ((opts->options & (OPTION_SACK_ADVERTISE|OPTION_TS)) == + OPTION_SACK_ADVERTISE) + return 2; + else if (opts->options & OPTION_WSCALE) + return 1; + return 0; +} + +/* AccECN can take here and there take advantage of NOPs for alignment of + * other options + */ +static int tcp_options_fit_accecn(struct tcp_out_options *opts, int required, + int remaining, int max_combine_saving) +{ + int size = TCP_ACCECN_MAXSIZE; + + opts->num_ecn_bytes = TCP_ACCECN_NUMCOUNTERS; + + while (opts->num_ecn_bytes >= required) { + int leftover_size = size & 0x3; + /* Pad to dword if cannot combine */ + if (leftover_size > max_combine_saving) + leftover_size = -((4 - leftover_size) & 0x3); + + if (remaining >= size - leftover_size) { + size -= leftover_size; + break; + } + + opts->num_ecn_bytes--; + size -= TCPOLEN_ACCECN_PERCOUNTER; + } + if (opts->num_ecn_bytes < required) + return 0; + + opts->options |= OPTION_ACCECN; + return size; +} + /* Compute TCP options for SYN packets. This is not the final * network wire format yet. 
*/ @@ -755,6 +834,16 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, } } + /* Simultaneous open SYN/ACK needs AccECN option but not SYN */ + if (unlikely((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_ACK) && + tcp_ecn_mode_accecn(tp) && + !(sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ACCECN_NO_OPT) && + (remaining >= TCPOLEN_EXP_ACCECN_BASE))) { + opts->ecn_bytes = synack_ecn_bytes; + remaining -= tcp_options_fit_accecn(opts, 0, remaining, + tcp_synack_options_combine_saving(opts)); + } + return MAX_TCP_OPTION_SPACE - remaining; } @@ -767,6 +856,7 @@ static unsigned int tcp_synack_options(const struct sock *sk, struct tcp_fastopen_cookie *foc) { struct inet_request_sock *ireq = inet_rsk(req); + struct tcp_request_sock *treq = tcp_rsk(req); unsigned int remaining = MAX_TCP_OPTION_SPACE; #ifdef CONFIG_TCP_MD5SIG @@ -820,6 +910,14 @@ static unsigned int tcp_synack_options(const struct sock *sk, smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining); + if (treq->accecn_ok && + !(sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ACCECN_NO_OPT) && + (remaining >= TCPOLEN_EXP_ACCECN_BASE)) { + opts->ecn_bytes = synack_ecn_bytes; + remaining -= tcp_options_fit_accecn(opts, 0, remaining, + tcp_synack_options_combine_saving(opts)); + } + return MAX_TCP_OPTION_SPACE - remaining; } @@ -887,6 +985,14 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb } } + if (tcp_ecn_mode_accecn(tp) && + !(sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ACCECN_NO_OPT)) { + opts->ecn_bytes = tp->received_ecn_bytes; + size += tcp_options_fit_accecn(opts, tp->accecn_minlen, + MAX_TCP_OPTION_SPACE - size, + opts->num_sack_blocks > 0 ? 
2 : 0); + } + return size; } From patchwork Wed Mar 18 09:38:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222299 From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 22/28] tcp: AccECN option order bit & failure handling Date: Wed, 18 Mar 2020 11:38:03 +0200 Message-Id: <1584524289-24187-22-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To:
<1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen

The AccECN option has two possible field orders. Collect the order bit from the first AccECN option that is long enough to contain it.

The AccECN option may fail in various ways; handle these:
- Remove the option from SYN/ACK rexmits to handle blackholes
- If no option arrives in the SYN/ACK, assume the option is not usable
- If an option arrives later, re-enable it
- If the option is zeroed, disable AccECN option processing

Signed-off-by: Ilpo Järvinen --- include/linux/tcp.h | 2 ++ include/net/tcp.h | 10 +++++++++ net/ipv4/tcp.c | 1 + net/ipv4/tcp_input.c | 46 ++++++++++++++++++++++++++++++++++------ net/ipv4/tcp_minisocks.c | 32 ++++++++++++++++++++++++++++ net/ipv4/tcp_output.c | 4 +++- 6 files changed, 88 insertions(+), 7 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index c381aea5c764..64db51e5d45e 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -151,6 +151,7 @@ struct tcp_request_sock { bool tfo_listener; bool is_mptcp; u8 accecn_ok : 1, + saw_accecn_opt : 3, syn_ect_snt: 2, syn_ect_rcv: 2; u32 txhash; @@ -252,6 +253,7 @@ struct tcp_sock { u8 compressed_ack; u8 syn_ect_snt:2, /* AccECN ECT memory, only */ syn_ect_rcv:2, /* ...
needed durign 3WHS + first seqno */ + saw_accecn_opt:3, /* A valid AccECN option was seen */ ecn_fail:1; /* ECN reflector detected path mangling */ u32 chrono_start; /* Start time in jiffies of a TCP chrono */ u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 52567d8fca33..a29109fa2ce2 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -226,6 +226,14 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); TCP_ACCECN_NUMCOUNTERS) #define TCP_ACCECN_BEACON_FREQ_SHIFT 2 /* Send option at least 2^2 times per RTT */ +/* tp->saw_accecn_opt states, empty seen & orderbit are overloaded */ +#define TCP_ACCECN_OPT_EMPTY_SEEN 0x1 +#define TCP_ACCECN_OPT_ORDERBIT 0x1 +#define TCP_ACCECN_OPT_COUNTER_SEEN 0x2 +#define TCP_ACCECN_OPT_SEEN (TCP_ACCECN_OPT_COUNTER_SEEN | \ + TCP_ACCECN_OPT_EMPTY_SEEN) +#define TCP_ACCECN_OPT_FAIL 0x4 + /* Flags in tp->nonagle */ #define TCP_NAGLE_OFF 1 /* Nagle's algo is disabled */ #define TCP_NAGLE_CORK 2 /* Socket is corked */ @@ -443,6 +451,7 @@ static inline u32 tcp_accecn_ace_deficit(const struct tcp_sock *tp) bool tcp_accecn_validate_syn_feedback(struct sock *sk, u8 ace, u8 sent_ect); void tcp_accecn_third_ack(struct sock *sk, const struct sk_buff *skb, u8 syn_ect_snt); +u8 tcp_accecn_option_init(const struct sk_buff *skb, u8 opt_offset); void tcp_ecn_received_counters(struct sock *sk, const struct sk_buff *skb, u32 payload_len); @@ -885,6 +894,7 @@ static inline u64 tcp_skb_timestamp_us(const struct sk_buff *skb) */ #define TCP_ACCECN_CEP_INIT_OFFSET 5 #define TCP_ACCECN_E1B_INIT_OFFSET 0 +#define TCP_ACCECN_E1B_FIRST_INIT_OFFSET 0x800001 #define TCP_ACCECN_E0B_INIT_OFFSET 1 #define TCP_ACCECN_CEB_INIT_OFFSET 0 diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cfbdc1468342..09f73f81e6fa 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2624,6 +2624,7 @@ int tcp_disconnect(struct sock *sk, int flags) tp->window_clamp = 0; tp->delivered = 0; 
tp->delivered_ce = 0; + tp->saw_accecn_opt = 0; tp->ecn_fail = 0; tcp_accecn_init_counters(tp); tp->prev_ecnfield = 0; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 504309a73de2..826dfd5bf114 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -352,7 +352,8 @@ bool tcp_accecn_validate_syn_feedback(struct sock *sk, u8 ace, u8 sent_ect) } /* See Table 2 of the AccECN draft */ -static void tcp_ecn_rcv_synack(struct sock *sk, const struct tcphdr *th, +static void tcp_ecn_rcv_synack(struct sock *sk, const struct sk_buff *skb, + const struct tcphdr *th, u8 ip_dsfield) { struct tcp_sock *tp = tcp_sk(sk); @@ -372,7 +373,12 @@ static void tcp_ecn_rcv_synack(struct sock *sk, const struct tcphdr *th, default: tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN); tp->syn_ect_rcv = ip_dsfield & INET_ECN_MASK; - tp->accecn_opt_demand = 2; + if (tp->rx_opt.accecn >= 0 && + tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) { + tp->saw_accecn_opt = tcp_accecn_option_init(skb, + tp->rx_opt.accecn); + tp->accecn_opt_demand = 2; + } if (tcp_accecn_validate_syn_feedback(sk, ace, tp->syn_ect_snt) && INET_ECN_is_ce(ip_dsfield)) tp->received_ce++; @@ -436,7 +442,19 @@ static bool tcp_accecn_process_option(struct tcp_sock *tp, bool first_changed = false; bool res; + if (tp->saw_accecn_opt == TCP_ACCECN_OPT_FAIL) + return false; + if (tp->rx_opt.accecn < 0) { + if (!tp->saw_accecn_opt) { + /* Too late to enable after this point due to + * potential counter wraps + */ + if (tp->bytes_sent >= (1 << 23) - 1) + tp->saw_accecn_opt = TCP_ACCECN_OPT_FAIL; + return false; + } + if (tp->estimate_ecnfield) { tp->delivered_ecn_bytes[tp->estimate_ecnfield - 1] += delivered_bytes; @@ -453,11 +471,20 @@ static bool tcp_accecn_process_option(struct tcp_sock *tp, } ptr += 2; + if (tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) + tp->saw_accecn_opt = tcp_accecn_option_init(skb, + tp->rx_opt.accecn); + res = !!tp->estimate_ecnfield; for (i = 0; i < 3; i++) { if (optlen >= 
TCPOLEN_ACCECN_PERCOUNTER) { - u8 ecnfield = accecn_opt_ecnfield[i]; - u32 init_offset = i ? 0 : TCP_ACCECN_E0B_INIT_OFFSET; + u8 orderbit = tp->saw_accecn_opt & TCP_ACCECN_OPT_ORDERBIT; + int idx = orderbit ? i : 2 - i; + u8 ecnfield = accecn_opt_ecnfield[idx]; + u32 init_offset = i ? 0 : + !orderbit ? + TCP_ACCECN_E0B_INIT_OFFSET : + TCP_ACCECN_E1B_FIRST_INIT_OFFSET; s32 delta; delta = tcp_update_ecn_bytes(&(tp->delivered_ecn_bytes[ecnfield - 1]), @@ -4188,6 +4215,7 @@ void tcp_parse_options(const struct net *net, get_unaligned_be16(ptr) == TCPOPT_ACCECN_MAGIC) opt_rx->accecn = (ptr - 2) - (unsigned char *)th; + /* Fast Open option shares code 254 using a * 16 bits magic number. */ @@ -5836,7 +5864,12 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb, if (th->syn) { if (tcp_ecn_mode_accecn(tp)) { send_accecn_reflector = true; - tp->accecn_opt_demand = max_t(u8, 1, tp->accecn_opt_demand); + if (tp->rx_opt.accecn >= 0 && + tp->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) { + tp->saw_accecn_opt = tcp_accecn_option_init(skb, + tp->rx_opt.accecn); + tp->accecn_opt_demand = max_t(u8, 1, tp->accecn_opt_demand); + } } syn_challenge: if (syn_inerr) @@ -6279,7 +6312,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, */ if (tcp_ecn_mode_any(tp)) - tcp_ecn_rcv_synack(sk, th, TCP_SKB_CB(skb)->ip_dsfield); + tcp_ecn_rcv_synack(sk, skb, th, TCP_SKB_CB(skb)->ip_dsfield); tcp_init_wl(tp, TCP_SKB_CB(skb)->seq); tcp_try_undo_spurious_syn(sk); @@ -6812,6 +6845,7 @@ static void tcp_openreq_init(struct request_sock *req, tcp_rsk(req)->snt_synack = 0; tcp_rsk(req)->last_oow_ack_time = 0; tcp_rsk(req)->accecn_ok = 0; + tcp_rsk(req)->saw_accecn_opt = 0; tcp_rsk(req)->syn_ect_rcv = 0; tcp_rsk(req)->syn_ect_snt = 0; req->mss = rx_opt->mss_clamp; diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 2e532758a34a..eda3d0c3af32 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -97,6 +97,7 @@ 
tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, bool paws_reject = false; tmp_opt.saw_tstamp = 0; + tmp_opt.accecn = -1; if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) { tcp_parse_options(twsk_net(tw), skb, &tmp_opt, 0, NULL); @@ -437,6 +438,7 @@ static void tcp_ecn_openreq_child(struct sock *sk, tcp_ecn_mode_set(tp, TCP_ECN_MODE_ACCECN); tp->syn_ect_snt = treq->syn_ect_snt; tcp_accecn_third_ack(sk, skb, treq->syn_ect_snt); + tp->saw_accecn_opt = treq->saw_accecn_opt; tp->prev_ecnfield = treq->syn_ect_rcv; tp->accecn_opt_demand = 1; tcp_ecn_received_counters(sk, skb, skb->len - th->doff * 4); @@ -491,6 +493,32 @@ static void smc_check_reset_syn_req(struct tcp_sock *oldtp, #endif } +u8 tcp_accecn_option_init(const struct sk_buff *skb, u8 opt_offset) +{ + unsigned char *ptr = skb_transport_header(skb) + opt_offset; + unsigned int optlen = ptr[1]; + + if (ptr[0] == TCPOPT_EXP) { + optlen -= 2; + ptr += 2; + } + ptr += 2; + + if (optlen >= TCPOLEN_ACCECN_PERCOUNTER) { + u32 first_field = get_unaligned_be32(ptr - 1) & 0xFFFFFFU; + u8 orderbit = first_field >> 23; + /* Detect option zeroing. Check the first byte counter value, + * if present, it must be != 0. + */ + if (!first_field) + return TCP_ACCECN_OPT_FAIL; + + return TCP_ACCECN_OPT_COUNTER_SEEN + orderbit; + } + + return TCP_ACCECN_OPT_EMPTY_SEEN; +} + /* This is not only more efficient than what we used to do, it eliminates * a lot of code duplication between IPv4/IPv6 SYN recv processing. -DaveM * @@ -793,6 +821,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, if (!(flg & TCP_FLAG_ACK)) return NULL; + if (tcp_rsk(req)->accecn_ok && tmp_opt.accecn >= 0 && + tcp_rsk(req)->saw_accecn_opt < TCP_ACCECN_OPT_COUNTER_SEEN) + tcp_rsk(req)->saw_accecn_opt = tcp_accecn_option_init(skb, tmp_opt.accecn); + /* For Fast Open no more processing is needed (sk is the * child socket). 
*/ diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index f070128b69e6..4cc590a47f43 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -841,6 +841,7 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, /* Simultaneous open SYN/ACK needs AccECN option but not SYN */ if (unlikely((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_ACK) && tcp_ecn_mode_accecn(tp) && + inet_csk(sk)->icsk_retransmits < 2 && !(sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ACCECN_NO_OPT) && (remaining >= TCPOLEN_EXP_ACCECN_BASE))) { opts->ecn_bytes = synack_ecn_bytes; @@ -914,7 +915,7 @@ static unsigned int tcp_synack_options(const struct sock *sk, smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining); - if (treq->accecn_ok && + if (treq->accecn_ok && req->num_timeout < 1 && !(sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ACCECN_NO_OPT) && (remaining >= TCPOLEN_EXP_ACCECN_BASE)) { opts->ecn_bytes = synack_ecn_bytes; @@ -990,6 +991,7 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb } if (tcp_ecn_mode_accecn(tp) && + (tp->saw_accecn_opt & TCP_ACCECN_OPT_SEEN) && !(sock_net(sk)->ipv4.sysctl_tcp_ecn & TCP_ACCECN_NO_OPT)) { if (tp->accecn_opt_demand || (tcp_stamp_us_delta(tp->tcp_mstamp, tp->accecn_opt_tstamp) >= From patchwork Wed Mar 18 09:38:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222306
From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 23/28] tcp: AccECN option ceb/cep heuristic Date: Wed, 18 Mar 2020 11:38:04 +0200 Message-Id: <1584524289-24187-23-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen

Implement the heuristic algorithm from draft-09, Appendix A.2.2: when the byte counters of the AccECN option are usable, use the CE byte delta (ceb) to choose between the plain and the safer CE packet (cep) delta.
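
For illustration only (not part of the patch), the selection logic that the tcp_input.c hunk adds to tcp_accecn_process() can be sketched as a standalone userspace function. The name pick_cep_delta(), the flat parameter list and SAFETY_SHIFT are assumptions made for this sketch; only the order of the comparisons mirrors the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors TCP_ACCECN_SAFETY_SHIFT from this patch (assumed value 1). */
#define SAFETY_SHIFT 1

/*
 * Hypothetical helper: choose between the plain cep delta and the safer
 * (larger) one, using the CE byte delta d_ceb taken from the AccECN option.
 *
 * delta            - cep delta assuming the ACE field did not wrap
 * safe_delta       - cep delta assuming the worst-case number of wraps
 * d_ceb            - change in delivered CE bytes since the previous ACK
 * mss              - sender MSS in bytes
 * opt_deltas_valid - true if the option byte counters can be trusted
 */
static uint32_t pick_cep_delta(uint32_t delta, uint32_t safe_delta,
			       uint32_t d_ceb, uint32_t mss,
			       bool opt_deltas_valid)
{
	if (opt_deltas_valid) {
		/* No CE bytes reported: nothing suggests a wrap. */
		if (d_ceb == 0)
			return delta;
		/* More CE bytes than delta full-sized packets can carry. */
		if (d_ceb > delta * mss)
			return safe_delta;
		/* Far fewer CE bytes than safe_delta packets would carry. */
		if (d_ceb < ((safe_delta * mss) >> SAFETY_SHIFT))
			return delta;
	}
	/* Ambiguous or no option data: stay on the safe side. */
	return safe_delta;
}
```

With mss = 1000, delta = 2 and safe_delta = 10, a d_ceb of 3000 selects safe_delta because two segments cannot carry 3000 bytes, while a d_ceb of 1500 selects delta because it is below half of 10 * 1000.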
Signed-off-by: Ilpo Järvinen --- include/net/tcp.h | 1 + net/ipv4/tcp_input.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index a29109fa2ce2..54a640fed673 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -225,6 +225,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); TCPOLEN_ACCECN_PERCOUNTER * \ TCP_ACCECN_NUMCOUNTERS) #define TCP_ACCECN_BEACON_FREQ_SHIFT 2 /* Send option at least 2^2 times per RTT */ +#define TCP_ACCECN_SAFETY_SHIFT 1 /* SAFETY_FACTOR in accecn draft */ /* tp->saw_accecn_opt states, empty seen & orderbit are overloaded */ #define TCP_ACCECN_OPT_EMPTY_SEEN 0x1 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 826dfd5bf114..6bc9995202c8 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -518,14 +518,16 @@ static bool tcp_accecn_process_option(struct tcp_sock *tp, static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, u32 delivered_pkts, u32 delivered_bytes, int flag) { - u32 delta, safe_delta; + u32 delta, safe_delta, d_ceb; u32 corrected_ace; + u32 old_ceb = tp->delivered_ecn_bytes[INET_ECN_CE - 1]; + bool opt_deltas_valid; /* Reordered ACK? 
(...or uncertain due to lack of data to send and ts) */ if (!(flag & (FLAG_FORWARD_PROGRESS|FLAG_TS_PROGRESS))) return 0; - tcp_accecn_process_option(tp, skb, delivered_bytes); + opt_deltas_valid = tcp_accecn_process_option(tp, skb, delivered_bytes); if (!(flag & FLAG_SLOWPATH)) { /* AccECN counter might overflow on large ACKs */ @@ -545,6 +547,16 @@ static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, safe_delta = delivered_pkts - ((delivered_pkts - delta) & TCP_ACCECN_CEP_ACE_MASK); + if (opt_deltas_valid) { + d_ceb = tp->delivered_ecn_bytes[INET_ECN_CE - 1] - old_ceb; + if (!d_ceb) + return delta; + if (d_ceb > delta * tp->mss_cache) + return safe_delta; + if (d_ceb < safe_delta * tp->mss_cache >> TCP_ACCECN_SAFETY_SHIFT) + return delta; + } + return safe_delta; } From patchwork Wed Mar 18 09:38:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Ilpo_J=C3=A4rvinen?= X-Patchwork-Id: 222304 From: =?iso-8859-1?q?Ilpo_J=E4rvinen?= To: netdev@vger.kernel.org Cc: Yuchung Cheng , Neal Cardwell , Eric Dumazet , Olivier Tilmans Subject: [RFC PATCH 25/28] tcp: try to avoid safer when ACKs are thinned Date: Wed, 18 Mar 2020 11:38:06 +0200 Message-Id: <1584524289-24187-25-git-send-email-ilpo.jarvinen@helsinki.fi> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> References: <1584524289-24187-2-git-send-email-ilpo.jarvinen@helsinki.fi> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Ilpo Järvinen

Add an EWMA of newly acked packets. When ACK thinning occurs and the AccECN option deltas are not usable, use the EWMA to select between the safer and the unsafe cep delta.

Signed-off-by: Ilpo Järvinen --- include/linux/tcp.h | 1 + net/ipv4/tcp.c | 1 + net/ipv4/tcp_input.c | 20 +++++++++++++++++++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 64db51e5d45e..22be7cf2e084 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -332,6 +332,7 @@ struct tcp_sock { prev_ecnfield:2,/* ECN bits from the previous segment */ accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */ estimate_ecnfield:2;/* ECN field for AccECN delivered estimates */ + u16 pkts_acked_ewma;/* EWMA of packets acked for AccECN cep heuristic */ u64 accecn_opt_tstamp; /* Last AccECN option sent timestamp */ u32 lost; /* Total data packets lost incl.
rexmits */ u32 app_limited; /* limited until "delivered" reaches this val */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 09f73f81e6fa..4a22c19fa6d5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2629,6 +2629,7 @@ int tcp_disconnect(struct sock *sk, int flags) tcp_accecn_init_counters(tp); tp->prev_ecnfield = 0; tp->accecn_opt_tstamp = 0; + tp->pkts_acked_ewma = 0; tcp_set_ca_state(sk, TCP_CA_Open); tp->is_sack_reneg = 0; tcp_clear_retrans(tp); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 6bc9995202c8..f5476e6b1479 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -514,6 +514,10 @@ static bool tcp_accecn_process_option(struct tcp_sock *tp, return res; } +#define PKTS_ACKED_WEIGHT 6 +#define PKTS_ACKED_PREC 6 +#define ACK_COMP_THRESH 4 + /* Returns the ECN CE delta */ static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, u32 delivered_pkts, u32 delivered_bytes, int flag) @@ -529,6 +533,19 @@ static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, opt_deltas_valid = tcp_accecn_process_option(tp, skb, delivered_bytes); + if (delivered_pkts) { + if (!tp->pkts_acked_ewma) { + tp->pkts_acked_ewma = delivered_pkts << PKTS_ACKED_PREC; + } else { + u32 ewma = tp->pkts_acked_ewma; + + ewma = (((ewma << PKTS_ACKED_WEIGHT) - ewma) + + (delivered_pkts << PKTS_ACKED_PREC)) >> + PKTS_ACKED_WEIGHT; + tp->pkts_acked_ewma = min_t(u32, ewma, 0xFFFFU); + } + } + if (!(flag & FLAG_SLOWPATH)) { /* AccECN counter might overflow on large ACKs */ if (delivered_pkts <= TCP_ACCECN_CEP_ACE_MASK) @@ -555,7 +572,8 @@ static u32 tcp_accecn_process(struct tcp_sock *tp, const struct sk_buff *skb, return safe_delta; if (d_ceb < safe_delta * tp->mss_cache >> TCP_ACCECN_SAFETY_SHIFT) return delta; - } + } else if (tp->pkts_acked_ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC)) + return delta; return safe_delta; }
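
For illustration only (not part of the patch), the fixed-point EWMA update added to tcp_accecn_process() above can be exercised in userspace with a sketch like the following. The helper names pkts_acked_ewma_update() and acks_look_thinned() are assumptions made for this sketch; the constants and arithmetic are copied from the hunk, with the value kept in 6 fractional bits and each new sample weighted 1/64.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKTS_ACKED_WEIGHT 6	/* new sample weight is 1 / 2^6 */
#define PKTS_ACKED_PREC   6	/* fractional bits in the stored EWMA */
#define ACK_COMP_THRESH   4	/* packets per ACK regarded as thinning */

/* Hypothetical helper: return the updated EWMA of packets acked per ACK. */
static uint16_t pkts_acked_ewma_update(uint16_t ewma, uint32_t delivered_pkts)
{
	uint32_t v = ewma;

	/* ACKs that deliver nothing do not change the average. */
	if (!delivered_pkts)
		return ewma;

	/* First sample seeds the average directly (truncated to 16 bits). */
	if (!ewma)
		return (uint16_t)(delivered_pkts << PKTS_ACKED_PREC);

	/* v = (63 * v + sample) / 64, all in fixed point */
	v = (((v << PKTS_ACKED_WEIGHT) - v) +
	     (delivered_pkts << PKTS_ACKED_PREC)) >> PKTS_ACKED_WEIGHT;

	/* Clamp to what the u16 field in the patch can hold. */
	return v > 0xFFFFU ? 0xFFFFU : (uint16_t)v;
}

/* Hypothetical helper: the comparison used when option deltas are invalid. */
static bool acks_look_thinned(uint16_t ewma)
{
	return ewma > (ACK_COMP_THRESH << PKTS_ACKED_PREC);
}
```

An average of 2 packets per ACK is stored as 2 << 6 = 128 and stays at 128 for as long as every ACK covers 2 packets; the threshold of 4 packets per ACK corresponds to a stored value of 256.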