[v12,net-next,0/5] DUALPI2 patch

Message ID	20250422201602.56368-1-chia-yu.chang@nokia-bell-labs.com
Headers	show Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05on2051.outbound.protection.outlook.com [40.107.20.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F18571EB5C2; Tue, 22 Apr 2025 20:16:14 +0000 (UTC) Received-SPF: TempError (protection.outlook.com: error in processing during lookup of nokia-bell-labs.com: DNS Timeout) From: chia-yu.chang@nokia-bell-labs.com To: xandfury@gmail.com, netdev@vger.kernel.org, dave.taht@gmail.com, pabeni@redhat.com, jhs@mojatatu.com, kuba@kernel.org, stephen@networkplumber.org, xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net, edumazet@google.com, horms@kernel.org, andrew+netdev@lunn.ch, donald.hunter@gmail.com, ast@fiberby.net, liuhangbin@gmail.com, shuah@kernel.org, linux-kselftest@vger.kernel.org, ij@kernel.org, ncardwell@google.com, koen.de_schepper@nokia-bell-labs.com, g.white@cablelabs.com, ingemar.s.johansson@ericsson.com, mirja.kuehlewind@ericsson.com, cheshire@apple.com, rs.ietf@gmx.at, Jason_Livingood@comcast.com, vidhi_goel@apple.com Cc: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Subject: [PATCH v12 net-next 0/5] DUALPI2 patch Date: Tue, 22 Apr 2025 22:15:57 +0200 Message-Id: <20250422201602.56368-1-chia-yu.chang@nokia-bell-labs.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	DUALPI2 patch \| expand [v12,net-next,0/5] DUALPI2 patch [v12,net-next,2/5] selftests/tc-testing: Add selftests for qdisc DualPI2 [v12,net-next,4/5] sched: Dump configuration and statistics of dualpi2 qdisc

Message ID

20250422201602.56368-1-chia-yu.chang@nokia-bell-labs.com

Headers

Received-SPF: TempError (protection.outlook.com: error in processing during
 lookup of nokia-bell-labs.com: DNS Timeout)
From: chia-yu.chang@nokia-bell-labs.com
To: xandfury@gmail.com,
	netdev@vger.kernel.org,
	dave.taht@gmail.com,
	pabeni@redhat.com,
	jhs@mojatatu.com,
	kuba@kernel.org,
	stephen@networkplumber.org,
	xiyou.wangcong@gmail.com,
	jiri@resnulli.us,
	davem@davemloft.net,
	edumazet@google.com,
	horms@kernel.org,
	andrew+netdev@lunn.ch,
	donald.hunter@gmail.com,
	ast@fiberby.net,
	liuhangbin@gmail.com,
	shuah@kernel.org,
	linux-kselftest@vger.kernel.org,
	ij@kernel.org,
	ncardwell@google.com,
	koen.de_schepper@nokia-bell-labs.com,
	g.white@cablelabs.com,
	ingemar.s.johansson@ericsson.com,
	mirja.kuehlewind@ericsson.com,
	cheshire@apple.com,
	rs.ietf@gmx.at,
	Jason_Livingood@comcast.com,
	vidhi_goel@apple.com
Cc: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Subject: [PATCH v12 net-next 0/5] DUALPI2 patch
Date: Tue, 22 Apr 2025 22:15:57 +0200
Message-Id: <20250422201602.56368-1-chia-yu.chang@nokia-bell-labs.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Apr 2025 20:16:07.1551
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 9758c08d-b515-4111-1981-08dd81da83ea
X-MS-Exchange-CrossTenant-Id: 5d471751-9675-428d-917b-70f44f9630b0
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=5d471751-9675-428d-917b-70f44f9630b0;Ip=[131.228.2.20];Helo=[fihe3nok0734.emea.nsn-net.net]
X-MS-Exchange-CrossTenant-AuthSource: 
 AMS0EPF00000190.eurprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR07MB7253

Series

DUALPI2 patch | expand

Message

Chia-Yu Chang (Nokia) April 22, 2025, 8:15 p.m. UTC

From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Hello,

  Please find the DualPI2 patch v12.

v12 (22-Apr-2025)
- Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni@redhat.com>)
- Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni@redhat.com>)
- Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni@redhat.com>)
- Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni@redhat.com>)
- Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni@redhat.com>)

v11 (15-Apr-2025)
- Replace hstimer_init with hstimer_setup in sch_dualpi2.c

v10 (25-Mar-2025)
- Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni@redhat.com>)
- Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni@redhat.com>)
- Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni@redhat.com>)

v9 (16-Mar-2025)
- Fix mem_usage error in previous version
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking.
  In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets.
  This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0.
  Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay'
    Old versions:
                           avg       median          # data pts
 Ping (ms) ICMP :        11.55        11.70 ms              350
 TCP upload avg :        18.96          N/A Mbits/s         350
 TCP upload sum :        18.96          N/A Mbits/s         350

    New version (v9):
                           avg       median          # data pts
 Ping (ms) ICMP :        10.81        10.70 ms              350
 TCP upload avg :        18.91          N/A Mbits/s         350
 TCP upload sum :        18.91          N/A Mbits/s         350


  Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
    Old versions:
                           avg       median          # data pts
 Ping (ms) ICMP :        12.61        12.80 ms              350
 TCP upload avg :         9.48          N/A Mbits/s         350
 TCP upload sum :         9.48          N/A Mbits/s         350

    New version (v9):
                           avg       median          # data pts
 Ping (ms) ICMP :        11.06        10.80 ms              350
 TCP upload avg :         9.43          N/A Mbits/s         350
 TCP upload sum :         9.43          N/A Mbits/s         350


  Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
    Old versions:
                           avg       median          # data pts
 Ping (ms) ICMP :        40.86        37.45 ms              350
 TCP upload avg :         0.88          N/A Mbits/s         350
 TCP upload sum :         0.88          N/A Mbits/s         350
 TCP upload::1  :         0.88         0.97 Mbits/s         350

    New version (v9):
                           avg       median          # data pts
 Ping (ms) ICMP :        11.07        10.40 ms              350
 TCP upload avg :         0.55          N/A Mbits/s         350
 TCP upload sum :         0.55          N/A Mbits/s         350
 TCP upload::1  :         0.55         0.59 Mbits/s         350

v8 (11-Mar-2025)
- Fix warning messages in v7

v7 (07-Mar-2025)
- Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong@gmail.com>)

v6 (04-Mar-2025)
- Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba@kernel.org>)
- Update test cases in dualpi2.json
- Update commit message

v5 (22-Feb-2025)
- A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL:
  Unshaped 1gigE with 4 download streams test:
   - Summary of tcp_4down run 'MQ + FQ_CODEL':
                             avg       median       # data pts
      Ping (ms) ICMP :       1.19     1.34 ms          349
      TCP download avg :   235.42      N/A Mbits/s     349
      TCP download sum :   941.68      N/A Mbits/s     349
      TCP download::1  :   235.19   235.39 Mbits/s     349
      TCP download::2  :   235.03   235.35 Mbits/s     349
      TCP download::3  :   236.89   235.44 Mbits/s     349
      TCP download::4  :   234.57   235.19 Mbits/s     349

   - Summary of tcp_4down run 'MQ + FQ_PIE'
                             avg       median        # data pts
      Ping (ms) ICMP :       1.21     1.37 ms          350
      TCP download avg :   235.42      N/A Mbits/s     350
      TCP download sum :   941.61     N/A Mbits/s      350
      TCP download::1  :   232.54  233.13 Mbits/s      350
      TCP download::2  :   232.52  232.80 Mbits/s      350
      TCP download::3  :   233.14  233.78 Mbits/s      350
      TCP download::4  :   243.41  241.48 Mbits/s      350

   - Summary of tcp_4down run 'MQ + DUALPI2'
                             avg       median        # data pts
      Ping (ms) ICMP :       1.19     1.34 ms          349
      TCP download avg :   235.42      N/A Mbits/s     349
      TCP download sum :   941.68      N/A Mbits/s     349
      TCP download::1  :   235.19   235.39 Mbits/s     349
      TCP download::2  :   235.03   235.35 Mbits/s     349
      TCP download::3  :   236.89   235.44 Mbits/s     349
      TCP download::4  :   234.57   235.19 Mbits/s     349


  Unshaped 1gigE with 128 download streams test:
   - Summary of tcp_128down run 'MQ + FQ_CODEL':
                             avg       median       # data pts
      Ping (ms) ICMP   :     1.88     1.86 ms          350
      TCP download avg :     7.39      N/A Mbits/s     350
      TCP download sum :   946.47      N/A Mbits/s     350

   - Summary of tcp_128down run 'MQ + FQ_PIE':
                             avg       median       # data pts
      Ping (ms) ICMP   :     1.88     1.86 ms          350
      TCP download avg :     7.39      N/A Mbits/s     350
      TCP download sum :   946.47      N/A Mbits/s     350

   - Summary of tcp_128down run 'MQ + DUALPI2':
                             avg       median       # data pts
      Ping (ms) ICMP   :     1.88     1.86 ms          350
      TCP download avg :     7.39      N/A Mbits/s     350
      TCP download sum :   946.47      N/A Mbits/s     350


  Unshaped 10gigE with 4 download streams test:
   - Summary of tcp_4down run 'MQ + FQ_CODEL':
                             avg       median       # data pts
      Ping (ms) ICMP :       0.22     0.23 ms          350
      TCP download avg :  2354.08      N/A Mbits/s     350
      TCP download sum :  9416.31      N/A Mbits/s     350
      TCP download::1  :  2353.65  2352.81 Mbits/s     350
      TCP download::2  :  2354.54  2354.21 Mbits/s     350
      TCP download::3  :  2353.56  2353.78 Mbits/s     350
      TCP download::4  :  2354.56  2354.45 Mbits/s     350

  - Summary of tcp_4down run 'MQ + FQ_PIE':
                             avg       median      # data pts
      Ping (ms) ICMP :       0.20     0.19 ms          350
      TCP download avg :  2354.76      N/A Mbits/s     350
      TCP download sum :  9419.04      N/A Mbits/s     350
      TCP download::1  :  2354.77  2353.89 Mbits/s     350
      TCP download::2  :  2353.41  2354.29 Mbits/s     350
      TCP download::3  :  2356.18  2354.19 Mbits/s     350
      TCP download::4  :  2354.68  2353.15 Mbits/s     350

   - Summary of tcp_4down run 'MQ + DUALPI2':
                             avg       median      # data pts
      Ping (ms) ICMP :       0.24     0.24 ms          350
      TCP download avg :  2354.11      N/A Mbits/s     350
      TCP download sum :  9416.43      N/A Mbits/s     350
      TCP download::1  :  2354.75  2353.93 Mbits/s     350
      TCP download::2  :  2353.15  2353.75 Mbits/s     350
      TCP download::3  :  2353.49  2353.72 Mbits/s     350
      TCP download::4  :  2355.04  2353.73 Mbits/s     350


  Unshaped 10gigE with 128 download streams test:
   - Summary of tcp_128down run 'MQ + FQ_CODEL':
                             avg       median       # data pts
      Ping (ms) ICMP   :     7.57     8.69 ms          350
      TCP download avg :    73.97      N/A Mbits/s     350
      TCP download sum :  9467.82      N/A Mbits/s     350

   - Summary of tcp_128down run 'MQ + FQ_PIE':
                             avg       median       # data pts
      Ping (ms) ICMP   :     7.82     8.91 ms          350
      TCP download avg :    73.97      N/A Mbits/s     350
      TCP download sum :  9468.42      N/A Mbits/s     350

   - Summary of tcp_128down run 'MQ + DUALPI2':
                             avg       median       # data pts
      Ping (ms) ICMP   :     6.87     7.93 ms          350
      TCP download avg :    73.95      N/A Mbits/s     350
      TCP download sum :  9465.87      N/A Mbits/s     350

   From the results shown above, we see small differences between combinations.
- Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht@gmail.com> and Paolo Abeni <pabeni@redhat.com>)
- Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht@gmail.com>)
- Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht@gmail.com>)
- Update license identifier (Dave Taht <dave.taht@gmail.com>)
- Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong@gmail.com>)
- Use netlink policies for parameter checks (Jamal Hadi Salim <jhs@mojatatu.com>)
- Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht@gmail.com>)
- Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c
- Fix step_thresh in packets
- Update code comments in sch_dualpi2.c

v4 (22-Oct-2024)
- Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen@networkplumber.org>)
- Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen@networkplumber.org>)
- Fix line length warning.

v3 (19-Oct-2024)
- Fix compilaiton error
- Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba@kernel.org>)

v2 (18-Oct-2024)
- Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba@kernel.org>)
- Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs@mojatatu.com>)
- Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs@mojatatu.com>)
- Fix line length warning

For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332).

Best regards,
Chia-Yu

Chia-Yu Chang (4):
  Documentation: netlink: specs: tc: Add DualPI2 specification
  selftests/tc-testing: Add selftests for qdisc DualPI2
  sched: Struct definition and parsing of dualpi2 qdisc
  sched: Dump configuration and statistics of dualpi2 qdisc

Koen De Schepper (1):
  sched: Add enqueue/dequeue of dualpi2 qdisc

 Documentation/netlink/specs/tc.yaml           |  144 +++
 include/net/dropreason-core.h                 |    6 +
 include/uapi/linux/pkt_sched.h                |   39 +
 net/sched/Kconfig                             |   12 +
 net/sched/Makefile                            |    1 +
 net/sched/sch_dualpi2.c                       | 1096 +++++++++++++++++
 tools/testing/selftests/tc-testing/config     |    1 +
 .../tc-testing/tc-tests/qdiscs/dualpi2.json   |  149 +++
 tools/testing/selftests/tc-testing/tdc.sh     |    1 +
 9 files changed, 1449 insertions(+)
 create mode 100644 net/sched/sch_dualpi2.c
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json

Comments

Donald Hunter April 23, 2025, 12:03 p.m. UTC | #1

chia-yu.chang@nokia-bell-labs.com writes:

> +
> +static const struct nla_policy dualpi2_policy[TCA_DUALPI2_MAX + 1] = {
> +	[TCA_DUALPI2_LIMIT]		= NLA_POLICY_MIN(NLA_U32, 1),
> +	[TCA_DUALPI2_MEMORY_LIMIT]	= NLA_POLICY_MIN(NLA_U32, 1),
> +	[TCA_DUALPI2_TARGET]		= {.type = NLA_U32},
> +	[TCA_DUALPI2_TUPDATE]		= NLA_POLICY_MIN(NLA_U32, 1),
> +	[TCA_DUALPI2_ALPHA]		=
> +		NLA_POLICY_FULL_RANGE(NLA_U32, &dualpi2_alpha_beta_range),
> +	[TCA_DUALPI2_BETA]		=
> +		NLA_POLICY_FULL_RANGE(NLA_U32, &dualpi2_alpha_beta_range),
> +	[TCA_DUALPI2_STEP_THRESH]	= {.type = NLA_U32},
> +	[TCA_DUALPI2_STEP_PACKETS]	= {.type = NLA_U8},
> +	[TCA_DUALPI2_MIN_QLEN_STEP]	= {.type = NLA_U32},
> +	[TCA_DUALPI2_COUPLING]		= NLA_POLICY_MIN(NLA_U8, 1),
> +	[TCA_DUALPI2_DROP_OVERLOAD]	= {.type = NLA_U8},
> +	[TCA_DUALPI2_DROP_EARLY]	= {.type = NLA_U8},
> +	[TCA_DUALPI2_C_PROTECTION]	=
> +		NLA_POLICY_FULL_RANGE(NLA_U8, &dualpi2_wc_range),
> +	[TCA_DUALPI2_ECN_MASK]		= {.type = NLA_U8},
> +	[TCA_DUALPI2_SPLIT_GSO]		= {.type = NLA_U8},
> +};
> +
> +static int dualpi2_change(struct Qdisc *sch, struct nlattr *opt,
> +			  struct netlink_ext_ack *extack)
> +{
> +	struct nlattr *tb[TCA_DUALPI2_MAX + 1];
> +	struct dualpi2_sched_data *q;
> +	int old_backlog;
> +	int old_qlen;
> +	int err;
> +
> +	if (!opt)
> +		return -EINVAL;
> +	err = nla_parse_nested(tb, TCA_DUALPI2_MAX, opt, dualpi2_policy,
> +			       extack);
> +	if (err < 0)
> +		return err;
> +
> +	q = qdisc_priv(sch);
> +	sch_tree_lock(sch);
> +
> +	if (tb[TCA_DUALPI2_LIMIT]) {
> +		u32 limit = nla_get_u32(tb[TCA_DUALPI2_LIMIT]);
> +
> +		WRITE_ONCE(sch->limit, limit);
> +		WRITE_ONCE(q->memory_limit, get_memory_limit(sch, limit));
> +	}
> +
> +	if (tb[TCA_DUALPI2_MEMORY_LIMIT])
> +		WRITE_ONCE(q->memory_limit,
> +			   nla_get_u32(tb[TCA_DUALPI2_MEMORY_LIMIT]));
> +
> +	if (tb[TCA_DUALPI2_TARGET]) {
> +		u64 target = nla_get_u32(tb[TCA_DUALPI2_TARGET]);
> +
> +		WRITE_ONCE(q->pi2_target, target * NSEC_PER_USEC);
> +	}
> +
> +	if (tb[TCA_DUALPI2_TUPDATE]) {
> +		u64 tupdate = nla_get_u32(tb[TCA_DUALPI2_TUPDATE]);
> +
> +		WRITE_ONCE(q->pi2_tupdate, tupdate * NSEC_PER_USEC);
> +	}
> +
> +	if (tb[TCA_DUALPI2_ALPHA]) {
> +		u32 alpha = nla_get_u32(tb[TCA_DUALPI2_ALPHA]);
> +
> +		WRITE_ONCE(q->pi2_alpha, dualpi2_scale_alpha_beta(alpha));
> +	}
> +
> +	if (tb[TCA_DUALPI2_BETA]) {
> +		u32 beta = nla_get_u32(tb[TCA_DUALPI2_BETA]);
> +
> +		WRITE_ONCE(q->pi2_beta, dualpi2_scale_alpha_beta(beta));
> +	}
> +
> +	if (tb[TCA_DUALPI2_STEP_PACKETS]) {
> +		bool step_pkt = !!nla_get_u8(tb[TCA_DUALPI2_STEP_PACKETS]);

Would it be better to define TCA_DUALPI2_STEP_PACKETS as type NLA_FLAG
to avoid the u8 to bool conversion?

> +		u32 step_th = READ_ONCE(q->step_thresh);
> +
> +		WRITE_ONCE(q->step_in_packets, step_pkt);
> +		WRITE_ONCE(q->step_thresh,
> +			   step_pkt ? step_th : (step_th * NSEC_PER_USEC));
> +	}
> +
> +	if (tb[TCA_DUALPI2_STEP_THRESH]) {
> +		u32 step_th = nla_get_u32(tb[TCA_DUALPI2_STEP_THRESH]);
> +		bool step_pkt = READ_ONCE(q->step_in_packets);
> +
> +		WRITE_ONCE(q->step_thresh,
> +			   step_pkt ? step_th : (step_th * NSEC_PER_USEC));
> +	}
> +
> +	if (tb[TCA_DUALPI2_MIN_QLEN_STEP])
> +		WRITE_ONCE(q->min_qlen_step,
> +			   nla_get_u32(tb[TCA_DUALPI2_MIN_QLEN_STEP]));
> +
> +	if (tb[TCA_DUALPI2_COUPLING]) {
> +		u8 coupling = nla_get_u8(tb[TCA_DUALPI2_COUPLING]);
> +
> +		WRITE_ONCE(q->coupling_factor, coupling);
> +	}
> +
> +	if (tb[TCA_DUALPI2_DROP_OVERLOAD])
> +		WRITE_ONCE(q->drop_overload,
> +			   !!nla_get_u8(tb[TCA_DUALPI2_DROP_OVERLOAD]));

Type NLA_FLAG?

> +
> +	if (tb[TCA_DUALPI2_DROP_EARLY])
> +		WRITE_ONCE(q->drop_early,
> +			   !!nla_get_u8(tb[TCA_DUALPI2_DROP_EARLY]));

Type NLA_FLAG?

> +
> +	if (tb[TCA_DUALPI2_C_PROTECTION]) {
> +		u8 wc = nla_get_u8(tb[TCA_DUALPI2_C_PROTECTION]);
> +
> +		dualpi2_calculate_c_protection(sch, q, wc);
> +	}
> +
> +	if (tb[TCA_DUALPI2_ECN_MASK])
> +		WRITE_ONCE(q->ecn_mask,
> +			   nla_get_u8(tb[TCA_DUALPI2_ECN_MASK]));
> +
> +	if (tb[TCA_DUALPI2_SPLIT_GSO])
> +		WRITE_ONCE(q->split_gso,
> +			   !!nla_get_u8(tb[TCA_DUALPI2_SPLIT_GSO]));

Type NLA_FLAG?

> +
> +	old_qlen = qdisc_qlen(sch);
> +	old_backlog = sch->qstats.backlog;
> +	while (qdisc_qlen(sch) > sch->limit ||
> +	       q->memory_used > q->memory_limit) {
> +		struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
> +
> +		q->memory_used -= skb->truesize;
> +		qdisc_qstats_backlog_dec(sch, skb);
> +		rtnl_qdisc_drop(skb, sch);
> +	}
> +	qdisc_tree_reduce_backlog(sch, old_qlen - qdisc_qlen(sch),
> +				  old_backlog - sch->qstats.backlog);
> +
> +	sch_tree_unlock(sch);
> +	return 0;
> +}

Chia-Yu Chang (Nokia) April 23, 2025, 5:15 p.m. UTC | #2

> -----Original Message-----
> From: Donald Hunter <donald.hunter@gmail.com> 
> Sent: Wednesday, April 23, 2025 1:29 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>
> Cc: xandfury@gmail.com; netdev@vger.kernel.org; dave.taht@gmail.com; pabeni@redhat.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; edumazet@google.com; horms@kernel.org; andrew+netdev@lunn.ch; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white <g.white@cablelabs.com>; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel <vidhi_goel@apple.com>
> Subject: Re: [PATCH v12 net-next 1/5] Documentation: netlink: specs: tc: Add DualPI2 specification
> 
> 
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> 
> 
> 
> chia-yu.chang@nokia-bell-labs.com writes:
> 
> > From: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >
> > Introduce the specification of tc qdisc DualPI2 stats and attributes, 
> > which is the reference implementation of IETF RFC9332 DualQ Coupled 
> > AQM
> > (https://datatracker.ietf.org/doc/html/rfc9332) providing two 
> > different
> > queues: low latency queue (L-queue) and classic queue (C-queue).
> >
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
> >  Documentation/netlink/specs/tc.yaml | 144 
> > ++++++++++++++++++++++++++++
> >  1 file changed, 144 insertions(+)
> 
> The syntax is not valid so this doesn't pass the schema check and presumably hasn't been tested. Please validate YNL .yaml additions e.g.
> 
> ./tools/net/ynl/pyynl/cli.py \
>     --spec Documentation/netlink/specs/tc.yaml \
>     --list-ops
> 
> ...
> jsonschema.exceptions.ValidationError: Additional properties are not allowed ('entries' was unexpected) ...
> On instance['attribute-sets'][30]['attributes'][14]:
>     {'name': 'gso_split',
>      'type': 'flags',
>      'doc': 'Split aggregated skb or not',
>      'entries': ['split_gso', 'no_split_gso']}
> 

Hi Donald,

	Thanks for the feedback, and I will take actions for below points as well as the corresponding iproute2-net fixes.
	One more question is I see "uint" type is not valid during validation - see below (but which was suggested in v11), shall I change it back to u32/u8?

Failed validating 'enum' in schema['properties']['definitions']['items']['properties']['members']['items']['properties']['type']:
    {'description': "The netlink attribute type. Members of type 'binary' "
                    "or 'pad'\n"
                    "must also have the 'len' property set.\n",
     'enum': ['u8',
              'u16',
              'u32',
              'u64',
              's8',
              's16',
              's32',
              's64',
              'string',
              'binary',
              'pad']}

On instance['definitions'][42]['members'][12]['type']:
    'uint'	

Best regards,
Chia-Yu
 
> >
> > diff --git a/Documentation/netlink/specs/tc.yaml 
> > b/Documentation/netlink/specs/tc.yaml
> > index aacccea5dfe4..08255bba81c4 100644
> > --- a/Documentation/netlink/specs/tc.yaml
> > +++ b/Documentation/netlink/specs/tc.yaml
> > @@ -816,6 +816,58 @@ definitions:
> >        -
> >          name: drop-overmemory
> >          type: u32
> > +  -
> > +    name: tc-dualpi2-xstats
> > +    type: struct
> > +    members:
> > +      -
> > +        name: prob
> > +        type: uint
> > +        doc: Current probability
> > +      -
> > +        name: delay_c
> 
> Please use dashes in member names, e.g. "delay-c", to follow YNL conventions. Same for all member and attribute names below.
> 
> > +        type: uint
> > +        doc: Current C-queue delay in microseconds
> > +      -
> > +        name: delay_l
> > +        type: uint
> > +        doc: Current L-queue delay in microseconds
> > +      -
> > +        name: pkts_in_c
> > +        type: uint
> > +        doc: Number of packets enqueued in the C-queue
> > +      -
> > +        name: pkts_in_l
> > +        type: uint
> > +        doc: Number of packets enqueued in the L-queue
> > +      -
> > +        name: maxq
> > +        type: uint
> > +        doc: Maximum number of packets seen by the DualPI2
> > +      -
> > +        name: ecn_mark
> > +        type: uint
> > +        doc: All packets marked with ecn
> > +      -
> > +        name: step_mark
> > +        type: uint
> > +        doc: Only packets marked with ecn due to L-queue step AQM
> > +      -
> > +        name: credit
> > +        type: int
> > +        doc: Current credit value for WRR
> > +      -
> > +        name: memory_used
> > +        type: uint
> > +        doc: Memory used in bytes by the DualPI2
> > +      -
> > +        name: max_memory_used
> > +        type: uint
> > +        doc: Maximum memory used in bytes by the DualPI2
> > +      -
> > +        name: memory_limit
> > +        type: uint
> > +        doc: Memory limit in bytes
> >    -
> >      name: tc-fq-pie-xstats
> >      type: struct
> > @@ -2299,6 +2351,92 @@ attribute-sets:
> >        -
> >          name: quantum
> >          type: u32
> > +  -
> > +    name: tc-dualpi2-attrs
> > +    attributes:
> > +      -
> > +        name: limit
> > +        type: uint
> > +        doc: Limit of total number of packets in queue
> > +      -
> > +        name: memlimit
> > +        type: uint
> > +        doc: Memory limit of total number of packets in queue
> > +      -
> > +        name: target
> > +        type: uint
> > +        doc: Classic target delay in microseconds
> > +      -
> > +        name: tupdate
> > +        type: uint
> > +        doc: Drop probability update interval time in microseconds
> > +      -
> > +        name: alpha
> > +        type: uint
> > +        doc: Integral gain factor in Hz for PI controller
> > +      -
> > +        name: beta
> > +        type: uint
> > +        doc: Proportional gain factor in Hz for PI controller
> > +      -
> > +        name: step_thresh
> > +        type: uint
> > +        doc: L4S step marking threshold in microseconds or in packet (see step_packets)
> > +      -
> > +        name: step_packets
> > +        type: flags
> > +        doc: L4S Step marking threshold unit
> > +        entries:
> > +        - microseconds
> > +        - packets
> 
> This is not valid syntax. Enumerations and sets of flags need to be defined separately. For example, look at the definition of tc-cls-flags and its usage.
> 
> BUT step_packets is defined as a boolean in the implementation so could be implemented as a boolean flag in the API. If it needs to be extensible in future then it should be declared as an enum in uAPI and defined in this spec as an enum. Either way, the parsing and policy in patch 2 should be made more robust.
> 
> > +      -
> > +        name: min_qlen_step
> > +        type: uint
> > +        doc: Packets enqueued to the L-queue can apply the step threshold when the queue length of L-queue is larger than this value. (0 is recommended)
> > +      -
> > +        name: coupling_factor
> > +        type: uint
> > +        doc: Probability coupling factor between Classic and L4S (2 is recommended)
> > +      -
> > +        name: drop_overload
> > +        type: flags
> > +        doc: Control the overload strategy (drop to preserve latency or let the queue overflow)
> > +        entries:
> > +        - drop_on_overload
> > +        - overflow
> 
> Not valid syntax. Use a boolean flag or define an enum.
> 
> > +      -
> > +        name: drop_early
> > +        type: flags
> > +        doc: Decide where the Classic packets are PI-based dropped or marked
> > +        entries:
> > +        - drop_enqueue
> > +        - drop_dequeue
> 
> Not valid syntax. Use a boolean flag or define an enum.
> 
> > +      -
> > +        name: classic_protection
> > +        type: uint
> > +        doc: Classic WRR weight in percentage (from 0 to 100)
> > +      -
> > +        name: ecn_mask
> > +        type: flags
> > +        doc: Configure the L-queue ECN classifier
> > +        entries:
> > +        - l4s_ect
> > +        - any_ect
> 
> Not valid syntax. Type should probably match implementation, unless you want to enumerate the valid values by definining an enum.
> 
> > +      -
> > +        name: gso_split
> > +        type: flags
> > +        doc: Split aggregated skb or not
> > +        entries:
> > +        - split_gso
> > +        - no_split_gso
> 
> Not valid syntax. Use a boolean flag or define an enum.
> 
> > +      -
> > +        name: max_rtt
> > +        type: uint
> > +        doc: The maximum expected RTT of the traffic that is controlled by DualPI2 in usec
> > +      -
> > +        name: typical_rtt
> > +        type: uint
> > +        doc: The typical base RTT of the traffic that is controlled 
> > + by DualPI2 in usec
> >    -
> >      name: tc-ematch-attrs
> >      attributes:
> > @@ -3679,6 +3817,9 @@ sub-messages:
> >        -
> >          value: drr
> >          attribute-set: tc-drr-attrs
> > +      -
> > +        value: dualpi2
> > +        attribute-set: tc-dualpi2-attrs
> >        -
> >          value: etf
> >          attribute-set: tc-etf-attrs
> > @@ -3846,6 +3987,9 @@ sub-messages:
> >        -
> >          value: codel
> >          fixed-header: tc-codel-xstats
> > +      -
> > +        value: dualpi2
> > +        fixed-header: tc-dualpi2-xstats
> >        -
> >          value: fq
> >          fixed-header: tc-fq-qd-stats