From patchwork Wed Jun 9 13:55:35 2021
X-Patchwork-Submitter: Jussi Maki
X-Patchwork-Id: 457403
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com,
 andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, Jussi Maki
Subject: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
Date: Wed, 9 Jun 2021 13:55:35 +0000
Message-Id: <20210609135537.1460244-2-joamaki@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210609135537.1460244-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs can
be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

Semantics of XDP_TX when attached to a bond are equivalent in such a
setting to the case when a tc/BPF program would be attached to the
bond, meaning transmitting the packet out of the bond itself using one
of the bond's configured xmit methods to select a slave device (rather
than XDP_TX on the slave itself). Handling of XDP_TX to transmit using
the configured bonding mechanism is therefore implemented by rewriting
the BPF program return value in bpf_prog_run_xdp. To avoid performance
impact this check is guarded by a static key, which is incremented when
an XDP program is loaded onto a bond device. This approach was chosen
to avoid changes to drivers implementing XDP.
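
For illustration only (this is not part of the patch), the return-value
rewrite described above can be sketched in user-space C. Everything
prefixed with sketch_ is hypothetical: struct sketch_dev stands in for
struct net_device, a plain bool stands in for the static key, and the
slave already chosen by the bond's xmit policy is passed in by the
caller. The action values are redefined locally so that no kernel
headers are needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the XDP action codes (assumption: same order as the
 * kernel's enum xdp_action), so the sketch builds without kernel headers.
 */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
	SKETCH_XDP_TX,
	SKETCH_XDP_REDIRECT,
};

/* Hypothetical stand-in for struct net_device. */
struct sketch_dev {
	int ifindex;
	bool is_bond_slave;
};

/* Plain flag standing in for the static key that the patch enables when
 * an XDP program is attached to a bond device.
 */
static bool sketch_bond_redirect_enabled;

/* Rewrite the program's verdict after it has run.
 * @act:         value returned by the XDP program
 * @rx_dev:      device the packet was received on
 * @xmit_slave:  slave picked by the bond's xmit policy (may be NULL)
 * @tgt_ifindex: set to the redirect target when XDP_REDIRECT is returned
 */
static int sketch_rewrite_action(int act, const struct sketch_dev *rx_dev,
				 const struct sketch_dev *xmit_slave,
				 int *tgt_ifindex)
{
	/* Fast path: no bond has XDP attached, leave the verdict alone. */
	if (!sketch_bond_redirect_enabled)
		return act;

	/* Only XDP_TX on a bond slave is rewritten. */
	if (act != SKETCH_XDP_TX || !rx_dev->is_bond_slave)
		return act;

	/* A different egress slave needs a redirect so that the receiving
	 * driver releases the packet from its RX ring.
	 */
	if (xmit_slave && xmit_slave != rx_dev) {
		*tgt_ifindex = xmit_slave->ifindex;
		return SKETCH_XDP_REDIRECT;
	}

	return SKETCH_XDP_TX;
}
```

With the flag cleared the function is a pass-through, which is the
property the static key is meant to provide for systems that do not
attach XDP to a bond.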

If the slave device does not match the receive device, then
XDP_REDIRECT is transparently used to perform the redirection in order
to have the network driver release the packet from its RX ring. The
bonding driver hashing functions have been refactored to allow reuse
with xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP, given that most modern NICs have dual-port adapters. An
alternative to this approach would be to implement 802.3ad in
user-space and implement the bonding load-balancing in the XDP program
itself, but this is a rather cumbersome endeavor in terms of slave
device management (e.g. by watching netlink) and requires separate
programs for native vs bond cases for the orchestrator. A native
in-kernel implementation overcomes these issues and provides more
flexibility.

Below are benchmark results done on two machines with 100Gbit Intel
E810 (ice) NIC and with 32-core 3970X on sending machine, and 16-core
3950X on receiving machine. 64 byte packets were sent with pktgen-dpdk
at full rate. Two issues [2, 3] were identified with the ice driver, so
the tests were performed with iommu=off and patch [2] applied.
Additionally the bonding round robin algorithm was modified to use
per-cpu tx counters, as high CPU load (50% vs 10%) and a high rate of
cache misses were caused by the shared rr_tx_counter (see patch 2/3).
The statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%     116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%     117.2Mpps
   XDP_TX (RSS):         10.64%      25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%     117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki
Reported-by: kernel test robot
---
 drivers/net/bonding/bond_main.c | 441 ++++++++++++++++++++++++++++----
 include/linux/filter.h          |  13 +-
 include/linux/netdevice.h       |   5 +
 include/net/bonding.h           |   1 +
 kernel/bpf/devmap.c             |  34 ++-
 net/core/filter.c               |  37 ++-
 6 files changed, 467 insertions(+), 64 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dafeaef3cbd3..38eea7e096f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2001,6 +2014,28 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 
 	if (bond_mode_can_use_xmit_hash(bond))
 		bond_update_slave_arr(bond, NULL);
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		bpf_prog_inc(bond->xdp_prog);
+	}
 
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
@@ -2121,6 +2156,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3479,55 +3525,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, int l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3543,26 +3614,30 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+static bool bond_flow_dissect(struct bonding *bond,
+			      struct sk_buff *skb,
+			      const void *data,
+			      __be16 l2_proto,
+			      int nhoff,
+			      int hlen,
 			      struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3570,22 +3645,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
 
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3601,33 +3674,30 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it.
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond,
+			    struct sk_buff *skb,
+			    const void *data,
+			    __be16 l2_proto,
+			    int mhoff,
+			    int nhoff,
+			    int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3638,6 +3708,48 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header,
+				skb->network_header,
+				skb_headlen(skb));
+}
+
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto,
+				0,
+				sizeof(struct ethhdr),
+				xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4254,6 +4366,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4267,8 +4420,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4282,7 +4434,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4470,6 +4622,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = rcu_dereference(bond->usable_slaves);
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4580,7 +4748,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
@@ -4754,6 +4922,164 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_bond_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_bond_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -4840,6 +5166,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index c5ad7df029ed..57c166089456 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -760,6 +760,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_bond_redirect_enabled_key);
+
+u32 xdp_bond_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -769,7 +773,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_bond_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbc950b34df..1a6cc6356498 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *	Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 019e998d944a..34acb81b4234 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -251,6 +251,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct xfrm_state *xs;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a75e6c2d27d..2caff5714f4d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -514,9 +514,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 }
 
 static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+			 int exclude_ifindex, int exclude_ifindex_master)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
+	    obj->dev->ifindex == exclude_ifindex ||
+	    obj->dev->ifindex == exclude_ifindex_master ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -546,12 +548,19 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -559,7 +568,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp, exclude_ifindex, exclude_ifindex_master))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -579,7 +588,9 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp,
+						  exclude_ifindex,
+						  exclude_ifindex_master))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -646,16 +657,25 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct hlist_node *next;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst ||
+			    dst->dev->ifindex == exclude_ifindex ||
+			    dst->dev->ifindex == exclude_ifindex_master)
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -674,7 +694,9 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst ||
+				    dst->dev->ifindex == exclude_ifindex ||
+				    dst->dev->ifindex == exclude_ifindex_master)
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
diff --git a/net/core/filter.c b/net/core/filter.c
index caa88955562e..5d268eb980e7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2469,6 +2469,7 @@ int skb_do_redirect(struct sk_buff *skb)
 	ri->flags = 0;
 	if (unlikely(!dev))
 		goto out_drop;
+
 	if (flags & BPF_F_PEER) {
 		const struct net_device_ops *ops = dev->netdev_ops;
 
@@ -3947,6 +3948,40 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_bond_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_bond_redirect_enabled_key);
+INDIRECT_CALLABLE_DECLARE(struct net_device *
+	bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp));
+
+u32 xdp_bond_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+
+#if IS_BUILTIN(CONFIG_BONDING)
+	slave = INDIRECT_CALL_1(master->netdev_ops->ndo_xdp_get_xmit_slave,
+				bond_xdp_get_xmit_slave,
+				master, xdp);
+#else
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+#endif
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_bond_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
@@ -4466,7 +4501,7 @@ static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
 };
 
 static inline u64 __bpf_sk_ancestor_cgroup_id(struct sock *sk,
-                                             int ancestor_level)
+					      int ancestor_level)
 {
 	struct cgroup *ancestor;
 	struct cgroup *cgrp;

From patchwork Wed Jun 9 13:55:36 2021
X-Patchwork-Submitter: Jussi Maki
X-Patchwork-Id: 458432
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com, andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, Jussi Maki
Subject: [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter
Date: Wed, 9 Jun 2021 13:55:36 +0000
Message-Id: <20210609135537.1460244-3-joamaki@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210609135537.1460244-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

The round-robin rr_tx_counter was shared across CPUs, leading to
significant cache thrashing at high packet rates. This patch switches
the round-robin mechanism to use a per-cpu counter to decide the
destination device.

On a 100Gbit 64-byte packet test this reduces the CPU load from 50% to
10% on the test system.

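As an illustration of the idea described above, here is a minimal user-space sketch, not the bonding driver code. It assumes a fixed number of CPUs and a 64-byte cache line; the names NR_CPUS_SKETCH, CACHE_LINE, rr_inc_return() and rr_pick_slave() are hypothetical, and the modulo step is an assumption about how a caller might map the returned id onto a slave. Each CPU increments only its own counter, so no cache line is written by more than one CPU.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this sketch */
#define CACHE_LINE 64		/* assumed cache line size in bytes */

/* One counter per CPU, each aligned to its own cache line so that
 * increments on different CPUs never touch the same line.
 */
struct rr_counter {
	_Alignas(CACHE_LINE) uint32_t value;
};

static struct rr_counter rr_tx_counter[NR_CPUS_SKETCH];

/* Rough user-space analogue of this_cpu_inc_return(): bump and return
 * the counter that belongs to the given CPU only.
 */
static uint32_t rr_inc_return(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	return ++rr_tx_counter[cpu].value;
}

/* Map the per-CPU counter onto one of nslaves devices (assumed step). */
static uint32_t rr_pick_slave(unsigned int cpu, uint32_t nslaves)
{
	assert(nslaves > 0);
	return rr_inc_return(cpu) % nslaves;
}
```

With two slaves, successive calls for CPU 0 alternate between the two ids, and calls for CPU 1 advance independently of CPU 0. The trade-off in this sketch is that the rotation is per CPU rather than strictly global.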
Signed-off-by: Jussi Maki
Acked-by: Jay Vosburgh
---
 drivers/net/bonding/bond_main.c | 18 +++++++++++++++---
 include/net/bonding.h           |  2 +-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38eea7e096f3..917dd2cdcbf4 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4314,16 +4314,16 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
 		slave_id = prandom_u32();
 		break;
 	case 1:
-		slave_id = bond->rr_tx_counter;
+		slave_id = this_cpu_inc_return(*bond->rr_tx_counter);
 		break;
 	default:
 		reciprocal_packets_per_slave =
 			bond->params.reciprocal_packets_per_slave;
-		slave_id = reciprocal_divide(bond->rr_tx_counter,
+		slave_id = this_cpu_inc_return(*bond->rr_tx_counter);
+		slave_id = reciprocal_divide(slave_id,
 					     reciprocal_packets_per_slave);
 		break;
 	}
-	bond->rr_tx_counter++;
 
 	return slave_id;
 }
@@ -5278,6 +5278,9 @@ static void bond_uninit(struct net_device *bond_dev)
 
 	list_del(&bond->bond_list);
 
+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN)
+		free_percpu(bond->rr_tx_counter);
+
 	bond_debug_unregister(bond);
 }
 
@@ -5681,6 +5684,15 @@ static int bond_init(struct net_device *bond_dev)
 	if (!bond->wq)
 		return -ENOMEM;
 
+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN) {
+		bond->rr_tx_counter = alloc_percpu(u32);
+		if (!bond->rr_tx_counter) {
+			destroy_workqueue(bond->wq);
+			bond->wq = NULL;
+			return -ENOMEM;
+		}
+	}
+
 	spin_lock_init(&bond->stats_lock);
 	netdev_lockdep_set_classes(bond_dev);
 
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 34acb81b4234..8de8180f1be8 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -232,7 +232,7 @@ struct bonding {
 	char proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
 	struct list_head bond_list;
-	u32 rr_tx_counter;
+	u32 __percpu *rr_tx_counter;
 	struct ad_bond_info ad_info;
 	struct alb_bond_info alb_info;
 	struct bond_params params;

From patchwork Wed Jun 9 13:55:37 2021
Content-Type: text/plain;
charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jussi Maki
X-Patchwork-Id: 457402
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com, andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, Jussi Maki
Subject: [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding
Date: Wed, 9 Jun 2021 13:55:37 +0000
Message-Id: <20210609135537.1460244-4-joamaki@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210609135537.1460244-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Add a test suite to test the XDP bonding implementation over a pair of
veth devices.

Signed-off-by: Jussi Maki
---
 .../selftests/bpf/prog_tests/xdp_bonding.c | 342 ++++++++++++++++++
 tools/testing/selftests/bpf/vmtest.sh      |  30 +-
 2 files changed, 360 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_bonding.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
new file mode 100644
index 000000000000..fd2b83194127
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/**
+ * Test XDP bonding support
+ *
+ * Sets up two bonded veth pairs between two fresh namespaces
+ * and verifies that XDP_TX program loaded on a bond device
+ * are correctly loaded onto the slave devices and XDP_TX'd
+ * packets are balanced using bonding.
+ */
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define BOND1_MAC {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}
+#define BOND1_MAC_STR "00:11:22:33:44:55"
+#define BOND2_MAC {0x00, 0x22, 0x33, 0x44, 0x55, 0x66}
+#define BOND2_MAC_STR "00:22:33:44:55:66"
+#define NPACKETS 100
+
+static int root_netns_fd = -1;
+
+static void restore_root_netns(void)
+{
+	ASSERT_OK(setns(root_netns_fd, CLONE_NEWNET), "restore_root_netns");
+}
+
+int setns_by_name(char *name)
+{
+	int nsfd, err;
+	char nspath[PATH_MAX];
+
+	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
+	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
+	if (nsfd < 0)
+		return -1;
+
+	err = setns(nsfd, CLONE_NEWNET);
+	close(nsfd);
+	return err;
+}
+
+static int get_rx_packets(const char *iface)
+{
+	FILE *f;
+	char line[512];
+	int iface_len = strlen(iface);
+
+	f = fopen("/proc/net/dev", "r");
+	if (!f)
+		return -1;
+
+	while (fgets(line, sizeof(line), f)) {
+		char *p = line;
+
+		while (*p == ' ')
+			p++; /* skip whitespace */
+		if (!strncmp(p, iface, iface_len)) {
+			p += iface_len;
+			if (*p++ != ':')
+				continue;
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			while (*p && *p != ' ')
+				p++; /* skip rx bytes */
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			fclose(f);
+			return atoi(p);
+		}
+	}
+	fclose(f);
+	return -1;
+}
+
+enum {
+	BOND_ONE_NO_ATTACH = 0,
+	BOND_BOTH_AND_ATTACH,
+};
+
+static int bonding_setup(int mode, int xmit_policy, int bond_both_attach)
+{
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			return -1;				\
+	})
+
+	SYS("ip netns add ns_dst");
+	SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
+	SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
+
+	SYS("modprobe -r bonding &> /dev/null");
+	SYS("modprobe bonding mode=%d packets_per_slave=1 xmit_hash_policy=%d", mode, xmit_policy);
+
+	SYS("ip link add bond1 type bond");
+	SYS("ip link set bond1 address " BOND1_MAC_STR);
+	SYS("ip link set bond1 up");
+	SYS("ip -netns ns_dst link add bond2 type bond");
+	SYS("ip -netns ns_dst link set bond2 address " BOND2_MAC_STR);
+	SYS("ip -netns ns_dst link set bond2 up");
+
+	SYS("ip link set veth1_1 master bond1");
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		SYS("ip link set veth1_2 master bond1");
+	} else {
+		SYS("ip link set veth1_2 up");
+		SYS("ip link set dev veth1_2 xdpdrv obj xdp_dummy.o sec xdp_dummy");
+	}
+
+	SYS("ip -netns ns_dst link set veth2_1 master bond2");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set veth2_2 master bond2");
+	else
+		SYS("ip -netns ns_dst link set veth2_2 up");
+
+	/* Load a dummy program on sending side as with veth peer needs to have a
+	 * XDP program loaded as well.
+	 */
+	SYS("ip link set dev bond1 xdpdrv obj xdp_dummy.o sec xdp_dummy");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set dev bond2 xdpdrv obj xdp_tx.o sec tx");
+
+#undef SYS
+	return 0;
+}
+
+static void bonding_cleanup(void)
+{
+	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
+	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
+	ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
+	ASSERT_OK(system("modprobe -r bonding"), "unload bond");
+}
+
+static int send_udp_packets(int vary_dst_ip)
+{
+	int i, s = -1;
+	int ifindex;
+	uint8_t buf[128] = {};
+	struct ethhdr eh = {
+		.h_source = BOND1_MAC,
+		.h_dest = BOND2_MAC,
+		.h_proto = htons(ETH_P_IP),
+	};
+	struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh));
+	struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph));
+
+	s = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
+	if (!ASSERT_GE(s, 0, "socket"))
+		goto err;
+
+	ifindex = if_nametoindex("bond1");
+	if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex"))
+		goto err;
+
+	memcpy(buf, &eh, sizeof(eh));
+	iph->ihl = 5;
+	iph->version = 4;
+	iph->tos = 16;
+	iph->id = 1;
+	iph->ttl = 64;
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = 1;
+	iph->daddr = 2;
+	iph->tot_len = htons(sizeof(buf) - ETH_HLEN);
+	iph->check = 0;
+
+	for (i = 1; i <= NPACKETS; i++) {
+		int n;
+		struct sockaddr_ll saddr_ll = {
+			.sll_ifindex = ifindex,
+			.sll_halen = ETH_ALEN,
+			.sll_addr = BOND2_MAC,
+		};
+
+		/* vary the UDP destination port for even distribution with roundrobin/xor modes */
+		uh->dest++;
+
+		if (vary_dst_ip)
+			iph->daddr++;
+
+		n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&saddr_ll, sizeof(saddr_ll));
+		if (!ASSERT_EQ(n, sizeof(buf), "sendto"))
+			goto err;
+	}
+
+	return 0;
+
+err:
+	if (s >= 0)
+		close(s);
+	return -1;
+}
+
+void test_xdp_bonding_with_mode(char *name, int mode, int xmit_policy)
+{
+	int bond1_rx;
+
+	if (!test__start_subtest(name))
+		return;
+
+	if (bonding_setup(mode, xmit_policy, BOND_BOTH_AND_ATTACH))
+		return;
+
+	if (send_udp_packets(xmit_policy != BOND_XMIT_POLICY_LAYER34))
+		return;
+
+	bond1_rx = get_rx_packets("bond1");
+	ASSERT_TRUE(
+		bond1_rx >= NPACKETS,
+		"expected more received packets");
+
+	switch (mode) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_XOR: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(veth1_rx + veth2_rx, NPACKETS, "expected more packets");
+
+		switch (xmit_policy) {
+		case BOND_XMIT_POLICY_LAYER2:
+			ASSERT_GE(diff, NPACKETS/2,
+				  "expected packets on only one of the interfaces");
+			break;
+		case BOND_XMIT_POLICY_LAYER23:
+		case BOND_XMIT_POLICY_LAYER34:
+			ASSERT_LT(diff, NPACKETS/2,
+				  "expected even distribution of packets");
+			break;
+		default:
+			abort();
+		}
+		break;
+	}
+	default:
+		break;
+	}
+
+	bonding_cleanup();
+}
+
+void test_xdp_bonding_redirect_multi(void)
+{
+	static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
+	int veth1_rx, veth2_rx;
+	int err;
+
+	if (!test__start_subtest("xdp_bonding_redirect_multi"))
+		return;
+
+	if (bonding_setup(BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, BOND_ONE_NO_ATTACH))
+		goto out;
+
+	err = system("ip -netns ns_dst link set dev bond2 xdpdrv "
+		     "obj xdp_redirect_multi_kern.o sec xdp_redirect_map_multi");
+	if (!ASSERT_OK(err, "link set xdpdrv"))
+		goto out;
+
+	/* populate the redirection devmap with the relevant interfaces */
+	if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
+		goto out;
+
+	for (int i = 0; i < ARRAY_SIZE(ifaces); i++) {
+		char cmd[512];
+		int ifindex = if_nametoindex(ifaces[i]);
+
+		if (!ASSERT_GT(ifindex, 0, "could not get interface index"))
+			goto out;
+
+		snprintf(cmd, sizeof(cmd),
+			 "ip netns exec ns_dst bpftool map update name map_all key %d 0 0 0 value %d 0 0 0",
+			 i, ifindex);
+
+		if (!ASSERT_OK(system(cmd), "bpftool map update"))
+			goto out;
+	}
+	restore_root_netns();
+
+	send_udp_packets(BOND_MODE_ROUNDROBIN);
+
+	veth1_rx = get_rx_packets("veth1_1");
+	veth2_rx = get_rx_packets("veth1_2");
+
+	ASSERT_LT(veth1_rx, NPACKETS/2, "expected few packets on veth1");
+	ASSERT_GE(veth2_rx, NPACKETS, "expected more packets on veth2");
+out:
+	restore_root_netns();
+	bonding_cleanup();
+}
+
+struct bond_test_case {
+	char *name;
+	int mode;
+	int xmit_policy;
+};
+
+static struct bond_test_case bond_test_cases[] = {
+	{ "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
+
+	{ "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
+	{ "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
+};
+
+void test_xdp_bonding(void)
+{
+	int i;
+
+	root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
+		struct bond_test_case *test_case = &bond_test_cases[i];
+
+		test_xdp_bonding_with_mode(
+			test_case->name,
+			test_case->mode,
+			test_case->xmit_policy);
+	}
+
+	test_xdp_bonding_redirect_multi();
+}
diff --git a/tools/testing/selftests/bpf/vmtest.sh b/tools/testing/selftests/bpf/vmtest.sh
index 8889b3f55236..68818780e072 100755
--- a/tools/testing/selftests/bpf/vmtest.sh
+++ b/tools/testing/selftests/bpf/vmtest.sh
@@ -106,17 +106,6 @@ download_rootfs()
 		zstd -d | sudo tar -C "$dir" -x
 }
 
-recompile_kernel()
-{
-	local kernel_checkout="$1"
-	local make_command="$2"
-
-	cd "${kernel_checkout}"
-
-	${make_command} olddefconfig
-	${make_command}
-}
-
 mount_image()
 {
 	local rootfs_img="${OUTPUT_DIR}/${ROOTFS_IMAGE}"
@@ -132,6 +121,23 @@ unmount_image()
 	sudo umount "${mount_dir}" &> /dev/null
 }
 
+recompile_kernel()
+{
+	local kernel_checkout="$1"
+	local make_command="$2"
+	local kernel_config="$3"
+
+	cd "${kernel_checkout}"
+
+	${make_command} olddefconfig
+	scripts/config --file ${kernel_config} --module CONFIG_BONDING
+	${make_command}
+	${make_command} modules
+	mount_image
+	sudo ${make_command} INSTALL_MOD_PATH=${OUTPUT_DIR}/${MOUNT_DIR} modules_install
+	unmount_image
+}
+
 update_selftests()
 {
 	local kernel_checkout="$1"
@@ -358,7 +364,7 @@ main()
 	mkdir -p "${mount_dir}"
 	update_kconfig "${kconfig_file}"
 
-	recompile_kernel "${kernel_checkout}" "${make_command}"
+	recompile_kernel "${kernel_checkout}" "${make_command}" "${kconfig_file}"
 
 	if [[ "${update_image}" == "no" && ! -f "${rootfs_img}" ]]; then
 		echo "rootfs image not found in ${rootfs_img}"

From patchwork Thu Jun 24 09:18:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jussi Maki
X-Patchwork-Id: 466741
Received: from mail-wr1-x42f.google.com
(mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C21EC061760; Thu, 24 Jun 2021 02:19:00 -0700 (PDT)
From: joamaki@gmail.com
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com, andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, maciej.fijalkowski@intel.com, magnus.karlsson@intel.com, Jussi Maki
Subject: [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device
Date: Thu, 24 Jun 2021 09:18:43 +0000
Message-Id: <20210624091843.5151-5-joamaki@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20210624091843.5151-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com> <20210624091843.5151-1-joamaki@gmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Jussi Maki

If the ingress device is a bond slave, do not broadcast back through it
or the bond master.

Signed-off-by: Jussi Maki
---
 kernel/bpf/devmap.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a75e6c2d27d..0864fb28c8b5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -514,9 +514,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 }
 
 static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+			 int exclude_ifindex, int exclude_ifindex_master)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
+	    obj->dev->ifindex == exclude_ifindex ||
+	    obj->dev->ifindex == exclude_ifindex_master ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -546,12 +548,19 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -559,7 +568,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp, exclude_ifindex, exclude_ifindex_master))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -579,7 +588,9 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp,
+						  exclude_ifindex,
+						  exclude_ifindex_master))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -646,16 +657,25 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct hlist_node *next;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst ||
+			    dst->dev->ifindex == exclude_ifindex ||
+			    dst->dev->ifindex == exclude_ifindex_master)
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -674,7 +694,9 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst ||
+				    dst->dev->ifindex == exclude_ifindex ||
+				    dst->dev->ifindex == exclude_ifindex_master)
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */

From patchwork Wed Jul 7 11:25:51 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jussi Maki
X-Patchwork-Id: 471159
Received: from lindbergh.monkeyblade.net
([23.128.96.19]:51758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231733AbhGGNQe (ORCPT ); Wed, 7 Jul 2021 09:16:34 -0400 Received: from mail-lj1-x234.google.com (mail-lj1-x234.google.com [IPv6:2a00:1450:4864:20::234]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 449AEC06175F; Wed, 7 Jul 2021 06:13:53 -0700 (PDT) Received: by mail-lj1-x234.google.com with SMTP id u25so2602994ljj.11; Wed, 07 Jul 2021 06:13:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=h25gHlOngZgr86BE+CbM0uJNam8WoUfeVATamHxArOw=; b=RXO8ymGzxfsEu2ttOuCke4ohRTFXlQktxVetUEhdx57MFwf10ngGBTlPxdiFhb0UPA d85N0+HjCqAtlB0gyYnX9x3LgBuuCgBQ5eUmHpQccGDD/mtjP7PJylSCjRKwloD4qmkq dXccUP8RutEDTXrcbFGnVYugRklUFengVt4OycsGn7ajbWLD6i4CGd5IVAylnrJBvyU5 hBScKyc/y1syBXpAZzEB7L1g25kcpcesreYJ9eZwLu+rZd7XUnR5KgE7mLVSX6QZsvLN mQXAMekNo5pezEPncekDFxQ4VFNJZXg5ny+r2EWWZbb4vpVQM6o13ai69fP6eoLHaman zKMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=h25gHlOngZgr86BE+CbM0uJNam8WoUfeVATamHxArOw=; b=BeiCrMYK2fvhmi6G8UGzlqJHTHN/CrYxBLPL0JQD46T/0KUno54P1Faqj4Cz6sQd3/ Pzvd4+F1aS0kWRnrNaKUJUqeYYOB/bKsI1SJ1dVO+sHFVlTN0cCxIcQbKgJ5g561vGaG 4hmcKkYaGtGxsctgaCSWvGHeks+tbv8fWeo5edqQyWd0Igsbk9ERfABpBUoNh8DOQJ79 Nx5PAmSDGvF9CvBBIr0GaWiT1v30dcu5O/KVGBit4K19OnL8+l1t4K1Kht5h27xGrkb1 PbR/epJ1IGjqwvxiyIDf4YuTOKEHd64b0vBEbumXf5qPiDtH6qQaGnIERmRDHHDms2iH ED/w== X-Gm-Message-State: AOAM531aDomzKjVjNbuoslbm5V5VfHamUHRxgfsWlTR/L/t+MPPlvfgh zzOtxq2EN8zqv8bMJQw48DdX35rQCXbNHHCE1Q== X-Google-Smtp-Source: ABdhPJzC/+0kPktq84jrmOss4dX9lnay2HQ5PHuUtkFZew4yP9+HtEs4QL/0WJNnnChbVBT46ecUeA== X-Received: by 2002:a2e:b804:: with SMTP id u4mr2728465ljo.312.1625663631378; Wed, 07 Jul 2021 06:13:51 -0700 (PDT) Received: from localhost.localdomain ([89.42.43.188]) by 
smtp.gmail.com with ESMTPSA id u9sm1423571lfm.127.2021.07.07.06.13.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Jul 2021 06:13:50 -0700 (PDT) From: Jussi Maki To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com, andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, maciej.fijalkowski@intel.com, magnus.karlsson@intel.com, Jussi Maki , toke@redhat.com Subject: [PATCH bpf-next v3 5/5] net: core: Allow netdev_lower_get_next_private_rcu in bh context Date: Wed, 7 Jul 2021 11:25:51 +0000 Message-Id: <20210707112551.9782-6-joamaki@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210707112551.9782-1-joamaki@gmail.com> References: <20210609135537.1460244-1-joamaki@gmail.com> <20210707112551.9782-1-joamaki@gmail.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org For the XDP bonding slave lookup to work in the NAPI poll context, in which the redundant rcu_read_lock() has been removed, we have to follow the same approach as in [1] and modify the WARN_ON_ONCE() to also accept rcu_read_lock_bh_held(). 
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=694cea395fded425008e93cd90cfdf7a451674af Signed-off-by: Jussi Maki --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 05aac85b2bbc..27f95aeddc59 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7569,7 +7569,7 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev, { struct netdev_adjacent *lower; - WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held()); lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list); From patchwork Fri Jul 30 06:18:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jussi Maki X-Patchwork-Id: 492116 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.3 required=3.0 tests=BAYES_00, DATE_IN_PAST_96_XX, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E327AC4338F for ; Wed, 4 Aug 2021 12:46:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CE7C360F14 for ; Wed, 4 Aug 2021 12:46:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237396AbhHDMqs (ORCPT ); Wed, 4 Aug 2021 08:46:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238246AbhHDMq0 (ORCPT ); Wed, 4 Aug 2021 08:46:26 -0400 
Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 506C1C0617B1; Wed, 4 Aug 2021 05:45:53 -0700 (PDT) Received: by mail-wm1-x336.google.com with SMTP id l34-20020a05600c1d22b02902573c214807so3839368wms.2; Wed, 04 Aug 2021 05:45:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=cTkEzedgsHwcRt0daK5worxXRzwlWwNbUzqq6bWtxEU=; b=SOXwPYJQzU3gLnvN54Sb9+eH5Cce0nMx2LODLJMWJ6eHozCURXDWeukcTksQ+GY3JH 1i/V9sQuvDMNyS49HrsxcoLEXYjKzziDe6Q35/nUVGFHARAU7uw0J5yAX/SCuKyw0bJm WTBgGCL8W9V1XdtNadHFJ7jkhMaYkQLgs41Sz6OoymDGxUzQhKbgqTI+rii8tEJhxBBK IwIDY3ZxCOQ3DsjepR31pe49lQ2PorcJ+EXzk8fjT5dzKIx6HUkHfEArOFkVBgmpLTD2 txbYWlARPfuliH/Z7euri1LFXnZAFzsPPgbvVAem1osZNp12Jq7SLTSwB51F53gfFNew 8FmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=cTkEzedgsHwcRt0daK5worxXRzwlWwNbUzqq6bWtxEU=; b=ASh1HNF/vS0uTqXnRT+sqH2kDHys3xKmnj3Y4SRYf0DsI9UTV/YKJ70XAuBEAMGwsl hAlWgmkTW4RXamlxmNojM70JXKqxYNTyd0zgiF/9vxLqqXuT8wa1FKvGfVMKrW9nVUJO HiCzTXvU4KWh+cgEMePEaW7gqhJgmzqU8iu3QlK9DbLpNfUafyQFWEl4pfEpsfFXC8LC ngTUvxEtOHvUSzKQdlBwOKXkmsb8z6WYjeC/LTzEBoN8KyeBkYnKTM6rorL8nZqq/vJ0 dniktCodrJCcFYfxwrJlRIUaFQfqIkwkOCWgMpJBJmEFufyRMcYlNP0hJNHd/xVP6HZz mKsg== X-Gm-Message-State: AOAM530MU6ClIa5be61020ume0tyKEsa+XFWttGkad5x3SoEkIHcBsi+ qSzrW0yDC8wUONnl4oA1UTnuJ9BmTk0BJlQ= X-Google-Smtp-Source: ABdhPJyXMZscttekkLiMuP6L6lRP/uqRucrqsLb19fx2QskQFqC6tXPwuqptnl3hUa3StVyDi2xHKA== X-Received: by 2002:a7b:c18f:: with SMTP id y15mr9624530wmi.117.1628081151672; Wed, 04 Aug 2021 05:45:51 -0700 (PDT) Received: from localhost.localdomain ([77.109.191.101]) by smtp.gmail.com with ESMTPSA id y4sm2257923wmi.22.2021.08.04.05.45.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Aug 2021 
05:45:51 -0700 (PDT) From: Jussi Maki To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com, andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, maciej.fijalkowski@intel.com, magnus.karlsson@intel.com, Jussi Maki Subject: [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name Date: Fri, 30 Jul 2021 06:18:21 +0000 Message-Id: <20210730061822.6600-7-joamaki@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210730061822.6600-1-joamaki@gmail.com> References: <20210609135537.1460244-1-joamaki@gmail.com> <20210730061822.6600-1-joamaki@gmail.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The program type cannot be deduced from 'tx' which causes an invalid argument error when trying to load xdp_tx.o using the skeleton. Rename the section name to "xdp/tx" so that libbpf can deduce the type. Signed-off-by: Jussi Maki --- tools/testing/selftests/bpf/progs/xdp_tx.c | 2 +- tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c index 94e6c2b281cb..ece1fbbc0984 100644 --- a/tools/testing/selftests/bpf/progs/xdp_tx.c +++ b/tools/testing/selftests/bpf/progs/xdp_tx.c @@ -3,7 +3,7 @@ #include #include -SEC("tx") +SEC("xdp/tx") int xdp_tx(struct xdp_md *xdp) { return XDP_TX; diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh index ba8ffcdaac30..c8e0b7d36f56 100755 --- a/tools/testing/selftests/bpf/test_xdp_veth.sh +++ b/tools/testing/selftests/bpf/test_xdp_veth.sh @@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1 ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2 ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy -ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx +ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp/tx ip -n ns3 
link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy trap cleanup EXIT From patchwork Sat Jul 31 05:57:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jussi Maki X-Patchwork-Id: 492690 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.3 required=3.0 tests=BAYES_00, DATE_IN_PAST_96_XX, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E549BC432BE for ; Thu, 5 Aug 2021 16:10:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C94B56113B for ; Thu, 5 Aug 2021 16:10:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230137AbhHEQK5 (ORCPT ); Thu, 5 Aug 2021 12:10:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37578 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230513AbhHEQKr (ORCPT ); Thu, 5 Aug 2021 12:10:47 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 909FCC0617A1; Thu, 5 Aug 2021 09:10:29 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id b13so7292304wrs.3; Thu, 05 Aug 2021 09:10:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=IrZj62K2inceQssu/AMuv62cM5yILtqKBEklvWsChEw=; b=XZmNZL4QrsFZEhYUNOckdjihvcNwlE+QX+nu2B2uNulsNmoqWxjsna51LiaLnLC3ET 
7WUYyTu3VzQQv2XgdL7Y0x4fsLTRJREzmWh+1DeHtCA/TWHN0BhImCyd+aUCmAb6tqaZ AMXB7tBWfi1vbasrDyvrPwnt6QnxNB5l8nN+Vd02aUaWVH8gRNN04GL9oJWh3LWGd8rQ jyaITRFjR2zIzKti1Art+49/G0msZgfXkbaepARj77sthAGR6fjynwizTzcTa+JjLyy1 IzPpgXfaJSiv4fQ5vcteMStgp0zVWBVHgr3h/ERg4PUvhGxItNGeZ1PHe2pMJP8N6ZGf CtXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=IrZj62K2inceQssu/AMuv62cM5yILtqKBEklvWsChEw=; b=YVbpWe5YycFSCoUTyeirELtirfyZnzXlB3dIViHTHuYGrUoHVB7m8yH3SIQdLlt3lG sinbUgmUVL4r+i8akk/ZAJLZXBXriJhb9SBaXu6PxbOKY/brIs0M/M3YDqlGD0nAKczL BxCBdr6gkWm6OUwBzQaWquM6cmdwxHZNwpTC/XK8tPF+aA0IHBBEuw+sMiavdoGdJ26L vd4BZVGa9fFuhJaUttF8NQCiUvJ9VLBLMCyhqerGvkFfHd4jD6AUspIYv3JSz5+pamLF iZ6HXPCgtnu4nCaEJCYMVHt6pz+g2vvAHmaNBCysIbZTWMlB8UfoGBKvvr8gYigsmLby C2OQ== X-Gm-Message-State: AOAM533zm/bybkskvDY2mCJiPfj4g2xJzUmKUWfu0a1hfa0eaS3qVNB4 WjU5Ir86aRcemUIYD/5vb/JlahVQRuELYDw= X-Google-Smtp-Source: ABdhPJz5+H9EGPuhDo/MCyyWBmWgYRfUBrYMw5BFF6G4/8Pb7jT5409ECwssJgr6Cz3pSHR2SC6rhw== X-Received: by 2002:a5d:66d1:: with SMTP id k17mr6339443wrw.251.1628179827809; Thu, 05 Aug 2021 09:10:27 -0700 (PDT) Received: from localhost.localdomain ([77.109.191.101]) by smtp.gmail.com with ESMTPSA id n5sm5843968wme.47.2021.08.05.09.10.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 Aug 2021 09:10:27 -0700 (PDT) From: Jussi Maki To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com, andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org, maciej.fijalkowski@intel.com, magnus.karlsson@intel.com, Jussi Maki Subject: [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding Date: Sat, 31 Jul 2021 05:57:38 +0000 Message-Id: <20210731055738.16820-8-joamaki@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210731055738.16820-1-joamaki@gmail.com> References: <20210609135537.1460244-1-joamaki@gmail.com> 
<20210731055738.16820-1-joamaki@gmail.com> Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a test suite to test XDP bonding implementation over a pair of veth devices. Signed-off-by: Jussi Maki Acked-by: Andrii Nakryiko --- .../selftests/bpf/prog_tests/xdp_bonding.c | 520 ++++++++++++++++++ 1 file changed, 520 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c new file mode 100644 index 000000000000..334a04721a59 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c @@ -0,0 +1,520 @@ +// SPDX-License-Identifier: GPL-2.0 + +/** + * Test XDP bonding support + * + * Sets up two bonded veth pairs between two fresh namespaces + * and verifies that XDP_TX program loaded on a bond device + * are correctly loaded onto the slave devices and XDP_TX'd + * packets are balanced using bonding. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include "test_progs.h" +#include "network_helpers.h" +#include +#include +#include + +#include "xdp_dummy.skel.h" +#include "xdp_redirect_multi_kern.skel.h" +#include "xdp_tx.skel.h" + +#define BOND1_MAC {0x00, 0x11, 0x22, 0x33, 0x44, 0x55} +#define BOND1_MAC_STR "00:11:22:33:44:55" +#define BOND2_MAC {0x00, 0x22, 0x33, 0x44, 0x55, 0x66} +#define BOND2_MAC_STR "00:22:33:44:55:66" +#define NPACKETS 100 + +static int root_netns_fd = -1; + +static void restore_root_netns(void) +{ + ASSERT_OK(setns(root_netns_fd, CLONE_NEWNET), "restore_root_netns"); +} + +static int setns_by_name(char *name) +{ + int nsfd, err; + char nspath[PATH_MAX]; + + snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name); + nsfd = open(nspath, O_RDONLY | O_CLOEXEC); + if (nsfd < 0) + return -1; + + err = setns(nsfd, CLONE_NEWNET); + close(nsfd); + return err; +} + +static int get_rx_packets(const char *iface) +{ + FILE *f; + char line[512]; + int iface_len = strlen(iface); + + f = fopen("/proc/net/dev", "r"); + if (!f) + return 
-1; + + while (fgets(line, sizeof(line), f)) { + char *p = line; + + while (*p == ' ') + p++; /* skip whitespace */ + if (!strncmp(p, iface, iface_len)) { + p += iface_len; + if (*p++ != ':') + continue; + while (*p == ' ') + p++; /* skip whitespace */ + while (*p && *p != ' ') + p++; /* skip rx bytes */ + while (*p == ' ') + p++; /* skip whitespace */ + fclose(f); + return atoi(p); + } + } + fclose(f); + return -1; +} + +#define MAX_BPF_LINKS 8 + +struct skeletons { + struct xdp_dummy *xdp_dummy; + struct xdp_tx *xdp_tx; + struct xdp_redirect_multi_kern *xdp_redirect_multi_kern; + + int nlinks; + struct bpf_link *links[MAX_BPF_LINKS]; +}; + +static int xdp_attach(struct skeletons *skeletons, struct bpf_program *prog, char *iface) +{ + struct bpf_link *link; + int ifindex; + + ifindex = if_nametoindex(iface); + if (!ASSERT_GT(ifindex, 0, "get ifindex")) + return -1; + + if (!ASSERT_LE(skeletons->nlinks+1, MAX_BPF_LINKS, "too many XDP programs attached")) + return -1; + + link = bpf_program__attach_xdp(prog, ifindex); + if (!ASSERT_OK_PTR(link, "attach xdp program")) + return -1; + + skeletons->links[skeletons->nlinks++] = link; + return 0; +} + +enum { + BOND_ONE_NO_ATTACH = 0, + BOND_BOTH_AND_ATTACH, +}; + +static const char * const mode_names[] = { + [BOND_MODE_ROUNDROBIN] = "balance-rr", + [BOND_MODE_ACTIVEBACKUP] = "active-backup", + [BOND_MODE_XOR] = "balance-xor", + [BOND_MODE_BROADCAST] = "broadcast", + [BOND_MODE_8023AD] = "802.3ad", + [BOND_MODE_TLB] = "balance-tlb", + [BOND_MODE_ALB] = "balance-alb", +}; + +static const char * const xmit_policy_names[] = { + [BOND_XMIT_POLICY_LAYER2] = "layer2", + [BOND_XMIT_POLICY_LAYER34] = "layer3+4", + [BOND_XMIT_POLICY_LAYER23] = "layer2+3", + [BOND_XMIT_POLICY_ENCAP23] = "encap2+3", + [BOND_XMIT_POLICY_ENCAP34] = "encap3+4", +}; + +static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy, + int bond_both_attach) +{ +#define SYS(fmt, ...) 
\ + ({ \ + char cmd[1024]; \ + snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ + if (!ASSERT_OK(system(cmd), cmd)) \ + return -1; \ + }) + + SYS("ip netns add ns_dst"); + SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst"); + SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst"); + + SYS("ip link add bond1 type bond mode %s xmit_hash_policy %s", + mode_names[mode], xmit_policy_names[xmit_policy]); + SYS("ip link set bond1 up address " BOND1_MAC_STR " addrgenmode none"); + SYS("ip -netns ns_dst link add bond2 type bond mode %s xmit_hash_policy %s", + mode_names[mode], xmit_policy_names[xmit_policy]); + SYS("ip -netns ns_dst link set bond2 up address " BOND2_MAC_STR " addrgenmode none"); + + SYS("ip link set veth1_1 master bond1"); + if (bond_both_attach == BOND_BOTH_AND_ATTACH) { + SYS("ip link set veth1_2 master bond1"); + } else { + SYS("ip link set veth1_2 up addrgenmode none"); + + if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog, "veth1_2")) + return -1; + } + + SYS("ip -netns ns_dst link set veth2_1 master bond2"); + + if (bond_both_attach == BOND_BOTH_AND_ATTACH) + SYS("ip -netns ns_dst link set veth2_2 master bond2"); + else + SYS("ip -netns ns_dst link set veth2_2 up addrgenmode none"); + + /* Load a dummy program on the sending side, as with veth the peer needs to have an + * XDP program loaded as well. 
+ */ + if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog, "bond1")) + return -1; + + if (bond_both_attach == BOND_BOTH_AND_ATTACH) { + if (!ASSERT_OK(setns_by_name("ns_dst"), "set netns to ns_dst")) + return -1; + + if (xdp_attach(skeletons, skeletons->xdp_tx->progs.xdp_tx, "bond2")) + return -1; + + restore_root_netns(); + } + + return 0; + +#undef SYS +} + +static void bonding_cleanup(struct skeletons *skeletons) +{ + restore_root_netns(); + while (skeletons->nlinks) { + skeletons->nlinks--; + bpf_link__destroy(skeletons->links[skeletons->nlinks]); + } + ASSERT_OK(system("ip link delete bond1"), "delete bond1"); + ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1"); + ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2"); + ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst"); +} + +static int send_udp_packets(int vary_dst_ip) +{ + struct ethhdr eh = { + .h_source = BOND1_MAC, + .h_dest = BOND2_MAC, + .h_proto = htons(ETH_P_IP), + }; + uint8_t buf[128] = {}; + struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh)); + struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph)); + int i, s = -1; + int ifindex; + + s = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW); + if (!ASSERT_GE(s, 0, "socket")) + goto err; + + ifindex = if_nametoindex("bond1"); + if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex")) + goto err; + + memcpy(buf, &eh, sizeof(eh)); + iph->ihl = 5; + iph->version = 4; + iph->tos = 16; + iph->id = 1; + iph->ttl = 64; + iph->protocol = IPPROTO_UDP; + iph->saddr = 1; + iph->daddr = 2; + iph->tot_len = htons(sizeof(buf) - ETH_HLEN); + iph->check = 0; + + for (i = 1; i <= NPACKETS; i++) { + int n; + struct sockaddr_ll saddr_ll = { + .sll_ifindex = ifindex, + .sll_halen = ETH_ALEN, + .sll_addr = BOND2_MAC, + }; + + /* vary the UDP destination port for even distribution with roundrobin/xor modes */ + uh->dest++; + + if (vary_dst_ip) + iph->daddr++; + + n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr 
*)&saddr_ll, sizeof(saddr_ll)); + if (!ASSERT_EQ(n, sizeof(buf), "sendto")) + goto err; + } + + close(s); + return 0; +err: + if (s >= 0) + close(s); + return -1; +} + +static void test_xdp_bonding_with_mode(struct skeletons *skeletons, int mode, int xmit_policy) +{ + int bond1_rx; + + if (bonding_setup(skeletons, mode, xmit_policy, BOND_BOTH_AND_ATTACH)) + goto out; + + if (send_udp_packets(xmit_policy != BOND_XMIT_POLICY_LAYER34)) + goto out; + + bond1_rx = get_rx_packets("bond1"); + ASSERT_EQ(bond1_rx, NPACKETS, "expected more received packets"); + + switch (mode) { + case BOND_MODE_ROUNDROBIN: + case BOND_MODE_XOR: { + int veth1_rx = get_rx_packets("veth1_1"); + int veth2_rx = get_rx_packets("veth1_2"); + int diff = abs(veth1_rx - veth2_rx); + + ASSERT_GE(veth1_rx + veth2_rx, NPACKETS, "expected more packets"); + + switch (xmit_policy) { + case BOND_XMIT_POLICY_LAYER2: + ASSERT_GE(diff, NPACKETS, + "expected packets on only one of the interfaces"); + break; + case BOND_XMIT_POLICY_LAYER23: + case BOND_XMIT_POLICY_LAYER34: + ASSERT_LT(diff, NPACKETS/2, + "expected even distribution of packets"); + break; + default: + PRINT_FAIL("Unimplemented xmit_policy=%d\n", xmit_policy); + break; + } + break; + } + case BOND_MODE_ACTIVEBACKUP: { + int veth1_rx = get_rx_packets("veth1_1"); + int veth2_rx = get_rx_packets("veth1_2"); + int diff = abs(veth1_rx - veth2_rx); + + ASSERT_GE(diff, NPACKETS, + "expected packets on only one of the interfaces"); + break; + } + default: + PRINT_FAIL("Unimplemented mode=%d\n", mode); + break; + } + +out: + bonding_cleanup(skeletons); +} + +/* Test the broadcast redirection using xdp_redirect_map_multi_prog and adding + * all the interfaces to it and checking that broadcasting won't send the packet + * to either the ingress bond device (bond2) or its slave (veth2_1). 
+ */ +static void test_xdp_bonding_redirect_multi(struct skeletons *skeletons) +{ + static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"}; + int veth1_1_rx, veth1_2_rx; + int err; + + if (bonding_setup(skeletons, BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, + BOND_ONE_NO_ATTACH)) + goto out; + + + if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst")) + goto out; + + /* populate the devmap with the relevant interfaces */ + for (int i = 0; i < ARRAY_SIZE(ifaces); i++) { + int ifindex = if_nametoindex(ifaces[i]); + int map_fd = bpf_map__fd(skeletons->xdp_redirect_multi_kern->maps.map_all); + + if (!ASSERT_GT(ifindex, 0, "could not get interface index")) + goto out; + + err = bpf_map_update_elem(map_fd, &ifindex, &ifindex, 0); + if (!ASSERT_OK(err, "add interface to map_all")) + goto out; + } + + if (xdp_attach(skeletons, + skeletons->xdp_redirect_multi_kern->progs.xdp_redirect_map_multi_prog, + "bond2")) + goto out; + + restore_root_netns(); + + if (send_udp_packets(0)) + goto out; + + veth1_1_rx = get_rx_packets("veth1_1"); + veth1_2_rx = get_rx_packets("veth1_2"); + + ASSERT_EQ(veth1_1_rx, 0, "expected no packets on veth1_1"); + ASSERT_GE(veth1_2_rx, NPACKETS, "expected packets on veth1_2"); + +out: + restore_root_netns(); + bonding_cleanup(skeletons); +} + +/* Test that XDP programs cannot be attached to both the bond master and slaves simultaneously */ +static void test_xdp_bonding_attach(struct skeletons *skeletons) +{ + struct bpf_link *link = NULL; + struct bpf_link *link2 = NULL; + int veth, bond; + int err; + + if (!ASSERT_OK(system("ip link add veth type veth"), "add veth")) + goto out; + if (!ASSERT_OK(system("ip link add bond type bond"), "add bond")) + goto out; + + veth = if_nametoindex("veth"); + if (!ASSERT_GT(veth, 0, "if_nametoindex veth")) + goto out; + bond = if_nametoindex("bond"); + if (!ASSERT_GT(bond, 0, "if_nametoindex bond")) + goto out; + + /* enslaving with an XDP program loaded fails 
*/ + link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth); + if (!ASSERT_OK_PTR(link, "attach program to veth")) + goto out; + + err = system("ip link set veth master bond"); + if (!ASSERT_NEQ(err, 0, "attaching slave with xdp program expected to fail")) + goto out; + + bpf_link__destroy(link); + link = NULL; + + err = system("ip link set veth master bond"); + if (!ASSERT_OK(err, "set veth master")) + goto out; + + /* attaching to slave when master has no program is allowed */ + link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth); + if (!ASSERT_OK_PTR(link, "attach program to slave when enslaved")) + goto out; + + /* attaching to master not allowed when slave has program loaded */ + link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond); + if (!ASSERT_ERR_PTR(link2, "attach program to master when slave has program")) + goto out; + + bpf_link__destroy(link); + link = NULL; + + /* attaching XDP program to master allowed when slave has no program */ + link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond); + if (!ASSERT_OK_PTR(link, "attach program to master")) + goto out; + + /* attaching to slave not allowed when master has program loaded */ + link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth); + ASSERT_ERR_PTR(link2, "attach program to slave when master has program"); + +out: + bpf_link__destroy(link); + bpf_link__destroy(link2); + + system("ip link del veth"); + system("ip link del bond"); +} + +static int libbpf_debug_print(enum libbpf_print_level level, + const char *format, va_list args) +{ + if (level != LIBBPF_WARN) + vprintf(format, args); + return 0; +} + +struct bond_test_case { + char *name; + int mode; + int xmit_policy; +}; + +static struct bond_test_case bond_test_cases[] = { + { "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, }, + { "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, 
BOND_XMIT_POLICY_LAYER23 }, + + { "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, }, + { "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, }, + { "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, }, +}; + +void test_xdp_bonding(void) +{ + libbpf_print_fn_t old_print_fn; + struct skeletons skeletons = {}; + int i; + + old_print_fn = libbpf_set_print(libbpf_debug_print); + + root_netns_fd = open("/proc/self/ns/net", O_RDONLY); + if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net")) + goto out; + + skeletons.xdp_dummy = xdp_dummy__open_and_load(); + if (!ASSERT_OK_PTR(skeletons.xdp_dummy, "xdp_dummy__open_and_load")) + goto out; + + skeletons.xdp_tx = xdp_tx__open_and_load(); + if (!ASSERT_OK_PTR(skeletons.xdp_tx, "xdp_tx__open_and_load")) + goto out; + + skeletons.xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load(); + if (!ASSERT_OK_PTR(skeletons.xdp_redirect_multi_kern, + "xdp_redirect_multi_kern__open_and_load")) + goto out; + + if (test__start_subtest("xdp_bonding_attach")) + test_xdp_bonding_attach(&skeletons); + + for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) { + struct bond_test_case *test_case = &bond_test_cases[i]; + + if (test__start_subtest(test_case->name)) + test_xdp_bonding_with_mode( + &skeletons, + test_case->mode, + test_case->xmit_policy); + } + + if (test__start_subtest("xdp_bonding_redirect_multi")) + test_xdp_bonding_redirect_multi(&skeletons); + +out: + xdp_dummy__destroy(skeletons.xdp_dummy); + xdp_tx__destroy(skeletons.xdp_tx); + xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern); + + libbpf_set_print(old_print_fn); + if (root_netns_fd >= 0) + close(root_netns_fd); +}
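For readers who want to check the counter parsing in isolation: get_rx_packets() in the test above finds the interface's row in /proc/net/dev, skips the RX bytes column and returns the next number, the RX packet count. Below is a minimal standalone sketch of that parsing, working on an in-memory string so it can be exercised without namespaces or root. The name parse_rx_packets and the string-based interface are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): scan text laid out like
 * /proc/net/dev and return the RX packet count for iface, or -1 if
 * iface has no row. The name must be followed directly by ':' so that
 * "veth1_1" does not match the row for "veth1_10".
 */
static long parse_rx_packets(const char *text, const char *iface)
{
	const char *line = text;
	size_t iface_len;

	assert(text && iface);
	iface_len = strlen(iface);

	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *p = line;

		while (*p == ' ')
			p++;	/* skip leading padding */
		if (!strncmp(p, iface, iface_len) && p[iface_len] == ':') {
			p += iface_len + 1;
			while (*p == ' ')
				p++;	/* skip padding before rx bytes */
			while (*p && *p != ' ' && *p != '\n')
				p++;	/* skip rx bytes */
			while (*p == ' ')
				p++;	/* skip padding before rx packets */
			return strtol(p, NULL, 10);
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

Calling parse_rx_packets(buf, "veth1_1") on a buffer holding the contents of /proc/net/dev would mirror what the test does per interface; the test itself reads the file line by line with fgets() instead.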