From patchwork Wed Mar 24 01:30:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408683 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B885FC433E5 for ; Wed, 24 Mar 2021 01:32:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7AC25619FB for ; Wed, 24 Mar 2021 01:32:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234550AbhCXBbp (ORCPT ); Tue, 23 Mar 2021 21:31:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231422AbhCXBbK (ORCPT ); Tue, 23 Mar 2021 21:31:10 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 701BCC061765; Tue, 23 Mar 2021 18:31:09 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id ACDB4630BB; Wed, 24 Mar 2021 02:31:00 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 01/24] net: resolve forwarding path from virtual netdevice and HW destination address Date: Wed, 24 Mar 2021 02:30:32 +0100 Message-Id: <20210324013055.5619-2-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch adds dev_fill_forward_path() which resolves the path to reach the real netdevice from the IP forwarding side. This function takes as input the netdevice and the destination hardware address and it walks down the devices calling .ndo_fill_forward_path() for each device until the real device is found. For instance, assuming the following topology: IP forwarding / \ br0 eth0 / \ eth1 eth2 . . . ethX ab:cd:ef:ab:cd:ef where eth1 and eth2 are bridge ports and eth0 provides WAN connectivity. ethX is the interface in another box which is connected to the eth1 bridge port. For packets going through IP forwarding to br0 whose destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path() provides the following path: br0 -> eth1 .ndo_fill_forward_path for br0 looks up at the FDB for the bridge port from the destination MAC address to get the bridge port eth1. This information allows to create a fast path that bypasses the classic bridge and IP forwarding paths, so packets go directly from the bridge port eth1 to eth0 (wan interface) and vice versa. fast path .------------------------. / \ | IP forwarding | | / \ \/ | br0 eth0 . / \ -> eth1 eth2 . . . ethX ab:cd:ef:ab:cd:ef Signed-off-by: Pablo Neira Ayuso --- v2: no changes. include/linux/netdevice.h | 27 +++++++++++++++++++++++ net/core/dev.c | 46 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 73 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7005ad80e8d1..f9ac960699a4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -848,6 +848,27 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev); +enum net_device_path_type { + DEV_PATH_ETHERNET = 0, +}; + +struct net_device_path { + enum net_device_path_type type; + const struct net_device *dev; +}; + +#define NET_DEVICE_PATH_STACK_MAX 5 + +struct net_device_path_stack { + int num_paths; + struct net_device_path path[NET_DEVICE_PATH_STACK_MAX]; +}; + +struct net_device_path_ctx { + const struct net_device *dev; + const u8 *daddr; +}; + enum tc_setup_type { TC_SETUP_QDISC_MQPRIO, TC_SETUP_CLSU32, @@ -1282,6 +1303,8 @@ struct netdev_net_notifier { * struct net_device *(*ndo_get_peer_dev)(struct net_device *dev); * If a device is paired with a peer device, return the peer instance. * The caller must be under RCU read context. + * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); + * Get the forwarding path to reach the real device from the HW destination address */ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1488,6 +1511,8 @@ struct net_device_ops { int (*ndo_tunnel_ctl)(struct net_device *dev, struct ip_tunnel_parm *p, int cmd); struct net_device * (*ndo_get_peer_dev)(struct net_device *dev); + int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, + struct net_device_path *path); }; /** @@ -2870,6 +2895,8 @@ void dev_remove_offload(struct packet_offload *po); int dev_get_iflink(const struct net_device *dev); int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb); +int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr, + struct net_device_path_stack *stack); struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags, unsigned short mask); struct net_device *dev_get_by_name(struct net *net, const char *name); diff --git a/net/core/dev.c b/net/core/dev.c index c9a496f5e687..8a03f71aecac 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -848,6 +848,52 @@ int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb) } EXPORT_SYMBOL_GPL(dev_fill_metadata_dst); +static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack) +{ + int k = stack->num_paths++; + + if (WARN_ON_ONCE(k >= NET_DEVICE_PATH_STACK_MAX)) + return NULL; + + return &stack->path[k]; +} + +int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr, + struct net_device_path_stack *stack) +{ + const struct net_device *last_dev; + struct net_device_path_ctx ctx = { + .dev = dev, + .daddr = daddr, + }; + struct net_device_path *path; + int ret = 0; + + stack->num_paths = 0; + while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) { + last_dev = ctx.dev; + path = dev_fwd_path(stack); + if (!path) + return -1; + + memset(path, 0, sizeof(struct net_device_path)); + ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path); + if (ret < 0) + return -1; + + if (WARN_ON_ONCE(last_dev == ctx.dev)) + return -1; + } + path = dev_fwd_path(stack); + if (!path) + return -1; + path->type = DEV_PATH_ETHERNET; + path->dev = ctx.dev; + + return ret; +} +EXPORT_SYMBOL_GPL(dev_fill_forward_path); + /** * __dev_get_by_name - find a device by its name * @net: the applicable net namespace From patchwork Wed Mar 24 01:30:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408684 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 971E1C433C1 for ; Wed, 24 Mar 2021 01:32:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 52FFC619E9 for ; Wed, 24 Mar 2021 01:32:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234535AbhCXBbn (ORCPT ); Tue, 23 Mar 2021 21:31:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50942 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234180AbhCXBbK (ORCPT ); Tue, 23 Mar 2021 21:31:10 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2A2B5C0613D8; Tue, 23 Mar 2021 18:31:10 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 7BAA0630C3; Wed, 24 Mar 2021 02:31:01 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 02/24] net: 8021q: resolve forwarding path for vlan devices Date: Wed, 24 Mar 2021 02:30:33 +0100 Message-Id: <20210324013055.5619-3-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add .ndo_fill_forward_path for vlan devices. For instance, assuming the following topology: IP forwarding / \ eth0.100 eth0 | eth0 . . . ethX ab:cd:ef:ab:cd:ef For packets going through IP forwarding to eth0.100 whose destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path() provides the following path: eth0.100 -> eth0 Signed-off-by: Pablo Neira Ayuso --- v2: no changes. include/linux/netdevice.h | 7 +++++++ net/8021q/vlan_dev.c | 15 +++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f9ac960699a4..6263bc3f2422 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -850,11 +850,18 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev, enum net_device_path_type { DEV_PATH_ETHERNET = 0, + DEV_PATH_VLAN, }; struct net_device_path { enum net_device_path_type type; const struct net_device *dev; + union { + struct { + u16 id; + __be16 proto; + } encap; + }; }; #define NET_DEVICE_PATH_STACK_MAX 5 diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index dc1a197792e6..1b1955a63f7f 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -776,6 +776,20 @@ static int vlan_dev_get_iflink(const struct net_device *dev) return real_dev->ifindex; } +static int vlan_dev_fill_forward_path(struct net_device_path_ctx *ctx, + struct net_device_path *path) +{ + struct vlan_dev_priv *vlan = vlan_dev_priv(ctx->dev); + + path->type = DEV_PATH_VLAN; + path->encap.id = vlan->vlan_id; + path->encap.proto = vlan->vlan_proto; + path->dev = ctx->dev; + ctx->dev = vlan->real_dev; + + return 0; +} + static const struct ethtool_ops vlan_ethtool_ops = { .get_link_ksettings = vlan_ethtool_get_link_ksettings, .get_drvinfo = vlan_ethtool_get_drvinfo, @@ -814,6 +828,7 @@ static const struct net_device_ops vlan_netdev_ops = { #endif .ndo_fix_features = vlan_dev_fix_features, .ndo_get_iflink = vlan_dev_get_iflink, + .ndo_fill_forward_path = vlan_dev_fill_forward_path, }; static void vlan_dev_free(struct net_device *dev) From patchwork Wed Mar 24 01:30:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408682 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C6B5C433F1 for ; Wed, 24 Mar 2021 01:32:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 69FDE619EE for ; Wed, 24 Mar 2021 01:32:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234583AbhCXBbu (ORCPT ); Tue, 23 Mar 2021 21:31:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234440AbhCXBbM (ORCPT ); Tue, 23 Mar 2021 21:31:12 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B519CC061763; Tue, 23 Mar 2021 18:31:12 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 68AB6630BB; Wed, 24 Mar 2021 02:31:04 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 06/24] net: dsa: resolve forwarding path for dsa slave ports Date: Wed, 24 Mar 2021 02:30:37 +0100 Message-Id: <20210324013055.5619-7-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Felix Fietkau Add .ndo_fill_forward_path for dsa slave port devices Signed-off-by: Felix Fietkau Signed-off-by: Pablo Neira Ayuso --- v2: no changes. include/linux/netdevice.h | 5 +++++ net/dsa/slave.c | 16 ++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cf1041756438..7ad7df75aaaa 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -853,6 +853,7 @@ enum net_device_path_type { DEV_PATH_VLAN, DEV_PATH_BRIDGE, DEV_PATH_PPPOE, + DEV_PATH_DSA, }; struct net_device_path { @@ -873,6 +874,10 @@ struct net_device_path { u16 vlan_id; __be16 vlan_proto; } bridge; + struct { + int port; + u16 proto; + } dsa; }; }; diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 992fcab4b552..df7d789236fe 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1654,6 +1654,21 @@ static void dsa_slave_get_stats64(struct net_device *dev, dev_get_tstats64(dev, s); } +static int dsa_slave_fill_forward_path(struct net_device_path_ctx *ctx, + struct net_device_path *path) +{ + struct dsa_port *dp = dsa_slave_to_port(ctx->dev); + struct dsa_port *cpu_dp = dp->cpu_dp; + + path->dev = ctx->dev; + path->type = DEV_PATH_DSA; + path->dsa.proto = cpu_dp->tag_ops->proto; + path->dsa.port = dp->index; + ctx->dev = cpu_dp->master; + + return 0; +} + static const struct net_device_ops dsa_slave_netdev_ops = { .ndo_open = dsa_slave_open, .ndo_stop = dsa_slave_close, @@ -1679,6 +1694,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = { .ndo_vlan_rx_kill_vid = dsa_slave_vlan_rx_kill_vid, .ndo_get_devlink_port = dsa_slave_get_devlink_port, .ndo_change_mtu = dsa_slave_change_mtu, + .ndo_fill_forward_path = dsa_slave_fill_forward_path, }; static struct device_type dsa_type = { From patchwork Wed Mar 24 01:30:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408679 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF7B2C43461 for ; Wed, 24 Mar 2021 01:32:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 84930619F3 for ; Wed, 24 Mar 2021 01:32:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234627AbhCXBb4 (ORCPT ); Tue, 23 Mar 2021 21:31:56 -0400 Received: from mail.netfilter.org ([217.70.188.207]:60560 "EHLO mail.netfilter.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234485AbhCXBbR (ORCPT ); Tue, 23 Mar 2021 21:31:17 -0400 Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 18866630C3; Wed, 24 Mar 2021 02:31:05 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 07/24] netfilter: flowtable: add xmit path types Date: Wed, 24 Mar 2021 02:30:38 +0100 Message-Id: <20210324013055.5619-8-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add the xmit_type field that defines the two supported xmit paths in the flowtable data plane, which are the neighbour and the xfrm xmit paths. This patch prepares for new flowtable xmit path types to come. Signed-off-by: Pablo Neira Ayuso --- v2: resolve conflicts from rebasing on top of net-next: inconditional dst_check() call. include/net/netfilter/nf_flow_table.h | 11 +++++++++-- net/netfilter/nf_flow_table_core.c | 1 + net/netfilter/nf_flow_table_ip.c | 14 ++++++++------ net/netfilter/nft_flow_offload.c | 20 ++++++++++++++++++-- 4 files changed, 36 insertions(+), 10 deletions(-) diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index fb165697c8a1..828fcfbd5e6f 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -89,6 +89,11 @@ enum flow_offload_tuple_dir { }; #define FLOW_OFFLOAD_DIR_MAX IP_CT_DIR_MAX +enum flow_offload_xmit_type { + FLOW_OFFLOAD_XMIT_NEIGH = 0, + FLOW_OFFLOAD_XMIT_XFRM, +}; + struct flow_offload_tuple { union { struct in_addr src_v4; @@ -111,7 +116,8 @@ struct flow_offload_tuple { /* All members above are keys for lookups, see flow_offload_hash(). */ struct { } __hash; - u8 dir; + u8 dir:6, + xmit_type:2; u16 mtu; @@ -158,7 +164,8 @@ static inline __s32 nf_flow_timeout_delta(unsigned int timeout) struct nf_flow_route { struct { - struct dst_entry *dst; + struct dst_entry *dst; + enum flow_offload_xmit_type xmit_type; } tuple[FLOW_OFFLOAD_DIR_MAX]; }; diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 8ffd3f3c288c..573be4d1efb5 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -95,6 +95,7 @@ static int flow_offload_fill_route(struct flow_offload *flow, } flow_tuple->iifidx = other_dst->dev->ifindex; + flow_tuple->xmit_type = route->tuple[dir].xmit_type; flow_tuple->dst_cache = dst; return 0; diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c index 3be58b6d60af..e9bef38a356b 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -235,8 +235,6 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, dir = tuplehash->tuple.dir; flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]); - rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache; - outdev = rt->dst.dev; if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu))) return NF_ACCEPT; @@ -265,13 +263,16 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, if (flow_table->flags & NF_FLOWTABLE_COUNTER) nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len); - if (unlikely(dst_xfrm(&rt->dst))) { + rt = (struct rtable *)tuplehash->tuple.dst_cache; + + if (unlikely(tuplehash->tuple.xmit_type == FLOW_OFFLOAD_XMIT_XFRM)) { memset(skb->cb, 0, sizeof(struct inet_skb_parm)); IPCB(skb)->iif = skb->dev->ifindex; IPCB(skb)->flags = IPSKB_FORWARDED; return nf_flow_xmit_xfrm(skb, state, &rt->dst); } + outdev = rt->dst.dev; skb->dev = outdev; nexthop = rt_nexthop(rt, flow->tuplehash[!dir].tuple.src_v4.s_addr); skb_dst_set_noref(skb, &rt->dst); @@ -456,8 +457,6 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, dir = tuplehash->tuple.dir; flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]); - rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache; - outdev = rt->dst.dev; if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu))) return NF_ACCEPT; @@ -485,13 +484,16 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, if (flow_table->flags & NF_FLOWTABLE_COUNTER) nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len); - if (unlikely(dst_xfrm(&rt->dst))) { + rt = (struct rt6_info *)tuplehash->tuple.dst_cache; + + if (unlikely(tuplehash->tuple.xmit_type == FLOW_OFFLOAD_XMIT_XFRM)) { memset(skb->cb, 0, sizeof(struct inet6_skb_parm)); IP6CB(skb)->iif = skb->dev->ifindex; IP6CB(skb)->flags = IP6SKB_FORWARDED; return nf_flow_xmit_xfrm(skb, state, &rt->dst); } + outdev = rt->dst.dev; skb->dev = outdev; nexthop = rt6_nexthop(rt, &flow->tuplehash[!dir].tuple.src_v6); skb_dst_set_noref(skb, &rt->dst); diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index 3a6c84fb2c90..1da2bb24f6c0 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -19,6 +19,22 @@ struct nft_flow_offload { struct nft_flowtable *flowtable; }; +static enum flow_offload_xmit_type nft_xmit_type(struct dst_entry *dst) +{ + if (dst_xfrm(dst)) + return FLOW_OFFLOAD_XMIT_XFRM; + + return FLOW_OFFLOAD_XMIT_NEIGH; +} + +static void nft_default_forward_path(struct nf_flow_route *route, + struct dst_entry *dst_cache, + enum ip_conntrack_dir dir) +{ + route->tuple[dir].dst = dst_cache; + route->tuple[dir].xmit_type = nft_xmit_type(dst_cache); +} + static int nft_flow_route(const struct nft_pktinfo *pkt, const struct nf_conn *ct, struct nf_flow_route *route, @@ -44,8 +60,8 @@ static int nft_flow_route(const struct nft_pktinfo *pkt, if (!other_dst) return -ENOENT; - route->tuple[dir].dst = this_dst; - route->tuple[!dir].dst = other_dst; + nft_default_forward_path(route, this_dst, dir); + nft_default_forward_path(route, other_dst, !dir); return 0; } From patchwork Wed Mar 24 01:30:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408678 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19D90C4345E for ; Wed, 24 Mar 2021 01:32:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ED4BC619E9 for ; Wed, 24 Mar 2021 01:32:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234618AbhCXBbz (ORCPT ); Tue, 23 Mar 2021 21:31:55 -0400 Received: from mail.netfilter.org ([217.70.188.207]:60566 "EHLO mail.netfilter.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234486AbhCXBbR (ORCPT ); Tue, 23 Mar 2021 21:31:17 -0400 Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id B1257630CA; Wed, 24 Mar 2021 02:31:05 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 08/24] netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device Date: Wed, 24 Mar 2021 02:30:39 +0100 Message-Id: <20210324013055.5619-9-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Obtain the ingress device in the tuple from the route in the reply direction. Use dev_fill_forward_path() instead to get the real ingress device for this flow. Fall back to use the ingress device that the IP forwarding route provides if: - dev_fill_forward_path() finds no real ingress device. - the ingress device that is obtained is not part of the flowtable devices. - this route has a xfrm policy. Signed-off-by: Pablo Neira Ayuso --- v2: no changes. include/net/netfilter/nf_flow_table.h | 3 + net/netfilter/nf_flow_table_core.c | 3 +- net/netfilter/nft_flow_offload.c | 102 +++++++++++++++++++++++++- 3 files changed, 103 insertions(+), 5 deletions(-) diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 828fcfbd5e6f..dca9fc66405f 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -165,6 +165,9 @@ static inline __s32 nf_flow_timeout_delta(unsigned int timeout) struct nf_flow_route { struct { struct dst_entry *dst; + struct { + u32 ifindex; + } in; enum flow_offload_xmit_type xmit_type; } tuple[FLOW_OFFLOAD_DIR_MAX]; }; diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 573be4d1efb5..51e3e1b08e1c 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -79,7 +79,6 @@ static int flow_offload_fill_route(struct flow_offload *flow, enum flow_offload_tuple_dir dir) { struct flow_offload_tuple *flow_tuple = &flow->tuplehash[dir].tuple; - struct dst_entry *other_dst = route->tuple[!dir].dst; struct dst_entry *dst = route->tuple[dir].dst; if (!dst_hold_safe(route->tuple[dir].dst)) @@ -94,7 +93,7 @@ static int flow_offload_fill_route(struct flow_offload *flow, break; } - flow_tuple->iifidx = other_dst->dev->ifindex; + flow_tuple->iifidx = route->tuple[dir].in.ifindex; flow_tuple->xmit_type = route->tuple[dir].xmit_type; flow_tuple->dst_cache = dst; diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index 1da2bb24f6c0..15f90c31feb0 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -31,14 +31,104 @@ static void nft_default_forward_path(struct nf_flow_route *route, struct dst_entry *dst_cache, enum ip_conntrack_dir dir) { + route->tuple[!dir].in.ifindex = dst_cache->dev->ifindex; route->tuple[dir].dst = dst_cache; route->tuple[dir].xmit_type = nft_xmit_type(dst_cache); } +static int nft_dev_fill_forward_path(const struct nf_flow_route *route, + const struct dst_entry *dst_cache, + const struct nf_conn *ct, + enum ip_conntrack_dir dir, + struct net_device_path_stack *stack) +{ + const void *daddr = &ct->tuplehash[!dir].tuple.src.u3; + struct net_device *dev = dst_cache->dev; + unsigned char ha[ETH_ALEN]; + struct neighbour *n; + u8 nud_state; + + n = dst_neigh_lookup(dst_cache, daddr); + if (!n) + return -1; + + read_lock_bh(&n->lock); + nud_state = n->nud_state; + ether_addr_copy(ha, n->ha); + read_unlock_bh(&n->lock); + neigh_release(n); + + if (!(nud_state & NUD_VALID)) + return -1; + + return dev_fill_forward_path(dev, ha, stack); +} + +struct nft_forward_info { + const struct net_device *indev; +}; + +static void nft_dev_path_info(const struct net_device_path_stack *stack, + struct nft_forward_info *info) +{ + const struct net_device_path *path; + int i; + + for (i = 0; i < stack->num_paths; i++) { + path = &stack->path[i]; + switch (path->type) { + case DEV_PATH_ETHERNET: + info->indev = path->dev; + break; + case DEV_PATH_VLAN: + case DEV_PATH_BRIDGE: + default: + info->indev = NULL; + break; + } + } +} + +static bool nft_flowtable_find_dev(const struct net_device *dev, + struct nft_flowtable *ft) +{ + struct nft_hook *hook; + bool found = false; + + list_for_each_entry_rcu(hook, &ft->hook_list, list) { + if (hook->ops.dev != dev) + continue; + + found = true; + break; + } + + return found; +} + +static void nft_dev_forward_path(struct nf_flow_route *route, + const struct nf_conn *ct, + enum ip_conntrack_dir dir, + struct nft_flowtable *ft) +{ + const struct dst_entry *dst = route->tuple[dir].dst; + struct net_device_path_stack stack; + struct nft_forward_info info = {}; + + if (nft_dev_fill_forward_path(route, dst, ct, dir, &stack) >= 0) + nft_dev_path_info(&stack, &info); + + if (!info.indev || !nft_flowtable_find_dev(info.indev, ft)) + return; + + route->tuple[!dir].in.ifindex = info.indev->ifindex; +} + static int nft_flow_route(const struct nft_pktinfo *pkt, const struct nf_conn *ct, struct nf_flow_route *route, - enum ip_conntrack_dir dir) + enum ip_conntrack_dir dir, + struct nft_flowtable *ft) { struct dst_entry *this_dst = skb_dst(pkt->skb); struct dst_entry *other_dst = NULL; @@ -63,6 +153,12 @@ static int nft_flow_route(const struct nft_pktinfo *pkt, nft_default_forward_path(route, this_dst, dir); nft_default_forward_path(route, other_dst, !dir); + if (route->tuple[dir].xmit_type == FLOW_OFFLOAD_XMIT_NEIGH && + route->tuple[!dir].xmit_type == FLOW_OFFLOAD_XMIT_NEIGH) { + nft_dev_forward_path(route, ct, dir, ft); + nft_dev_forward_path(route, ct, !dir, ft); + } + return 0; } @@ -90,8 +186,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, struct nft_flow_offload *priv = nft_expr_priv(expr); struct nf_flowtable *flowtable = &priv->flowtable->data; struct tcphdr _tcph, *tcph = NULL; + struct nf_flow_route route = {}; enum ip_conntrack_info ctinfo; - struct nf_flow_route route; struct flow_offload *flow; enum ip_conntrack_dir dir; struct nf_conn *ct; @@ -128,7 +224,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, goto out; dir = CTINFO2DIR(ctinfo); - if (nft_flow_route(pkt, ct, &route, dir) < 0) + if (nft_flow_route(pkt, ct, &route, dir, priv->flowtable) < 0) goto err_flow_route; flow = flow_offload_alloc(ct); From patchwork Wed Mar 24 01:30:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408680 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 466E9C4345A for ; Wed, 24 Mar 2021 01:32:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3383F619F8 for ; Wed, 24 Mar 2021 01:32:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234609AbhCXBby (ORCPT ); Tue, 23 Mar 2021 21:31:54 -0400 Received: from mail.netfilter.org ([217.70.188.207]:60574 "EHLO mail.netfilter.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234491AbhCXBbR (ORCPT ); Tue, 23 Mar 2021 21:31:17 -0400 Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id EA358630BB; Wed, 24 Mar 2021 02:31:06 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next,v2 10/24] netfilter: flowtable: add vlan support Date: Wed, 24 Mar 2021 02:30:41 +0100 Message-Id: <20210324013055.5619-11-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add the vlan id and protocol to the flow tuple to uniquely identify flows from the receive path. For the transmit path, dev_hard_header() on the vlan device push the headers. This patch includes support for two vlan headers (QinQ) from the ingress path. Add a generic encap field to the flowtable entry which stores the protocol and the tag id. This allows to reuse these fields in the PPPoE support coming in a later patch. Signed-off-by: Pablo Neira Ayuso --- v2: rebase on top of net-next. Calculate offset to layer 3 header from nf_flow_skb_encap_protocol(). Pass offset to build_tuple function. include/net/netfilter/nf_flow_table.h | 17 +++- net/netfilter/nf_flow_table_core.c | 7 ++ net/netfilter/nf_flow_table_ip.c | 121 ++++++++++++++++++++------ net/netfilter/nft_flow_offload.c | 26 +++++- 4 files changed, 142 insertions(+), 29 deletions(-) diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 41c8436bc77e..e34fd3eb4bb5 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -95,6 +95,8 @@ enum flow_offload_xmit_type { FLOW_OFFLOAD_XMIT_DIRECT, }; +#define NF_FLOW_TABLE_ENCAP_MAX 2 + struct flow_offload_tuple { union { struct in_addr src_v4; @@ -113,13 +115,17 @@ struct flow_offload_tuple { u8 l3proto; u8 l4proto; + struct { + u16 id; + __be16 proto; + } encap[NF_FLOW_TABLE_ENCAP_MAX]; /* All members above are keys for lookups, see flow_offload_hash(). */ struct { } __hash; - u8 dir:6, - xmit_type:2; - + u8 dir:4, + xmit_type:2, + encap_num:2; u16 mtu; union { struct dst_entry *dst_cache; @@ -174,6 +180,11 @@ struct nf_flow_route { struct dst_entry *dst; struct { u32 ifindex; + struct { + u16 id; + __be16 proto; + } encap[NF_FLOW_TABLE_ENCAP_MAX]; + u8 num_encaps; } in; struct { u32 ifindex; diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index a92acb3ed019..595f4434b84d 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -80,6 +80,7 @@ static int flow_offload_fill_route(struct flow_offload *flow, { struct flow_offload_tuple *flow_tuple = &flow->tuplehash[dir].tuple; struct dst_entry *dst = route->tuple[dir].dst; + int i, j = 0; switch (flow_tuple->l3proto) { case NFPROTO_IPV4: @@ -91,6 +92,12 @@ static int flow_offload_fill_route(struct flow_offload *flow, } flow_tuple->iifidx = route->tuple[dir].in.ifindex; + for (i = route->tuple[dir].in.num_encaps - 1; i >= 0; i--) { + flow_tuple->encap[j].id = route->tuple[dir].in.encap[i].id; + flow_tuple->encap[j].proto = route->tuple[dir].in.encap[i].proto; + j++; + } + flow_tuple->encap_num = route->tuple[dir].in.num_encaps; switch (route->tuple[dir].xmit_type) { case FLOW_OFFLOAD_XMIT_DIRECT: diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c index 9e84767a7e9b..d90636295b0d 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -136,23 +136,44 @@ static bool ip_has_options(unsigned int thoff) return thoff != sizeof(struct iphdr); } +static void nf_flow_tuple_encap(struct sk_buff *skb, + struct flow_offload_tuple *tuple) +{ + int i = 0; + + if (skb_vlan_tag_present(skb)) { + tuple->encap[i].id = skb_vlan_tag_get(skb); + tuple->encap[i].proto = skb->vlan_proto; + i++; + } + if (skb->protocol == htons(ETH_P_8021Q)) { + struct vlan_ethhdr *veth = (struct vlan_ethhdr *)skb_mac_header(skb); + + tuple->encap[i].id = ntohs(veth->h_vlan_TCI); + tuple->encap[i].proto = skb->protocol; + } +} + static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev, - struct flow_offload_tuple *tuple, u32 *hdrsize) + struct flow_offload_tuple *tuple, u32 *hdrsize, + u32 offset) { struct flow_ports *ports; unsigned int thoff; struct iphdr *iph; - if (!pskb_may_pull(skb, sizeof(*iph))) + if (!pskb_may_pull(skb, sizeof(*iph) + offset)) return -1; - iph = ip_hdr(skb); - thoff = iph->ihl * 4; + iph = (struct iphdr *)(skb_network_header(skb) + offset); + thoff = (iph->ihl * 4); if (ip_is_fragment(iph) || unlikely(ip_has_options(thoff))) return -1; + thoff += offset; + switch (iph->protocol) { case IPPROTO_TCP: *hdrsize = sizeof(struct tcphdr); @@ -167,11 +188,10 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev, if (iph->ttl <= 1) return -1; - thoff = iph->ihl * 4; if (!pskb_may_pull(skb, thoff + *hdrsize)) return -1; - iph = ip_hdr(skb); + iph = (struct iphdr *)(skb_network_header(skb) + offset); ports = (struct flow_ports *)(skb_network_header(skb) + thoff); tuple->src_v4.s_addr = iph->saddr; @@ -181,6 +201,7 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev, tuple->l3proto = AF_INET; tuple->l4proto = iph->protocol; tuple->iifidx = dev->ifindex; + nf_flow_tuple_encap(skb, tuple); return 0; } @@ -207,6 +228,43 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb, return NF_STOLEN; } +static bool nf_flow_skb_encap_protocol(const struct sk_buff *skb, __be16 proto, + u32 *offset) +{ + if (skb->protocol == htons(ETH_P_8021Q)) { + struct vlan_ethhdr *veth; + + veth = (struct vlan_ethhdr *)skb_mac_header(skb); + if (veth->h_vlan_encapsulated_proto == proto) { + *offset += VLAN_HLEN; + return true; + } + } + + return false; +} + +static void nf_flow_encap_pop(struct sk_buff *skb, + struct flow_offload_tuple_rhash *tuplehash) +{ + struct vlan_hdr *vlan_hdr; + int i; + + for (i = 0; i < tuplehash->tuple.encap_num; i++) { + if (skb_vlan_tag_present(skb)) { + __vlan_hwaccel_clear_tag(skb); + continue; + } + if (skb->protocol == htons(ETH_P_8021Q)) { + vlan_hdr = (struct vlan_hdr *)skb->data; + __skb_pull(skb, VLAN_HLEN); + vlan_set_encap_proto(skb, vlan_hdr); + skb_reset_network_header(skb); + break; + } + } +} + static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb, const struct flow_offload_tuple_rhash *tuplehash, unsigned short type) @@ -235,17 +293,18 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, enum flow_offload_tuple_dir dir; struct flow_offload *flow; struct net_device *outdev; + u32 hdrsize, offset = 0; + unsigned int thoff, mtu; struct rtable *rt; - unsigned int thoff; struct iphdr *iph; __be32 nexthop; - u32 hdrsize; int ret; - if (skb->protocol != htons(ETH_P_IP)) + if (skb->protocol != htons(ETH_P_IP) && + !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &offset)) return NF_ACCEPT; - if (nf_flow_tuple_ip(skb, state->in, &tuple, &hdrsize) < 0) + if (nf_flow_tuple_ip(skb, state->in, &tuple, &hdrsize, offset) < 0) return NF_ACCEPT; tuplehash = flow_offload_lookup(flow_table, &tuple); @@ -255,11 +314,12 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, dir = tuplehash->tuple.dir; flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]); - if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu))) + mtu = flow->tuplehash[dir].tuple.mtu + offset; + if (unlikely(nf_flow_exceeds_mtu(skb, mtu))) return NF_ACCEPT; - iph = ip_hdr(skb); - thoff = iph->ihl * 4; + iph = (struct iphdr *)(skb_network_header(skb) + offset); + thoff = (iph->ihl * 4) + offset; if (nf_flow_state_check(flow, iph->protocol, skb, thoff)) return NF_ACCEPT; @@ -277,6 +337,9 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, flow_offload_refresh(flow_table, flow); + nf_flow_encap_pop(skb, tuplehash); + thoff -= offset; + iph = ip_hdr(skb); nf_flow_nat_ip(flow, skb, thoff, dir, iph); @@ -418,16 +481,18 @@ static void nf_flow_nat_ipv6(const struct flow_offload *flow, } static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev, - struct flow_offload_tuple *tuple, u32 *hdrsize) + struct flow_offload_tuple *tuple, u32 *hdrsize, + u32 offset) { struct flow_ports *ports; struct ipv6hdr *ip6h; unsigned int thoff; - if (!pskb_may_pull(skb, sizeof(*ip6h))) + thoff = sizeof(*ip6h) + offset; + if (!pskb_may_pull(skb, thoff)) return -1; - ip6h = ipv6_hdr(skb); + ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset); switch (ip6h->nexthdr) { case IPPROTO_TCP: @@ -443,11 +508,10 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev, if (ip6h->hop_limit <= 1) return -1; - thoff = sizeof(*ip6h); if (!pskb_may_pull(skb, thoff + *hdrsize)) return -1; - ip6h = ipv6_hdr(skb); + ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset); ports = (struct flow_ports *)(skb_network_header(skb) + thoff); tuple->src_v6 = ip6h->saddr; @@ -457,6 +521,7 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev, tuple->l3proto = AF_INET6; tuple->l4proto = ip6h->nexthdr; tuple->iifidx = dev->ifindex; + nf_flow_tuple_encap(skb, tuple); return 0; } @@ -472,15 +537,17 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, const struct in6_addr *nexthop; struct flow_offload *flow; struct net_device *outdev; + unsigned int thoff, mtu; + u32 hdrsize, offset = 0; struct ipv6hdr *ip6h; struct rt6_info *rt; - u32 hdrsize; int ret; - if (skb->protocol != htons(ETH_P_IPV6)) + if (skb->protocol != htons(ETH_P_IPV6) && + !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IPV6), &offset)) return NF_ACCEPT; - if (nf_flow_tuple_ipv6(skb, state->in, &tuple, &hdrsize) < 0) + if (nf_flow_tuple_ipv6(skb, state->in, &tuple, &hdrsize, offset) < 0) return NF_ACCEPT; tuplehash = flow_offload_lookup(flow_table, &tuple); @@ -490,11 +557,13 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, dir = tuplehash->tuple.dir; flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]); - if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu))) + mtu = flow->tuplehash[dir].tuple.mtu + offset; + if (unlikely(nf_flow_exceeds_mtu(skb, mtu))) return NF_ACCEPT; - if (nf_flow_state_check(flow, ipv6_hdr(skb)->nexthdr, skb, - sizeof(*ip6h))) + ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset); + thoff = sizeof(*ip6h) + offset; + if (nf_flow_state_check(flow, ip6h->nexthdr, skb, thoff)) return NF_ACCEPT; if (tuplehash->tuple.xmit_type == FLOW_OFFLOAD_XMIT_NEIGH || @@ -506,11 +575,13 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, } } - if (skb_try_make_writable(skb, sizeof(*ip6h) + hdrsize)) + if (skb_try_make_writable(skb, thoff + hdrsize)) return NF_DROP; flow_offload_refresh(flow_table, flow); + nf_flow_encap_pop(skb, tuplehash); + ip6h = ipv6_hdr(skb); nf_flow_nat_ipv6(flow, skb, dir, ip6h); diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index a6595dca1b1f..8392b1a8108b 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -66,6 +66,11 @@ static int nft_dev_fill_forward_path(const struct nf_flow_route *route, struct nft_forward_info { const struct net_device *indev; const struct net_device *outdev; + struct id { + __u16 id; + __be16 proto; + } encap[NF_FLOW_TABLE_ENCAP_MAX]; + u8 num_encaps; u8 h_source[ETH_ALEN]; u8 h_dest[ETH_ALEN]; enum flow_offload_xmit_type xmit_type; @@ -84,9 +89,23 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack, path = &stack->path[i]; switch (path->type) { case DEV_PATH_ETHERNET: + case DEV_PATH_VLAN: info->indev = path->dev; if (is_zero_ether_addr(info->h_source)) memcpy(info->h_source, path->dev->dev_addr, ETH_ALEN); + + if (path->type == DEV_PATH_ETHERNET) + break; + + /* DEV_PATH_VLAN */ + if (info->num_encaps >= NF_FLOW_TABLE_ENCAP_MAX) { + info->indev = NULL; + break; + } + info->outdev = path->dev; + info->encap[info->num_encaps].id = path->encap.id; + info->encap[info->num_encaps].proto = path->encap.proto; + info->num_encaps++; break; case DEV_PATH_BRIDGE: if (is_zero_ether_addr(info->h_source)) @@ -94,7 +113,6 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack, info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT; break; - case DEV_PATH_VLAN: default: info->indev = NULL; break; @@ -130,6 +148,7 @@ static void nft_dev_forward_path(struct nf_flow_route *route, struct net_device_path_stack stack; struct nft_forward_info info = {}; unsigned char ha[ETH_ALEN]; + int i; if (nft_dev_fill_forward_path(route, dst, ct, dir, ha, &stack) >= 0) nft_dev_path_info(&stack, &info, ha); @@ -138,6 +157,11 @@ static void nft_dev_forward_path(struct nf_flow_route *route, return; route->tuple[!dir].in.ifindex = info.indev->ifindex; + for (i = 0; i < info.num_encaps; i++) { + route->tuple[!dir].in.encap[i].id = info.encap[i].id; + route->tuple[!dir].in.encap[i].proto = info.encap[i].proto; + } + route->tuple[!dir].in.num_encaps = info.num_encaps; if (info.xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) { memcpy(route->tuple[dir].out.h_source, info.h_source, ETH_ALEN); From patchwork Wed Mar 24 01:30:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408681 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A2A5C433FC for ; Wed, 24 Mar 2021 01:32:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DB9C7619E3 for ; Wed, 24 Mar 2021 01:32:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234593AbhCXBbv (ORCPT ); Tue, 23 Mar 2021 21:31:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234505AbhCXBbQ (ORCPT ); Tue, 23 Mar 2021 21:31:16 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 36B59C061763; Tue, 23 Mar 2021 18:31:16 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 871FA63128; Wed, 24 Mar 2021 02:31:07 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 11/24] netfilter: flowtable: add bridge vlan filtering support Date: Wed, 24 Mar 2021 02:30:42 +0100 Message-Id: <20210324013055.5619-12-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add the vlan tag based when PVID is set on. Signed-off-by: Pablo Neira Ayuso --- v2: no changes. net/netfilter/nft_flow_offload.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index 8392b1a8108b..651364d93efd 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -111,6 +111,18 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack, if (is_zero_ether_addr(info->h_source)) memcpy(info->h_source, path->dev->dev_addr, ETH_ALEN); + switch (path->bridge.vlan_mode) { + case DEV_PATH_BR_VLAN_TAG: + info->encap[info->num_encaps].id = path->bridge.vlan_id; + info->encap[info->num_encaps].proto = path->bridge.vlan_proto; + info->num_encaps++; + break; + case DEV_PATH_BR_VLAN_UNTAG: + info->num_encaps--; + break; + case DEV_PATH_BR_VLAN_KEEP: + break; + } info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT; break; default: From patchwork Wed Mar 24 01:30:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408676 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AAE24C433E1 for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8E56B619EB for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234663AbhCXBcA (ORCPT ); Tue, 23 Mar 2021 21:32:00 -0400 Received: from mail.netfilter.org ([217.70.188.207]:60586 "EHLO mail.netfilter.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234509AbhCXBbT (ORCPT ); Tue, 23 Mar 2021 21:31:19 -0400 Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id BE4B662C0E; Wed, 24 Mar 2021 02:31:08 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next,v2 13/24] netfilter: flowtable: add dsa support Date: Wed, 24 Mar 2021 02:30:44 +0100 Message-Id: <20210324013055.5619-14-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Replace the master ethernet device by the dsa slave port. Packets coming in from the software ingress path use the dsa slave port as input device. Signed-off-by: Pablo Neira Ayuso --- v2: no changes. net/netfilter/nft_flow_offload.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index 81a5e2b6c901..143d049fd7f1 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -89,6 +89,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack, path = &stack->path[i]; switch (path->type) { case DEV_PATH_ETHERNET: + case DEV_PATH_DSA: case DEV_PATH_VLAN: case DEV_PATH_PPPOE: info->indev = path->dev; @@ -97,6 +98,10 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack, if (path->type == DEV_PATH_ETHERNET) break; + if (path->type == DEV_PATH_DSA) { + i = stack->num_paths; + break; + } /* DEV_PATH_VLAN and DEV_PATH_PPPOE */ if (info->num_encaps >= NF_FLOW_TABLE_ENCAP_MAX) { From patchwork Wed Mar 24 01:30:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408674 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFA70C43332 for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A7ED0619FF for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234671AbhCXBcB (ORCPT ); Tue, 23 Mar 2021 21:32:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50994 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234515AbhCXBbT (ORCPT ); Tue, 23 Mar 2021 21:31:19 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7B3E9C061763; Tue, 23 Mar 2021 18:31:18 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id D82316312B; Wed, 24 Mar 2021 02:31:09 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 15/24] netfilter: flowtable: add offload support for xmit path types Date: Wed, 24 Mar 2021 02:30:46 +0100 Message-Id: <20210324013055.5619-16-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org When the flow tuple xmit_type is set to FLOW_OFFLOAD_XMIT_DIRECT, the dst_cache pointer is not valid, and the h_source/h_dest/ifidx out fields need to be used. This patch also adds the FLOW_ACTION_VLAN_PUSH action to pass the VLAN tag to the driver. Signed-off-by: Pablo Neira Ayuso --- v2: no changes. net/netfilter/nf_flow_table_offload.c | 166 +++++++++++++++++++------- 1 file changed, 124 insertions(+), 42 deletions(-) diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index 1b979c8b3ba0..3fbed447132a 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -177,28 +177,45 @@ static int flow_offload_eth_src(struct net *net, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { - const struct flow_offload_tuple *tuple = &flow->tuplehash[!dir].tuple; struct flow_action_entry *entry0 = flow_action_entry_next(flow_rule); struct flow_action_entry *entry1 = flow_action_entry_next(flow_rule); - struct net_device *dev; + const struct flow_offload_tuple *other_tuple, *this_tuple; + struct net_device *dev = NULL; + const unsigned char *addr; u32 mask, val; u16 val16; - dev = dev_get_by_index(net, tuple->iifidx); - if (!dev) - return -ENOENT; + this_tuple = &flow->tuplehash[dir].tuple; + + switch (this_tuple->xmit_type) { + case FLOW_OFFLOAD_XMIT_DIRECT: + addr = this_tuple->out.h_source; + break; + case FLOW_OFFLOAD_XMIT_NEIGH: + other_tuple = &flow->tuplehash[!dir].tuple; + dev = dev_get_by_index(net, other_tuple->iifidx); + if (!dev) + return -ENOENT; + + addr = dev->dev_addr; + break; + default: + return -EOPNOTSUPP; + } mask = ~0xffff0000; - memcpy(&val16, dev->dev_addr, 2); + memcpy(&val16, addr, 2); val = val16 << 16; flow_offload_mangle(entry0, FLOW_ACT_MANGLE_HDR_TYPE_ETH, 4, &val, &mask); mask = ~0xffffffff; - memcpy(&val, dev->dev_addr + 2, 4); + memcpy(&val, addr + 2, 4); flow_offload_mangle(entry1, FLOW_ACT_MANGLE_HDR_TYPE_ETH, 8, &val, &mask); - dev_put(dev); + + if (dev) + dev_put(dev); return 0; } @@ -210,27 +227,40 @@ static int flow_offload_eth_dst(struct net *net, { struct flow_action_entry *entry0 = flow_action_entry_next(flow_rule); struct flow_action_entry *entry1 = flow_action_entry_next(flow_rule); - const void *daddr = &flow->tuplehash[!dir].tuple.src_v4; + const struct flow_offload_tuple *other_tuple, *this_tuple; const struct dst_entry *dst_cache; unsigned char ha[ETH_ALEN]; struct neighbour *n; + const void *daddr; u32 mask, val; u8 nud_state; u16 val16; - dst_cache = flow->tuplehash[dir].tuple.dst_cache; - n = dst_neigh_lookup(dst_cache, daddr); - if (!n) - return -ENOENT; + this_tuple = &flow->tuplehash[dir].tuple; - read_lock_bh(&n->lock); - nud_state = n->nud_state; - ether_addr_copy(ha, n->ha); - read_unlock_bh(&n->lock); - - if (!(nud_state & NUD_VALID)) { + switch (this_tuple->xmit_type) { + case FLOW_OFFLOAD_XMIT_DIRECT: + ether_addr_copy(ha, this_tuple->out.h_dest); + break; + case FLOW_OFFLOAD_XMIT_NEIGH: + other_tuple = &flow->tuplehash[!dir].tuple; + daddr = &other_tuple->src_v4; + dst_cache = this_tuple->dst_cache; + n = dst_neigh_lookup(dst_cache, daddr); + if (!n) + return -ENOENT; + + read_lock_bh(&n->lock); + nud_state = n->nud_state; + ether_addr_copy(ha, n->ha); + read_unlock_bh(&n->lock); neigh_release(n); - return -ENOENT; + + if (!(nud_state & NUD_VALID)) + return -ENOENT; + break; + default: + return -EOPNOTSUPP; } mask = ~0xffffffff; @@ -243,7 +273,6 @@ static int flow_offload_eth_dst(struct net *net, val = val16; flow_offload_mangle(entry1, FLOW_ACT_MANGLE_HDR_TYPE_ETH, 4, &val, &mask); - neigh_release(n); return 0; } @@ -465,27 +494,52 @@ static void flow_offload_ipv4_checksum(struct net *net, } } -static void flow_offload_redirect(const struct flow_offload *flow, +static void flow_offload_redirect(struct net *net, + const struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { - struct flow_action_entry *entry = flow_action_entry_next(flow_rule); - struct rtable *rt; + const struct flow_offload_tuple *this_tuple, *other_tuple; + struct flow_action_entry *entry; + struct net_device *dev; + int ifindex; - rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache; + this_tuple = &flow->tuplehash[dir].tuple; + switch (this_tuple->xmit_type) { + case FLOW_OFFLOAD_XMIT_DIRECT: + this_tuple = &flow->tuplehash[dir].tuple; + ifindex = this_tuple->out.ifidx; + break; + case FLOW_OFFLOAD_XMIT_NEIGH: + other_tuple = &flow->tuplehash[!dir].tuple; + ifindex = other_tuple->iifidx; + break; + default: + return; + } + + dev = dev_get_by_index(net, ifindex); + if (!dev) + return; + + entry = flow_action_entry_next(flow_rule); entry->id = FLOW_ACTION_REDIRECT; - entry->dev = rt->dst.dev; - dev_hold(rt->dst.dev); + entry->dev = dev; } static void flow_offload_encap_tunnel(const struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { + const struct flow_offload_tuple *this_tuple; struct flow_action_entry *entry; struct dst_entry *dst; - dst = flow->tuplehash[dir].tuple.dst_cache; + this_tuple = &flow->tuplehash[dir].tuple; + if (this_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) + return; + + dst = this_tuple->dst_cache; if (dst && dst->lwtstate) { struct ip_tunnel_info *tun_info; @@ -502,10 +556,15 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { + const struct flow_offload_tuple *other_tuple; struct flow_action_entry *entry; struct dst_entry *dst; - dst = flow->tuplehash[!dir].tuple.dst_cache; + other_tuple = &flow->tuplehash[!dir].tuple; + if (other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) + return; + + dst = other_tuple->dst_cache; if (dst && dst->lwtstate) { struct ip_tunnel_info *tun_info; @@ -517,10 +576,14 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow, } } -int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, - enum flow_offload_tuple_dir dir, - struct nf_flow_rule *flow_rule) +static int +nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, + enum flow_offload_tuple_dir dir, + struct nf_flow_rule *flow_rule) { + const struct flow_offload_tuple *other_tuple; + int i; + flow_offload_decap_tunnel(flow, dir, flow_rule); flow_offload_encap_tunnel(flow, dir, flow_rule); @@ -528,6 +591,26 @@ int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, flow_offload_eth_dst(net, flow, dir, flow_rule) < 0) return -1; + other_tuple = &flow->tuplehash[!dir].tuple; + + for (i = 0; i < other_tuple->encap_num; i++) { + struct flow_action_entry *entry = flow_action_entry_next(flow_rule); + + entry->id = FLOW_ACTION_VLAN_PUSH; + entry->vlan.vid = other_tuple->encap[i].id; + entry->vlan.proto = other_tuple->encap[i].proto; + } + + return 0; +} + +int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, + enum flow_offload_tuple_dir dir, + struct nf_flow_rule *flow_rule) +{ + if (nf_flow_rule_route_common(net, flow, dir, flow_rule) < 0) + return -1; + if (test_bit(NF_FLOW_SNAT, &flow->flags)) { flow_offload_ipv4_snat(net, flow, dir, flow_rule); flow_offload_port_snat(net, flow, dir, flow_rule); @@ -540,7 +623,7 @@ int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, test_bit(NF_FLOW_DNAT, &flow->flags)) flow_offload_ipv4_checksum(net, flow, flow_rule); - flow_offload_redirect(flow, dir, flow_rule); + flow_offload_redirect(net, flow, dir, flow_rule); return 0; } @@ -550,11 +633,7 @@ int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { - flow_offload_decap_tunnel(flow, dir, flow_rule); - flow_offload_encap_tunnel(flow, dir, flow_rule); - - if (flow_offload_eth_src(net, flow, dir, flow_rule) < 0 || - flow_offload_eth_dst(net, flow, dir, flow_rule) < 0) + if (nf_flow_rule_route_common(net, flow, dir, flow_rule) < 0) return -1; if (test_bit(NF_FLOW_SNAT, &flow->flags)) { @@ -566,7 +645,7 @@ int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow, flow_offload_port_dnat(net, flow, dir, flow_rule); } - flow_offload_redirect(flow, dir, flow_rule); + flow_offload_redirect(net, flow, dir, flow_rule); return 0; } @@ -580,10 +659,10 @@ nf_flow_offload_rule_alloc(struct net *net, enum flow_offload_tuple_dir dir) { const struct nf_flowtable *flowtable = offload->flowtable; + const struct flow_offload_tuple *tuple, *other_tuple; const struct flow_offload *flow = offload->flow; - const struct flow_offload_tuple *tuple; + struct dst_entry *other_dst = NULL; struct nf_flow_rule *flow_rule; - struct dst_entry *other_dst; int err = -ENOMEM; flow_rule = kzalloc(sizeof(*flow_rule), GFP_KERNEL); @@ -599,7 +678,10 @@ nf_flow_offload_rule_alloc(struct net *net, flow_rule->rule->match.key = &flow_rule->match.key; tuple = &flow->tuplehash[dir].tuple; - other_dst = flow->tuplehash[!dir].tuple.dst_cache; + other_tuple = &flow->tuplehash[!dir].tuple; + if (other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_NEIGH) + other_dst = other_tuple->dst_cache; + err = nf_flow_rule_match(&flow_rule->match, tuple, other_dst); if (err < 0) goto err_flow_match; From patchwork Wed Mar 24 01:30:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408677 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09899C43462 for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B19EA619F8 for ; Wed, 24 Mar 2021 01:32:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234644AbhCXBb6 (ORCPT ); Tue, 23 Mar 2021 21:31:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51002 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234521AbhCXBbU (ORCPT ); Tue, 23 Mar 2021 21:31:20 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 58602C061763; Tue, 23 Mar 2021 18:31:20 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 0256963129; Wed, 24 Mar 2021 02:31:11 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 19/24] netfilter: flowtable: support for FLOW_ACTION_PPPOE_PUSH Date: Wed, 24 Mar 2021 02:30:50 +0100 Message-Id: <20210324013055.5619-20-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a PPPoE push action if layer 2 protocol is ETH_P_PPP_SES to add PPPoE flowtable hardware offload support. Signed-off-by: Pablo Neira Ayuso --- v2: no changes. net/netfilter/nf_flow_table_offload.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index 9326ba74745e..7d0d128407be 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -600,9 +600,18 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, continue; entry = flow_action_entry_next(flow_rule); - entry->id = FLOW_ACTION_VLAN_PUSH; - entry->vlan.vid = other_tuple->encap[i].id; - entry->vlan.proto = other_tuple->encap[i].proto; + + switch (other_tuple->encap[i].proto) { + case htons(ETH_P_PPP_SES): + entry->id = FLOW_ACTION_PPPOE_PUSH; + entry->pppoe.sid = other_tuple->encap[i].id; + break; + case htons(ETH_P_8021Q): + entry->id = FLOW_ACTION_VLAN_PUSH; + entry->vlan.vid = other_tuple->encap[i].id; + entry->vlan.proto = other_tuple->encap[i].proto; + break; + } } return 0; From patchwork Wed Mar 24 01:30:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2EE8C432C3 for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C03D4619EF for ; Wed, 24 Mar 2021 01:32:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234675AbhCXBcD (ORCPT ); Tue, 23 Mar 2021 21:32:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234522AbhCXBbV (ORCPT ); Tue, 23 Mar 2021 21:31:21 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D8B0AC061763; Tue, 23 Mar 2021 18:31:20 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 8954B62C0E; Wed, 24 Mar 2021 02:31:12 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next,v2 20/24] dsa: slave: add support for TC_SETUP_FT Date: Wed, 24 Mar 2021 02:30:51 +0100 Message-Id: <20210324013055.5619-21-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The dsa infrastructure provides a well-defined hierarchy of devices, pass up the call to set up the flow block to the master device. From the software dataplane, the netfilter infrastructure uses the dsa slave devices to refer to the input and output device for the given skbuff. Similarly, the flowtable definition in the ruleset refers to the dsa slave port devices. This patch adds the glue code to call ndo_setup_tc with TC_SETUP_FT with the master device via the dsa slave devices. Signed-off-by: Pablo Neira Ayuso --- v2: no changes. net/dsa/slave.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index df7d789236fe..d84162fe028a 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1278,14 +1278,32 @@ static int dsa_slave_setup_tc_block(struct net_device *dev, } } +static int dsa_slave_setup_ft_block(struct dsa_switch *ds, int port, + void *type_data) +{ + struct dsa_port *cpu_dp = dsa_to_port(ds, port)->cpu_dp; + struct net_device *master = cpu_dp->master; + + if (!master->netdev_ops->ndo_setup_tc) + return -EOPNOTSUPP; + + return master->netdev_ops->ndo_setup_tc(master, TC_SETUP_FT, type_data); +} + static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data) { struct dsa_port *dp = dsa_slave_to_port(dev); struct dsa_switch *ds = dp->ds; - if (type == TC_SETUP_BLOCK) + switch (type) { + case TC_SETUP_BLOCK: return dsa_slave_setup_tc_block(dev, type_data); + case TC_SETUP_FT: + return dsa_slave_setup_ft_block(ds, dp->index, type_data); + default: + break; + } if (!ds->ops->port_setup_tc) return -EOPNOTSUPP; From patchwork Wed Mar 24 01:30:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408672 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BA35C43603 for ; Wed, 24 Mar 2021 01:32:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 116D8619E3 for ; Wed, 24 Mar 2021 01:32:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234700AbhCXBcF (ORCPT ); Tue, 23 Mar 2021 21:32:05 -0400 Received: from mail.netfilter.org ([217.70.188.207]:60614 "EHLO mail.netfilter.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234527AbhCXBbX (ORCPT ); Tue, 23 Mar 2021 21:31:23 -0400 Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id 29252630C3; Wed, 24 Mar 2021 02:31:14 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 23/24] net: ethernet: mtk_eth_soc: add flow offloading support Date: Wed, 24 Mar 2021 02:30:54 +0100 Message-Id: <20210324013055.5619-24-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Felix Fietkau This adds support for offloading IPv4 routed flows, including SNAT/DNAT, one VLAN, PPPoE and DSA. Signed-off-by: Felix Fietkau Signed-off-by: Pablo Neira Ayuso --- v2: formerly, patch #22 now patch #23. drivers/net/ethernet/mediatek/Makefile | 2 +- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 5 + drivers/net/ethernet/mediatek/mtk_eth_soc.h | 10 +- .../net/ethernet/mediatek/mtk_ppe_offload.c | 485 ++++++++++++++++++ 4 files changed, 500 insertions(+), 2 deletions(-) create mode 100644 drivers/net/ethernet/mediatek/mtk_ppe_offload.c diff --git a/drivers/net/ethernet/mediatek/Makefile b/drivers/net/ethernet/mediatek/Makefile index 871dc3e113e2..79d4cdbbcbf5 100644 --- a/drivers/net/ethernet/mediatek/Makefile +++ b/drivers/net/ethernet/mediatek/Makefile @@ -4,5 +4,5 @@ # obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o -mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o +mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o mtk_ppe_offload.o obj-$(CONFIG_NET_MEDIATEK_STAR_EMAC) += mtk_star_emac.o diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 67605b9a3916..0396f0db855f 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -2843,6 +2843,7 @@ static const struct net_device_ops mtk_netdev_ops = { #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller = mtk_poll_controller, #endif + .ndo_setup_tc = mtk_eth_setup_tc, }; static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) @@ -3104,6 +3105,10 @@ static int mtk_probe(struct platform_device *pdev) eth->base + MTK_ETH_PPE_BASE, 2); if (err) goto err_free_dev; + + err = mtk_eth_offload_init(eth); + if (err) + goto err_free_dev; } for (i = 0; i < MTK_MAX_DEVS; i++) { diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h index 66ed537bc133..1a6750c08bb9 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h @@ -15,6 +15,7 @@ #include #include #include +#include #include "mtk_ppe.h" #define MTK_QDMA_PAGE_SIZE 2048 @@ -41,7 +42,8 @@ NETIF_F_HW_VLAN_CTAG_RX | \ NETIF_F_SG | NETIF_F_TSO | \ NETIF_F_TSO6 | \ - NETIF_F_IPV6_CSUM) + NETIF_F_IPV6_CSUM |\ + NETIF_F_HW_TC) #define MTK_HW_FEATURES_MT7628 (NETIF_F_SG | NETIF_F_RXCSUM) #define NEXT_DESP_IDX(X, Y) (((X) + 1) & ((Y) - 1)) @@ -914,6 +916,7 @@ struct mtk_eth { int ip_align; struct mtk_ppe ppe; + struct rhashtable flow_table; }; /* struct mtk_mac - the structure that holds the info about the MACs of the @@ -958,4 +961,9 @@ int mtk_gmac_sgmii_path_setup(struct mtk_eth *eth, int mac_id); int mtk_gmac_gephy_path_setup(struct mtk_eth *eth, int mac_id); int mtk_gmac_rgmii_path_setup(struct mtk_eth *eth, int mac_id); +int mtk_eth_offload_init(struct mtk_eth *eth); +int mtk_eth_setup_tc(struct net_device *dev, enum tc_setup_type type, + void *type_data); + + #endif /* MTK_ETH_H */ diff --git a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c new file mode 100644 index 000000000000..d0c46786571f --- /dev/null +++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c @@ -0,0 +1,485 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2020 Felix Fietkau + */ + +#include +#include +#include +#include +#include +#include +#include +#include "mtk_eth_soc.h" + +struct mtk_flow_data { + struct ethhdr eth; + + union { + struct { + __be32 src_addr; + __be32 dst_addr; + } v4; + }; + + __be16 src_port; + __be16 dst_port; + + struct { + u16 id; + __be16 proto; + u8 num; + } vlan; + struct { + u16 sid; + u8 num; + } pppoe; +}; + +struct mtk_flow_entry { + struct rhash_head node; + unsigned long cookie; + u16 hash; +}; + +static const struct rhashtable_params mtk_flow_ht_params = { + .head_offset = offsetof(struct mtk_flow_entry, node), + .head_offset = offsetof(struct mtk_flow_entry, cookie), + .key_len = sizeof(unsigned long), + .automatic_shrinking = true, +}; + +static u32 +mtk_eth_timestamp(struct mtk_eth *eth) +{ + return mtk_r32(eth, 0x0010) & MTK_FOE_IB1_BIND_TIMESTAMP; +} + +static int +mtk_flow_set_ipv4_addr(struct mtk_foe_entry *foe, struct mtk_flow_data *data, + bool egress) +{ + return mtk_foe_entry_set_ipv4_tuple(foe, egress, + data->v4.src_addr, data->src_port, + data->v4.dst_addr, data->dst_port); +} + +static void +mtk_flow_offload_mangle_eth(const struct flow_action_entry *act, void *eth) +{ + void *dest = eth + act->mangle.offset; + const void *src = &act->mangle.val; + + if (act->mangle.offset > 8) + return; + + if (act->mangle.mask == 0xffff) { + src += 2; + dest += 2; + } + + memcpy(dest, src, act->mangle.mask ? 2 : 4); +} + + +static int +mtk_flow_mangle_ports(const struct flow_action_entry *act, + struct mtk_flow_data *data) +{ + u32 val = ntohl(act->mangle.val); + + switch (act->mangle.offset) { + case 0: + if (act->mangle.mask == ~htonl(0xffff)) + data->dst_port = cpu_to_be16(val); + else + data->src_port = cpu_to_be16(val >> 16); + break; + case 2: + data->dst_port = cpu_to_be16(val); + break; + default: + return -EINVAL; + } + + return 0; +} + +static int +mtk_flow_mangle_ipv4(const struct flow_action_entry *act, + struct mtk_flow_data *data) +{ + __be32 *dest; + + switch (act->mangle.offset) { + case offsetof(struct iphdr, saddr): + dest = &data->v4.src_addr; + break; + case offsetof(struct iphdr, daddr): + dest = &data->v4.dst_addr; + break; + default: + return -EINVAL; + } + + memcpy(dest, &act->mangle.val, sizeof(u32)); + + return 0; +} + +static int +mtk_flow_get_dsa_port(struct net_device **dev) +{ +#if IS_ENABLED(CONFIG_NET_DSA) + struct dsa_port *dp; + + dp = dsa_port_from_netdev(*dev); + if (IS_ERR(dp)) + return -ENODEV; + + if (dp->cpu_dp->tag_ops->proto != DSA_TAG_PROTO_MTK) + return -ENODEV; + + *dev = dp->cpu_dp->master; + + return dp->index; +#else + return -ENODEV; +#endif +} + +static int +mtk_flow_set_output_device(struct mtk_eth *eth, struct mtk_foe_entry *foe, + struct net_device *dev) +{ + int pse_port, dsa_port; + + dsa_port = mtk_flow_get_dsa_port(&dev); + if (dsa_port >= 0) + mtk_foe_entry_set_dsa(foe, dsa_port); + + if (dev == eth->netdev[0]) + pse_port = 1; + else if (dev == eth->netdev[1]) + pse_port = 2; + else + return -EOPNOTSUPP; + + mtk_foe_entry_set_pse_port(foe, pse_port); + + return 0; +} + +static int +mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f) +{ + struct flow_rule *rule = flow_cls_offload_flow_rule(f); + struct flow_action_entry *act; + struct mtk_flow_data data = {}; + struct mtk_foe_entry foe; + struct net_device *odev = NULL; + struct mtk_flow_entry *entry; + int offload_type = 0; + u16 addr_type = 0; + u32 timestamp; + u8 l4proto = 0; + int err = 0; + int hash; + int i; + + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_META)) { + struct flow_match_meta match; + + flow_rule_match_meta(rule, &match); + } else { + return -EOPNOTSUPP; + } + + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_CONTROL)) { + struct flow_match_control match; + + flow_rule_match_control(rule, &match); + addr_type = match.key->addr_type; + } else { + return -EOPNOTSUPP; + } + + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) { + struct flow_match_basic match; + + flow_rule_match_basic(rule, &match); + l4proto = match.key->ip_proto; + } else { + return -EOPNOTSUPP; + } + + flow_action_for_each(i, act, &rule->action) { + switch (act->id) { + case FLOW_ACTION_MANGLE: + if (act->mangle.htype == FLOW_ACT_MANGLE_HDR_TYPE_ETH) + mtk_flow_offload_mangle_eth(act, &data.eth); + break; + case FLOW_ACTION_REDIRECT: + odev = act->dev; + break; + case FLOW_ACTION_CSUM: + break; + case FLOW_ACTION_VLAN_PUSH: + if (data.vlan.num == 1 || + act->vlan.proto != htons(ETH_P_8021Q)) + return -EOPNOTSUPP; + + data.vlan.id = act->vlan.vid; + data.vlan.proto = act->vlan.proto; + data.vlan.num++; + break; + case FLOW_ACTION_PPPOE_PUSH: + if (data.pppoe.num == 1) + return -EOPNOTSUPP; + + data.pppoe.sid = act->pppoe.sid; + data.pppoe.num++; + break; + default: + return -EOPNOTSUPP; + } + } + + switch (addr_type) { + case FLOW_DISSECTOR_KEY_IPV4_ADDRS: + offload_type = MTK_PPE_PKT_TYPE_IPV4_HNAPT; + break; + default: + return -EOPNOTSUPP; + } + + if (!is_valid_ether_addr(data.eth.h_source) || + !is_valid_ether_addr(data.eth.h_dest)) + return -EINVAL; + + err = mtk_foe_entry_prepare(&foe, offload_type, l4proto, 0, + data.eth.h_source, + data.eth.h_dest); + if (err) + return err; + + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_PORTS)) { + struct flow_match_ports ports; + + flow_rule_match_ports(rule, &ports); + data.src_port = ports.key->src; + data.dst_port = ports.key->dst; + } else { + return -EOPNOTSUPP; + } + + if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) { + struct flow_match_ipv4_addrs addrs; + + flow_rule_match_ipv4_addrs(rule, &addrs); + + data.v4.src_addr = addrs.key->src; + data.v4.dst_addr = addrs.key->dst; + + mtk_flow_set_ipv4_addr(&foe, &data, false); + } + + flow_action_for_each(i, act, &rule->action) { + if (act->id != FLOW_ACTION_MANGLE) + continue; + + switch (act->mangle.htype) { + case FLOW_ACT_MANGLE_HDR_TYPE_TCP: + case FLOW_ACT_MANGLE_HDR_TYPE_UDP: + err = mtk_flow_mangle_ports(act, &data); + break; + case FLOW_ACT_MANGLE_HDR_TYPE_IP4: + err = mtk_flow_mangle_ipv4(act, &data); + break; + case FLOW_ACT_MANGLE_HDR_TYPE_ETH: + /* handled earlier */ + break; + default: + return -EOPNOTSUPP; + } + + if (err) + return err; + } + + if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) { + err = mtk_flow_set_ipv4_addr(&foe, &data, true); + if (err) + return err; + } + + if (data.vlan.num == 1) { + if (data.vlan.proto != htons(ETH_P_8021Q)) + return -EOPNOTSUPP; + + mtk_foe_entry_set_vlan(&foe, data.vlan.id); + } + if (data.pppoe.num == 1) + mtk_foe_entry_set_pppoe(&foe, data.pppoe.sid); + + err = mtk_flow_set_output_device(eth, &foe, odev); + if (err) + return err; + + entry = kzalloc(sizeof(*entry), GFP_KERNEL); + if (!entry) + return -ENOMEM; + + entry->cookie = f->cookie; + timestamp = mtk_eth_timestamp(eth); + hash = mtk_foe_entry_commit(ð->ppe, &foe, timestamp); + if (hash < 0) { + err = hash; + goto free; + } + + entry->hash = hash; + err = rhashtable_insert_fast(ð->flow_table, &entry->node, + mtk_flow_ht_params); + if (err < 0) + goto clear_flow; + + return 0; +clear_flow: + mtk_foe_entry_clear(ð->ppe, hash); +free: + kfree(entry); + return err; +} + +static int +mtk_flow_offload_destroy(struct mtk_eth *eth, struct flow_cls_offload *f) +{ + struct mtk_flow_entry *entry; + + entry = rhashtable_lookup(ð->flow_table, &f->cookie, + mtk_flow_ht_params); + if (!entry) + return -ENOENT; + + mtk_foe_entry_clear(ð->ppe, entry->hash); + rhashtable_remove_fast(ð->flow_table, &entry->node, + mtk_flow_ht_params); + kfree(entry); + + return 0; +} + +static int +mtk_flow_offload_stats(struct mtk_eth *eth, struct flow_cls_offload *f) +{ + struct mtk_flow_entry *entry; + int timestamp; + u32 idle; + + entry = rhashtable_lookup(ð->flow_table, &f->cookie, + mtk_flow_ht_params); + if (!entry) + return -ENOENT; + + timestamp = mtk_foe_entry_timestamp(ð->ppe, entry->hash); + if (timestamp < 0) + return -ETIMEDOUT; + + idle = mtk_eth_timestamp(eth) - timestamp; + f->stats.lastused = jiffies - idle * HZ; + + return 0; +} + +static int +mtk_eth_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_priv) +{ + struct flow_cls_offload *cls = type_data; + struct net_device *dev = cb_priv; + struct mtk_mac *mac = netdev_priv(dev); + struct mtk_eth *eth = mac->hw; + + if (!tc_can_offload(dev)) + return -EOPNOTSUPP; + + if (type != TC_SETUP_CLSFLOWER) + return -EOPNOTSUPP; + + switch (cls->command) { + case FLOW_CLS_REPLACE: + return mtk_flow_offload_replace(eth, cls); + case FLOW_CLS_DESTROY: + return mtk_flow_offload_destroy(eth, cls); + case FLOW_CLS_STATS: + return mtk_flow_offload_stats(eth, cls); + default: + return -EOPNOTSUPP; + } + + return 0; +} + +static int +mtk_eth_setup_tc_block(struct net_device *dev, struct flow_block_offload *f) +{ + struct mtk_mac *mac = netdev_priv(dev); + struct mtk_eth *eth = mac->hw; + static LIST_HEAD(block_cb_list); + struct flow_block_cb *block_cb; + flow_setup_cb_t *cb; + + if (!eth->ppe.foe_table) + return -EOPNOTSUPP; + + if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS) + return -EOPNOTSUPP; + + cb = mtk_eth_setup_tc_block_cb; + f->driver_block_list = &block_cb_list; + + switch (f->command) { + case FLOW_BLOCK_BIND: + block_cb = flow_block_cb_lookup(f->block, cb, dev); + if (block_cb) { + flow_block_cb_incref(block_cb); + return 0; + } + block_cb = flow_block_cb_alloc(cb, dev, dev, NULL); + if (IS_ERR(block_cb)) + return PTR_ERR(block_cb); + + flow_block_cb_add(block_cb, f); + list_add_tail(&block_cb->driver_list, &block_cb_list); + return 0; + case FLOW_BLOCK_UNBIND: + block_cb = flow_block_cb_lookup(f->block, cb, dev); + if (!block_cb) + return -ENOENT; + + if (flow_block_cb_decref(block_cb)) { + flow_block_cb_remove(block_cb, f); + list_del(&block_cb->driver_list); + } + return 0; + default: + return -EOPNOTSUPP; + } +} + +int mtk_eth_setup_tc(struct net_device *dev, enum tc_setup_type type, + void *type_data) +{ + if (type == TC_SETUP_FT) + return mtk_eth_setup_tc_block(dev, type_data); + + return -EOPNOTSUPP; +} + +int mtk_eth_offload_init(struct mtk_eth *eth) +{ + if (!eth->ppe.foe_table) + return 0; + + return rhashtable_init(ð->flow_table, &mtk_flow_ht_params); +} From patchwork Wed Mar 24 01:30:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 408673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 427B7C43600 for ; Wed, 24 Mar 2021 01:32:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 28054619EF for ; Wed, 24 Mar 2021 01:32:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234705AbhCXBcG (ORCPT ); Tue, 23 Mar 2021 21:32:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234528AbhCXBbX (ORCPT ); Tue, 23 Mar 2021 21:31:23 -0400 Received: from mail.netfilter.org (mail.netfilter.org [IPv6:2001:4b98:dc0:41:216:3eff:fe8c:2bda]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 09DC3C061763; Tue, 23 Mar 2021 18:31:23 -0700 (PDT) Received: from localhost.localdomain (unknown [90.77.255.23]) by mail.netfilter.org (Postfix) with ESMTPSA id AEA41630BB; Wed, 24 Mar 2021 02:31:14 +0100 (CET) From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org, kuba@kernel.org Subject: [PATCH net-next, v2 24/24] docs: nf_flowtable: update documentation with enhancements Date: Wed, 24 Mar 2021 02:30:55 +0100 Message-Id: <20210324013055.5619-25-pablo@netfilter.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210324013055.5619-1-pablo@netfilter.org> References: <20210324013055.5619-1-pablo@netfilter.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch updates the flowtable documentation to describe recent enhancements: - Offload action is available after the first packets go through the classic forwarding path. - IPv4 and IPv6 are supported. Only TCP and UDP layer 4 are supported at this stage. - Tuple has been augmented to track VLAN id and PPPoE session id. - Bridge and IP forwarding integration, including bridge VLAN filtering support. - Hardware offload support. - Describe the [OFFLOAD] and [HW_OFFLOAD] tags in the conntrack table listing. - Replace 'flow offload' by 'flow add' in example rulesets (preferred syntax). - Describe existing cache limitations. Signed-off-by: Pablo Neira Ayuso --- v2: not coming in v1. Update documentation including existing limitations. Documentation/networking/nf_flowtable.rst | 170 ++++++++++++++++++---- 1 file changed, 143 insertions(+), 27 deletions(-) diff --git a/Documentation/networking/nf_flowtable.rst b/Documentation/networking/nf_flowtable.rst index 6cdf9a1724b6..d87f253b9d39 100644 --- a/Documentation/networking/nf_flowtable.rst +++ b/Documentation/networking/nf_flowtable.rst @@ -4,35 +4,38 @@ Netfilter's flowtable infrastructure ==================================== -This documentation describes the software flowtable infrastructure available in -Netfilter since Linux kernel 4.16. +This documentation describes the Netfilter flowtable infrastructure which allows +you to define a fastpath through the flowtable datapath. This infrastructure +also provides hardware offload support. The flowtable supports for the layer 3 +IPv4 and IPv6 and the layer 4 TCP and UDP protocols. Overview -------- -Initial packets follow the classic forwarding path, once the flow enters the -established state according to the conntrack semantics (ie. we have seen traffic -in both directions), then you can decide to offload the flow to the flowtable -from the forward chain via the 'flow offload' action available in nftables. +Once the first packet of the flow successfully goes through the IP forwarding +path, from the second packet on, you might decide to offload the flow to the +flowtable through your ruleset. The flowtable infrastructure provides a rule +action that allows you to specify when to add a flow to the flowtable. -Packets that find an entry in the flowtable (ie. flowtable hit) are sent to the -output netdevice via neigh_xmit(), hence, they bypass the classic forwarding -path (the visible effect is that you do not see these packets from any of the -netfilter hooks coming after the ingress). In case of flowtable miss, the packet -follows the classic forward path. +A packet that finds a matching entry in the flowtable (ie. flowtable hit) is +transmitted to the output netdevice via neigh_xmit(), hence, packets bypass the +classic IP forwarding path (the visible effect is that you do not see these +packets from any of the Netfilter hooks coming after ingress). In case that +there is no matching entry in the flowtable (ie. flowtable miss), the packet +follows the classic IP forwarding path. -The flowtable uses a resizable hashtable, lookups are based on the following -7-tuple selectors: source, destination, layer 3 and layer 4 protocols, source -and destination ports and the input interface (useful in case there are several -conntrack zones in place). +The flowtable uses a resizable hashtable. Lookups are based on the following +n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3 +source and destination, layer 4 source and destination ports and the input +interface (useful in case there are several conntrack zones in place). -Flowtables are populated via the 'flow offload' nftables action, so the user can -selectively specify what flows are placed into the flow table. Hence, packets -follow the classic forwarding path unless the user explicitly instruct packets -to use this new alternative forwarding path via nftables policy. +The 'flow add' action allows you to populate the flowtable, the user selectively +specifies what flows are placed into the flowtable. Hence, packets follow the +classic IP forwarding path unless the user explicitly instruct flows to use this +new alternative forwarding path via policy. -This is represented in Fig.1, which describes the classic forwarding path -including the Netfilter hooks and the flowtable fastpath bypass. +The flowtable datapath is represented in Fig.1, which describes the classic IP +forwarding path including the Netfilter hooks and the flowtable fastpath bypass. :: @@ -67,11 +70,13 @@ including the Netfilter hooks and the flowtable fastpath bypass. Fig.1 Netfilter hooks and flowtable interactions The flowtable entry also stores the NAT configuration, so all packets are -mangled according to the NAT policy that matches the initial packets that went -through the classic forwarding path. The TTL is decremented before calling -neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding -path given that the transport selectors are missing, therefore flowtable lookup -is not possible. +mangled according to the NAT policy that is specified from the classic IP +forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented +traffic is passed up to follow the classic IP forwarding path given that the +transport header is missing, in this case, flowtable lookups are not possible. +TCP RST and FIN packets are also passed up to the classic IP forwarding path to +release the flow gracefully. Packets that exceed the MTU are also passed up to +the classic forwarding path to report packet-too-big ICMP errors to the sender. Example configuration --------------------- @@ -85,7 +90,7 @@ flowtable and add one rule to your forward chain:: } chain y { type filter hook forward priority 0; policy accept; - ip protocol tcp flow offload @f + ip protocol tcp flow add @f counter packets 0 bytes 0 } } @@ -103,6 +108,117 @@ flow is offloaded, you will observe that the counter rule in the example above does not get updated for the packets that are being forwarded through the forwarding bypass. +You can identify offloaded flows through the [OFFLOAD] tag when listing your +connection tracking table. + +:: + # conntrack -L + tcp 6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2 + + +Layer 2 encapsulation +--------------------- + +Since Linux kernel 5.13, the flowtable infrastructure discovers the real +netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath +parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the +VLAN ID / PPPoE session ID which are used for the flowtable lookups. The +flowtable datapath also deals with layer 2 decapsulation. + +You do not need to add the PPPoE and the VLAN devices to your flowtable, +instead the real device is sufficient for the flowtable to track your flows. + +Bridge and IP forwarding +------------------------ + +Since Linux kernel 5.13, you can add bridge ports to the flowtable. The +flowtable infrastructure discovers the topology behind the bridge device. This +allows the flowtable to define a fastpath bypass between the bridge ports +(represented as eth1 and eth2 in the example figure below) and the gateway +device (represented as eth0) in your switch/router. + +:: + fastpath bypass + .-------------------------. + / \ + | IP forwarding | + | / \ \/ + | br0 eth0 ..... eth0 + . / \ *host B* + -> eth1 eth2 + . *switch/router* + . + . + eth0 + *host A* + +The flowtable infrastructure also supports for bridge VLAN filtering actions +such as PVID and untagged. You can also stack a classic VLAN device on top of +your bridge port. + +If you would like that your flowtable defines a fastpath between your bridge +ports and your IP forwarding path, you have to add your bridge ports (as +represented by the real netdevice) to your flowtable definition. + +Counters +-------- + +The flowtable can synchronize packet and byte counters with the existing +connection tracking entry by specifying the counter statement in your flowtable +definition, e.g. + +:: + table inet x { + flowtable f { + hook ingress priority 0; devices = { eth0, eth1 }; + counter + } + ... + } + +Counter support is available since Linux kernel 5.7. + +Hardware offload +---------------- + +If your network device provides hardware offload support, you can turn it on by +means of the 'offload' flag in your flowtable definition, e.g. + +:: + table inet x { + flowtable f { + hook ingress priority 0; devices = { eth0, eth1 }; + flags offload; + } + ... + } + +There is a workqueue that adds the flows to the hardware. Note that a few +packets might still run over the flowtable software path until the workqueue has +a chance to offload the flow to the network device. + +You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when +listing your connection tracking table. Please, note that the [OFFLOAD] tag +refers to the software offload mode, so there is a distinction between [OFFLOAD] +which refers to the software flowtable fastpath and [HW_OFFLOAD] which refers +to the hardware offload datapath being used by the flow. + +The flowtable hardware offload infrastructure also supports for the DSA +(Distributed Switch Architecture). + +Limitations +----------- + +The flowtable behaves like a cache. The flowtable entries might get stale if +either the destination MAC address or the egress netdevice that is used for +transmission changes. + +This might be a problem if: + +- You run the flowtable in software mode and you combine bridge and IP + forwarding in your setup. +- Hardware offload is enabled. + More reading ------------