[nf-next,v3,3/3] netfilter: Introduce egress hook

Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.

Support the same on egress.  This allows filtering locally generated
traffic such as DHCP, or outbound AF_PACKETs in general.  It will also
allow introducing in-kernel NAT64 and NAT46.  A patch for nftables to
hook up egress rules from user space has been submitted separately.

Position the hook immediately before a packet is handed to traffic
control and then sent out on an interface, thereby mirroring the ingress
order.  This order allows marking packets in the netfilter egress hook
and subsequently using the mark in tc.  Another benefit of this order is
consistency with a lot of existing documentation which says that egress
tc is performed after netfilter hooks.

To avoid a performance degradation in the default case (with neither
netfilter nor traffic control used), Daniel Borkmann suggests "a single
static_key which wraps an empty function call entry which can then be
patched by the kernel at runtime. Inside that trampoline we can still
keep the ordering [between netfilter and traffic control] intact":

https://lore.kernel.org/netdev/20200318123315.GI979@breakpoint.cc/

To this end, introduce nf_sch_egress() which is dynamically patched into
__dev_queue_xmit(), contingent on egress_needed_key.  Inside that
function, nf_egress() and sch_handle_egress() is called, each contingent
on its own separate static_key.

nf_sch_egress() is declared noinline per Florian Westphal's suggestion.
This change alone causes a speedup if neither netfilter nor traffic
control is used, apparently because it reduces instruction cache
pressure.  The same effect was previously observed by Eric Dumazet for
the ingress path:

https://lore.kernel.org/netdev/1431387038.566.47.camel@edumazet-glaptop2.roam.corp.google.com/

Overall, performance improves with this commit if neither netfilter nor
traffic control is used. However it degrades a little if only traffic
control is used, due to the "noinline", the additional outer static key
and the added netfilter code:

* Before:       4730418pps 2270Mb/sec (2270600640bps)
* After:        4759206pps 2284Mb/sec (2284418880bps)

* Before + tc:  4063912pps 1950Mb/sec (1950677760bps)
* After  + tc:  4007728pps 1923Mb/sec (1923709440bps)

* After  + nft: 3714546pps 1782Mb/sec (1782982080bps)

Measured on a bare-metal Core i7-3615QM.

Commands to perform a measurement:
ip link add dev foo type dummy
ip link set dev foo up
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

Commands to enable egress traffic control:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'

Commands to enable egress netfilter:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device foo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/linux/netdevice.h        |  8 +++++
 include/linux/netfilter_netdev.h | 27 +++++++++++++++++
 include/linux/rtnetlink.h        |  2 +-
 include/uapi/linux/netfilter.h   |  1 +
 net/core/dev.c                   | 52 ++++++++++++++++++++++++++++----
 net/netfilter/Kconfig            |  8 +++++
 net/netfilter/core.c             | 24 ++++++++++++---
 net/netfilter/nft_chain_filter.c |  4 ++-
 net/sched/Kconfig                |  3 ++
 9 files changed, 117 insertions(+), 12 deletions(-)

Message ID	d2256c451876583bbbf8f0e82a5a43ce35c5cf2f.1598517740.git.lukas@wunner.de
State	New
Headers	show Return-Path: <SRS0=PnHV=CF=vger.kernel.org=netdev-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1868DC433DF for <netdev@archiver.kernel.org>; Thu, 27 Aug 2020 09:09:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D547520738 for <netdev@archiver.kernel.org>; Thu, 27 Aug 2020 09:09:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728222AbgH0JJ5 (ORCPT <rfc822;netdev@archiver.kernel.org>); Thu, 27 Aug 2020 05:09:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36094 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726882AbgH0JJ4 (ORCPT <rfc822;netdev@vger.kernel.org>); Thu, 27 Aug 2020 05:09:56 -0400 X-Greylist: delayed 886 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Thu, 27 Aug 2020 02:09:56 PDT Received: from mailout1.hostsharing.net (mailout1.hostsharing.net [IPv6:2a01:37:1000::53df:5fcc:0]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47123C061264; Thu, 27 Aug 2020 02:09:55 -0700 (PDT) Received: from h08.hostsharing.net (h08.hostsharing.net [IPv6:2a01:37:1000::53df:5f1c:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by mailout1.hostsharing.net (Postfix) with ESMTPS id 90D11101933F9; Thu, 27 Aug 2020 11:09:54 +0200 (CEST) Received: from localhost (pd95be530.dip0.t-ipconnect.de [217.91.229.48]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by h08.hostsharing.net (Postfix) with ESMTPSA id 461456106EDD; Thu, 27 Aug 2020 11:09:54 +0200 (CEST) X-Mailbox-Line: From d2256c451876583bbbf8f0e82a5a43ce35c5cf2f Mon Sep 17 00:00:00 2001 Message-Id: <d2256c451876583bbbf8f0e82a5a43ce35c5cf2f.1598517740.git.lukas@wunner.de> In-Reply-To: <cover.1598517739.git.lukas@wunner.de> References: <cover.1598517739.git.lukas@wunner.de> From: Lukas Wunner <lukas@wunner.de> Date: Thu, 27 Aug 2020 10:55:03 +0200 Subject: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook To: "Pablo Neira Ayuso" <pablo@netfilter.org>, Jozsef Kadlecsik <kadlec@netfilter.org>, Florian Westphal <fw@strlen.de> Cc: netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, Daniel Borkmann <daniel@iogearbox.net>, Alexei Starovoitov <ast@kernel.org>, Eric Dumazet <edumazet@google.com>, Thomas Graf <tgraf@suug.ch>, Laura Garcia <nevola@gmail.com>, David Miller <davem@davemloft.net> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org
Series	None \| expand [nf-next,v3,2/3] netfilter: Generalize ingress hook [nf-next,v3,3/3] netfilter: Introduce egress hook

[nf-next,v3,3/3] netfilter: Introduce egress hook

Commit Message

Comments

Patch