[bpf-next,v2,00/13] bpfilter

Message ID	20210829183608.2297877-1-me@ubique.spb.ru
Headers	show Return-Path: <netdev-owner@kernel.org> From: Dmitrii Banshchikov <me@ubique.spb.ru> To: bpf@vger.kernel.org Cc: Dmitrii Banshchikov <me@ubique.spb.ru>, ast@kernel.org, davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, netdev@vger.kernel.org, rdna@fb.com Subject: [PATCH bpf-next v2 00/13] bpfilter Date: Sun, 29 Aug 2021 22:35:55 +0400 Message-Id: <20210829183608.2297877-1-me@ubique.spb.ru> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	bpfilter \| expand [bpf-next,v2,00/13] bpfilter [bpf-next,v2,01/13] bpfilter: Add types for usermode helper [bpf-next,v2,04/13] bpfilter: Add map container [bpf-next,v2,05/13] bpfilter: Add codegen infrastructure [bpf-next,v2,08/13] bpfilter: Add struct rule [bpf-next,v2,11/13] bpfilter: Add handling of setsockopt() calls [bpf-next,v2,13/13] bpfilter: Handle setsockopts

Dmitrii Banshchikov Aug. 29, 2021, 6:35 p.m. UTC

The patchset is based on the patches from David S. Miller [1] and
Daniel Borkmann [2].

The main goal of the patchset is to prepare bpfilter for
iptables' configuration blob parsing and code generation.

The patchset introduces data structures and code for matches,
targets, rules and tables. Beside that the code generation
is introduced.

The first version of the code generation supports only "inline"
mode - all chains and their rules emit instructions in linear
approach. The plan for the code generation is to introduce a
bpf_map_for_each subprogram that will handle all rules that
aren't generated in inline mode due to verifier's limit. This
shall allow to handle arbitrary large rule sets.

Things that are not implemented yet:
  1) The process of switching from the previous BPF programs to the
     new set isn't atomic.
  2) The code generation for FORWARD chain isn't supported
  3) Counters setsockopts() are not handled
  4) No support of device ifindex - it's hardcoded
  5) No helper subprog for counters update

Another problem is using iptables' blobs for tests and filter
table initialization. While it saves lines something more
maintainable should be done here.

The plan for the next iteration:
  1) Handle large rule sets via bpf_map_for_each
  2) Add a helper program for counters update
  3) Handle iptables' counters setsockopts()
  4) Handle ifindex
  5) Add TCP match

Patch 1 adds definitions of the used types.
Patch 2 adds logging to bpfilter.
Patch 3 adds bpfilter header to tools
Patch 4 adds an associative map.
Patch 5 adds code generation basis
Patches 6/7/8/9 add code for matches, targets, rules and table.
Patch 10 adds code generation for table
Patch 11 handles hooked setsockopt(2) calls.
Patch 12 adds filter table
Patch 13 uses prepared code in main().

And here are some performance tests.

The environment consists of two machines(sender and receiver)
connected with 10Gbps link via switch.  The sender uses DPDK to
simulate QUIC packets(89 bytes long) from random IP. The switch
measures the generated traffic to be about 7066377568 bits/sec,
9706553 packets/sec.

The receiver is a 2 socket 2680v3 + HT and uses either iptables,
nft or bpfilter to filter out UDP traffic.

Two tests were made. Two rulesets(default policy was to ACCEPT)
were used in each test:

```
iptables -A INPUT -p udp -m udp --dport 1500 -j DROP
```
and
```
iptables -A INPUT -s 1.1.1.1/32 -p udp -m udp --dport 1000 -j DROP
iptables -A INPUT -s 2.2.2.2/32 -p udp -m udp --dport 2000 -j DROP
...
iptables -A INPUT -s 31.31.31.31/32 -p udp -m udp --dport 31000 -j DROP
iptables -A INPUT -p udp -m udp --dport 1500 -j DROP
```

The first test measures performance of the receiver via stress-ng
[3] in bogo-ops. The upper-bound(there are no firewall and no
traffic) value for bogo-ops is 8148-8210. The lower bound value
(there is traffic but no firewall) is 6567-6643.
The stress-ng command used: stress-ng -t60 -c48 --metrics-brief.

The second test measures the number the of dropped packets. The
receiver keeps only 1 CPU online and disables all
others(maxcpus=1 and set number of cores per socket to 1 in
BIOS). The number of the dropped packets is collected via
iptables-legacy -nvL, iptables -nvL and bpftool map dump id.

Test 1: bogo-ops(the more the better)
            iptables            nft        bpfilter
  1 rule:  6474-6554      6483-6515       7996-8008
32 rules:  6374-6433      5761-5804       7997-8042


Test 2: number of dropped packets(the more the better)
            iptables            nft         bpfilter
  1 rule:  234M-241M           220M            900M+
32 rules:  186M-196M        97M-98M            900M+


Please let me know if you see a gap in the testing environment.

v1 -> v2
Maps:
  * Use map_upsert instead of separate map_insert and map_update
Matches:
  * Add a new virtual call - gen_inline. The call is used for
  * inline generating of a rule's match.
Targets:
  * Add a new virtual call - gen_inline. The call is used for inline
    generating of a rule's target.
Rules:
  * Add code generation for rules
Table:
  * Add struct table_ops
  * Add map for table_ops
  * Add filter table
  * Reorganize the way filter table is initialized
Sockopts:
  * Install/uninstall BPF programs while handling
    IPT_SO_SET_REPLACE
Code generation:
  * Add first version of the code generation
Dependencies:
  * Add libbpf

v0 -> v1
IO:
  * Use ssize_t in pvm_read, pvm_write for total_bytes
  * Move IO functions into sockopt.c and main.c
Logging:
  * Use LOGLEVEL_EMERG, LOGLEVEL_NOTICE, LOGLEVE_DEBUG
    while logging to /dev/kmsg
  * Prepend log message with <n> where n is log level
  * Conditionally enable BFLOG_DEBUG messages
  * Merge bflog.{h,c} into context.h
Matches:
  * Reorder fields in struct match_ops for tight packing
  * Get rid of struct match_ops_map
  * Rename udp_match_ops to xt_udp
  * Use XT_ALIGN macro
  * Store payload size in match size
  * Move udp match routines into a separate file
Targets:
  * Reorder fields in struct target_ops for tight packing
  * Get rid of struct target_ops_map
  * Add comments for convert_verdict function
Rules:
  * Add validation
Tables:
  * Combine table_map and table_list into table_index
  * Add validation
Sockopts:
  * Handle IPT_SO_GET_REVISION_TARGET

1. https://lore.kernel.org/patchwork/patch/902785/
2. https://lore.kernel.org/patchwork/patch/902783/
3. https://kernel.ubuntu.com/~cking/stress-ng/stress-ng.pdf

Dmitrii Banshchikov (13):
  bpfilter: Add types for usermode helper
  bpfilter: Add logging facility
  tools: Add bpfilter usermode helper header
  bpfilter: Add map container
  bpfilter: Add codegen infrastructure
  bpfilter: Add struct match
  bpfilter: Add struct target
  bpfilter: Add struct rule
  bpfilter: Add struct table
  bpfilter: Add table codegen
  bpfilter: Add handling of setsockopt() calls
  bpfilter: Add filter table
  bpfilter: Handle setsockopts

 include/uapi/linux/bpfilter.h                 | 154 +++
 net/bpfilter/Makefile                         |  16 +-
 net/bpfilter/codegen.c                        | 903 ++++++++++++++++++
 net/bpfilter/codegen.h                        | 189 ++++
 net/bpfilter/context.c                        | 138 +++
 net/bpfilter/context.h                        |  47 +
 net/bpfilter/filter-table.c                   | 246 +++++
 net/bpfilter/filter-table.h                   |  17 +
 net/bpfilter/main.c                           | 126 ++-
 net/bpfilter/map-common.c                     |  50 +
 net/bpfilter/map-common.h                     |  18 +
 net/bpfilter/match.c                          |  49 +
 net/bpfilter/match.h                          |  36 +
 net/bpfilter/rule.c                           | 239 +++++
 net/bpfilter/rule.h                           |  34 +
 net/bpfilter/sockopt.c                        | 441 +++++++++
 net/bpfilter/sockopt.h                        |  14 +
 net/bpfilter/table.c                          | 346 +++++++
 net/bpfilter/table.h                          |  54 ++
 net/bpfilter/target.c                         | 184 ++++
 net/bpfilter/target.h                         |  52 +
 net/bpfilter/xt_udp.c                         |  96 ++
 tools/include/uapi/linux/bpfilter.h           | 178 ++++
 .../testing/selftests/bpf/bpfilter/.gitignore |   8 +
 tools/testing/selftests/bpf/bpfilter/Makefile |  59 ++
 .../selftests/bpf/bpfilter/bpfilter_util.h    |  79 ++
 .../selftests/bpf/bpfilter/test_codegen.c     | 293 ++++++
 .../testing/selftests/bpf/bpfilter/test_map.c |  63 ++
 .../selftests/bpf/bpfilter/test_match.c       |  61 ++
 .../selftests/bpf/bpfilter/test_rule.c        |  55 ++
 .../selftests/bpf/bpfilter/test_target.c      |  85 ++
 .../selftests/bpf/bpfilter/test_xt_udp.c      |  41 +
 32 files changed, 4327 insertions(+), 44 deletions(-)
 create mode 100644 net/bpfilter/codegen.c
 create mode 100644 net/bpfilter/codegen.h
 create mode 100644 net/bpfilter/context.c
 create mode 100644 net/bpfilter/context.h
 create mode 100644 net/bpfilter/filter-table.c
 create mode 100644 net/bpfilter/filter-table.h
 create mode 100644 net/bpfilter/map-common.c
 create mode 100644 net/bpfilter/map-common.h
 create mode 100644 net/bpfilter/match.c
 create mode 100644 net/bpfilter/match.h
 create mode 100644 net/bpfilter/rule.c
 create mode 100644 net/bpfilter/rule.h
 create mode 100644 net/bpfilter/sockopt.c
 create mode 100644 net/bpfilter/sockopt.h
 create mode 100644 net/bpfilter/table.c
 create mode 100644 net/bpfilter/table.h
 create mode 100644 net/bpfilter/target.c
 create mode 100644 net/bpfilter/target.h
 create mode 100644 net/bpfilter/xt_udp.c
 create mode 100644 tools/include/uapi/linux/bpfilter.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/.gitignore
 create mode 100644 tools/testing/selftests/bpf/bpfilter/Makefile
 create mode 100644 tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_codegen.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_map.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_match.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_rule.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_target.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_xt_udp.c

Alexei Starovoitov Aug. 30, 2021, 7:45 p.m. UTC | #1

On Sun, Aug 29, 2021 at 10:36:07PM +0400, Dmitrii Banshchikov wrote:
>  /*

> - * # Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> + *  Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

>   * *filter

...
> - * -A LOCAL -s 10.32.0.0/11 -j FROMDC

> - * -A LOCAL -s 10.144.0.0/12 -j FROMDC

> - * -A LOCAL -s 10.160.0.0/12 -j FROMDC

> - * -A LOCAL -s 10.0.0.0/12 -j FROMDC

> - * -A LOCAL -s 10.248.0.0/24 -j FROMDC

> - * -A LOCAL -s 10.232.0.0/16 -j FROMDC

> - * -A LOCAL -s 10.1.146.131/32 -p udp -m udp --dport 161 -j ACCEPT

> - * -A LOCAL -s 10.149.118.14/32 -p udp -m udp --dport 161 -j ACCEPT

> - * -A LOCAL -p icmp -j ACCEPT

> + * :INPUT ACCEPT [0:0]

> + * :FORWARD ACCEPT [0:0]

> + * :OUTPUT ACCEPT [0:0]

> + * -A INPUT -s 1.1.1.1/32 -d 2.2.2.2/32 -j DROP

> + * -A INPUT -s 2.2.0.0/16 -d 3.0.0.0/8 -j DROP

> + * -A INPUT -p udp -m udp --sport 100 --dport 500 -j DROP

>   * COMMIT

>   */


Patch 10 adds this test, but then patch 12 removes most of it?
Keep both?

Also hit this on my system with older glibc:

../net/bpfilter/codegen.c: In function ‘codegen_push_subprog’:
../net/bpfilter/codegen.c:67:4: warning: implicit declaration of function ‘reallocarray’ [-Wimplicit-function-declaration]
   67 |    reallocarray(codegen->subprogs, subprogs_max, sizeof(codegen->subprogs[0]));
      |    ^~~~~~~~~~~~
../net/bpfilter/codegen.c:66:12: warning: assignment to ‘struct codegen_subprog_desc **’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   66 |   subprogs =
      |            ^

In libbpf we have libbpf_reallocarray() for this reason.

Could you provide an example of generated bpf program?
And maybe add Documentation/bpf/bpfilter_design.rst ?

The tests don't build for me:
$ cd selftests/bpf/bpfilter; make
make: *** No rule to make target '-lelf', needed by '.../selftests/bpf/bpfilter/test_match'.  Stop.

The unit tests are great, btw. test_codegen is not end-to-end, right?
Could you add a full test with iptable command line?
or netns support is a prerequisite for it?

Dmitrii Banshchikov Aug. 30, 2021, 8:54 p.m. UTC | #2

On Mon, Aug 30, 2021 at 12:45:45PM -0700, Alexei Starovoitov wrote:
> On Sun, Aug 29, 2021 at 10:36:07PM +0400, Dmitrii Banshchikov wrote:

> >  /*

> > - * # Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> > + *  Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> >   * *filter

> ...

> > - * -A LOCAL -s 10.32.0.0/11 -j FROMDC

> > - * -A LOCAL -s 10.144.0.0/12 -j FROMDC

> > - * -A LOCAL -s 10.160.0.0/12 -j FROMDC

> > - * -A LOCAL -s 10.0.0.0/12 -j FROMDC

> > - * -A LOCAL -s 10.248.0.0/24 -j FROMDC

> > - * -A LOCAL -s 10.232.0.0/16 -j FROMDC

> > - * -A LOCAL -s 10.1.146.131/32 -p udp -m udp --dport 161 -j ACCEPT

> > - * -A LOCAL -s 10.149.118.14/32 -p udp -m udp --dport 161 -j ACCEPT

> > - * -A LOCAL -p icmp -j ACCEPT

> > + * :INPUT ACCEPT [0:0]

> > + * :FORWARD ACCEPT [0:0]

> > + * :OUTPUT ACCEPT [0:0]

> > + * -A INPUT -s 1.1.1.1/32 -d 2.2.2.2/32 -j DROP

> > + * -A INPUT -s 2.2.0.0/16 -d 3.0.0.0/8 -j DROP

> > + * -A INPUT -p udp -m udp --sport 100 --dport 500 -j DROP

> >   * COMMIT

> >   */

> 

> Patch 10 adds this test, but then patch 12 removes most of it?

> Keep both?


Sorry, I missed it.
I decided that the large blob looks really ugly and switched to
the smaller one and forgot to cleanup the patchset.

> 

> Also hit this on my system with older glibc:

> 

> ../net/bpfilter/codegen.c: In function ‘codegen_push_subprog’:

> ../net/bpfilter/codegen.c:67:4: warning: implicit declaration of function ‘reallocarray’ [-Wimplicit-function-declaration]

>    67 |    reallocarray(codegen->subprogs, subprogs_max, sizeof(codegen->subprogs[0]));

>       |    ^~~~~~~~~~~~

> ../net/bpfilter/codegen.c:66:12: warning: assignment to ‘struct codegen_subprog_desc **’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]

>    66 |   subprogs =

>       |            ^

> 

> In libbpf we have libbpf_reallocarray() for this reason.

> 

> Could you provide an example of generated bpf program?

> And maybe add Documentation/bpf/bpfilter_design.rst ?


I will add documentation in the next iteration when
bpf_map_for_each() subprog will be introduced.

> 

> The tests don't build for me:

> $ cd selftests/bpf/bpfilter; make

> make: *** No rule to make target '-lelf', needed by '.../selftests/bpf/bpfilter/test_match'.  Stop.


libelf was added because libbpf depends on it.
Are you able to build libbpf?

> 

> The unit tests are great, btw. test_codegen is not end-to-end, right?

> Could you add a full test with iptable command line?

> or netns support is a prerequisite for it?


Yeah, as net namespaces aren't supported using iptables binary
will modify the root namespace. That is the reason why codegen
tests aren't implemented in the end-to-end fashion and rules are
represented by blobs.


-- 

Dmitrii Banshchikov

Alexei Starovoitov Aug. 30, 2021, 11:45 p.m. UTC | #3

On Tue, Aug 31, 2021 at 12:54:43AM +0400, Dmitrii Banshchikov wrote:
> On Mon, Aug 30, 2021 at 12:45:45PM -0700, Alexei Starovoitov wrote:

> > On Sun, Aug 29, 2021 at 10:36:07PM +0400, Dmitrii Banshchikov wrote:

> > >  /*

> > > - * # Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> > > + *  Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> > >   * *filter

> > ...

> > > - * -A LOCAL -s 10.32.0.0/11 -j FROMDC

> > > - * -A LOCAL -s 10.144.0.0/12 -j FROMDC

> > > - * -A LOCAL -s 10.160.0.0/12 -j FROMDC

> > > - * -A LOCAL -s 10.0.0.0/12 -j FROMDC

> > > - * -A LOCAL -s 10.248.0.0/24 -j FROMDC

> > > - * -A LOCAL -s 10.232.0.0/16 -j FROMDC

> > > - * -A LOCAL -s 10.1.146.131/32 -p udp -m udp --dport 161 -j ACCEPT

> > > - * -A LOCAL -s 10.149.118.14/32 -p udp -m udp --dport 161 -j ACCEPT

> > > - * -A LOCAL -p icmp -j ACCEPT

> > > + * :INPUT ACCEPT [0:0]

> > > + * :FORWARD ACCEPT [0:0]

> > > + * :OUTPUT ACCEPT [0:0]

> > > + * -A INPUT -s 1.1.1.1/32 -d 2.2.2.2/32 -j DROP

> > > + * -A INPUT -s 2.2.0.0/16 -d 3.0.0.0/8 -j DROP

> > > + * -A INPUT -p udp -m udp --sport 100 --dport 500 -j DROP

> > >   * COMMIT

> > >   */

> > 

> > Patch 10 adds this test, but then patch 12 removes most of it?

> > Keep both?

> 

> Sorry, I missed it.

> I decided that the large blob looks really ugly and switched to

> the smaller one and forgot to cleanup the patchset.

> 

> > 

> > Also hit this on my system with older glibc:

> > 

> > ../net/bpfilter/codegen.c: In function ‘codegen_push_subprog’:

> > ../net/bpfilter/codegen.c:67:4: warning: implicit declaration of function ‘reallocarray’ [-Wimplicit-function-declaration]

> >    67 |    reallocarray(codegen->subprogs, subprogs_max, sizeof(codegen->subprogs[0]));

> >       |    ^~~~~~~~~~~~

> > ../net/bpfilter/codegen.c:66:12: warning: assignment to ‘struct codegen_subprog_desc **’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]

> >    66 |   subprogs =

> >       |            ^

> > 

> > In libbpf we have libbpf_reallocarray() for this reason.

> > 

> > Could you provide an example of generated bpf program?

> > And maybe add Documentation/bpf/bpfilter_design.rst ?

> 

> I will add documentation in the next iteration when

> bpf_map_for_each() subprog will be introduced.

> 

> > 

> > The tests don't build for me:

> > $ cd selftests/bpf/bpfilter; make

> > make: *** No rule to make target '-lelf', needed by '.../selftests/bpf/bpfilter/test_match'.  Stop.

> 

> libelf was added because libbpf depends on it.

> Are you able to build libbpf?


make proceeds to build libbpf just fine, but then it stops with above message.
I manually removed -lelf from Makefile. Then run make to see it fail linking
and then manually copy pasted gcc command to build it with additional -lelf
command line.
fwiw make -v
GNU Make 4.2.1

> > 

> > The unit tests are great, btw. test_codegen is not end-to-end, right?

> > Could you add a full test with iptable command line?

> > or netns support is a prerequisite for it?

> 

> Yeah, as net namespaces aren't supported using iptables binary

> will modify the root namespace. That is the reason why codegen

> tests aren't implemented in the end-to-end fashion and rules are

> represented by blobs.


I think when ifindex is no longer hardcoded the netns support
doesn't have to be gating. The generic xdp attached to veth in netns
should work to do end-to-end test. bpftiler would need to do a bit of magic
to figure out the right ifindex. Or we can extend kernel with ifindex-less
generic XDP.

Jamal Hadi Salim Aug. 31, 2021, 1:56 a.m. UTC | #4

On 2021-08-29 2:35 p.m., Dmitrii Banshchikov wrote:

[..]

> And here are some performance tests.

> 

> The environment consists of two machines(sender and receiver)

> connected with 10Gbps link via switch.  The sender uses DPDK to

> simulate QUIC packets(89 bytes long) from random IP. The switch

> measures the generated traffic to be about 7066377568 bits/sec,

> 9706553 packets/sec.

> 

> The receiver is a 2 socket 2680v3 + HT and uses either iptables,

> nft or bpfilter to filter out UDP traffic.

> 

> Two tests were made. Two rulesets(default policy was to ACCEPT)

> were used in each test:

> 

> ```

> iptables -A INPUT -p udp -m udp --dport 1500 -j DROP

> ```

> and

> ```

> iptables -A INPUT -s 1.1.1.1/32 -p udp -m udp --dport 1000 -j DROP

> iptables -A INPUT -s 2.2.2.2/32 -p udp -m udp --dport 2000 -j DROP

> ...

> iptables -A INPUT -s 31.31.31.31/32 -p udp -m udp --dport 31000 -j DROP

> iptables -A INPUT -p udp -m udp --dport 1500 -j DROP

> ```

> 

> The first test measures performance of the receiver via stress-ng

> [3] in bogo-ops. The upper-bound(there are no firewall and no

> traffic) value for bogo-ops is 8148-8210. The lower bound value

> (there is traffic but no firewall) is 6567-6643.

> The stress-ng command used: stress-ng -t60 -c48 --metrics-brief.

> 

> The second test measures the number the of dropped packets. The

> receiver keeps only 1 CPU online and disables all

> others(maxcpus=1 and set number of cores per socket to 1 in

> BIOS). The number of the dropped packets is collected via

> iptables-legacy -nvL, iptables -nvL and bpftool map dump id.

> 

> Test 1: bogo-ops(the more the better)

>              iptables            nft        bpfilter

>    1 rule:  6474-6554      6483-6515       7996-8008

> 32 rules:  6374-6433      5761-5804       7997-8042

> 

> 

> Test 2: number of dropped packets(the more the better)

>              iptables            nft         bpfilter

>    1 rule:  234M-241M           220M            900M+

> 32 rules:  186M-196M        97M-98M            900M+

> 

> 

> Please let me know if you see a gap in the testing environment.

General perf testing will depend on the nature of the use case
you are trying to target.
What is the nature of the app? Is it just receiving packets and
counting? Does it exemplify something something real in your
network or is just purely benchmarking? Both are valid.
What else can it do (eg are you interested in latency accounting etc)?
What i have seen in practise for iptables deployments is a default
drop and in general an accept list. Per ruleset IP address aggregation
is typically achieved via ipset. So your mileage may vary...

Having said that:
Our testing[1] approach is typically for a worst case scenario.
i.e we make sure you structure the rulesets such that all of the
linear rulesets will be iterated and we eventually hit the default
ruleset.
We also try to reduce variability in the results. A lot of small
things could affect your reproducibility, so we try to avoid them.
For example, from what you described:
You are sending from a random IP - that means each packet will hit
a random ruleset (for the case of 32 rulesets). And some rules will
likely be hit more often than others. The likelihood of reproducing the
same results for multiple runs gets lower as you increase the number
of rulesets.
 From a collection perspective:
Looking at the nature of the CPU utilization is important
Softirq vs system calls vs user app.
Your test workload seems to be very specific to ingress host.
So in reality you are more constrained by kernel->user syscalls
(which will be hidden if you are mostly dropping in the kernel
as opposed to letting packets go to user space).

Something is not clear from your email:
You seem to indicate that no traffic was running in test 1.
If so, why would 32 rulesets give different results than 1?

cheers,
jamal

[1] https://netdevconf.info/0x15/session.html?Linux-ACL-Performance-Analysis

Dmitrii Banshchikov Aug. 31, 2021, 12:48 p.m. UTC | #5

On Mon, Aug 30, 2021 at 09:56:18PM -0400, Jamal Hadi Salim wrote:
> On 2021-08-29 2:35 p.m., Dmitrii Banshchikov wrote:

> 

> [..]

> 

> > And here are some performance tests.

> > 

> > The environment consists of two machines(sender and receiver)

> > connected with 10Gbps link via switch.  The sender uses DPDK to

> > simulate QUIC packets(89 bytes long) from random IP. The switch

> > measures the generated traffic to be about 7066377568 bits/sec,

> > 9706553 packets/sec.

> > 

> > The receiver is a 2 socket 2680v3 + HT and uses either iptables,

> > nft or bpfilter to filter out UDP traffic.

> > 

> > Two tests were made. Two rulesets(default policy was to ACCEPT)

> > were used in each test:

> > 

> > ```

> > iptables -A INPUT -p udp -m udp --dport 1500 -j DROP

> > ```

> > and

> > ```

> > iptables -A INPUT -s 1.1.1.1/32 -p udp -m udp --dport 1000 -j DROP

> > iptables -A INPUT -s 2.2.2.2/32 -p udp -m udp --dport 2000 -j DROP

> > ...

> > iptables -A INPUT -s 31.31.31.31/32 -p udp -m udp --dport 31000 -j DROP

> > iptables -A INPUT -p udp -m udp --dport 1500 -j DROP

> > ```

> > 

> > The first test measures performance of the receiver via stress-ng

> > [3] in bogo-ops. The upper-bound(there are no firewall and no

> > traffic) value for bogo-ops is 8148-8210. The lower bound value

> > (there is traffic but no firewall) is 6567-6643.

> > The stress-ng command used: stress-ng -t60 -c48 --metrics-brief.

> > 

> > The second test measures the number the of dropped packets. The

> > receiver keeps only 1 CPU online and disables all

> > others(maxcpus=1 and set number of cores per socket to 1 in

> > BIOS). The number of the dropped packets is collected via

> > iptables-legacy -nvL, iptables -nvL and bpftool map dump id.

> > 

> > Test 1: bogo-ops(the more the better)

> >              iptables            nft        bpfilter

> >    1 rule:  6474-6554      6483-6515       7996-8008

> > 32 rules:  6374-6433      5761-5804       7997-8042

> > 

> > 

> > Test 2: number of dropped packets(the more the better)

> >              iptables            nft         bpfilter

> >    1 rule:  234M-241M           220M            900M+

> > 32 rules:  186M-196M        97M-98M            900M+

> > 

> > 

> > Please let me know if you see a gap in the testing environment.

> 

> General perf testing will depend on the nature of the use case

> you are trying to target.

> What is the nature of the app? Is it just receiving packets and

> counting? Does it exemplify something something real in your

> network or is just purely benchmarking? Both are valid.

> What else can it do (eg are you interested in latency accounting etc)?

> What i have seen in practise for iptables deployments is a default

> drop and in general an accept list. Per ruleset IP address aggregation

> is typically achieved via ipset. So your mileage may vary...


This was a pure benchmarking with the single goal - show that
there might exist scenarios when using bpfilter may provide some
performance benefits.

> 

> Having said that:

> Our testing[1] approach is typically for a worst case scenario.

> i.e we make sure you structure the rulesets such that all of the

> linear rulesets will be iterated and we eventually hit the default

> ruleset.

> We also try to reduce variability in the results. A lot of small

> things could affect your reproducibility, so we try to avoid them.

> For example, from what you described:

> You are sending from a random IP - that means each packet will hit

> a random ruleset (for the case of 32 rulesets). And some rules will

> likely be hit more often than others. The likelihood of reproducing the

> same results for multiple runs gets lower as you increase the number

> of rulesets.

> From a collection perspective:

> Looking at the nature of the CPU utilization is important

> Softirq vs system calls vs user app.

> Your test workload seems to be very specific to ingress host.

> So in reality you are more constrained by kernel->user syscalls

> (which will be hidden if you are mostly dropping in the kernel

> as opposed to letting packets go to user space).



> 

> Something is not clear from your email:

> You seem to indicate that no traffic was running in test 1.

> If so, why would 32 rulesets give different results than 1?


I mentioned the lower and upper bound values for bogo-ops on the
machine. The lower bound is when there is traffic and no firewall
at all. The upper bound is when there is no firewall and no
traffic. Then the first test measures bogo-ops for two rule sets
when there is traffic for either iptables, nft or bpfilter.

> 

> cheers,

> jamal

> 

> [1] https://netdevconf.info/0x15/session.html?Linux-ACL-Performance-Analysis


-- 

Dmitrii Banshchikov

Dmitrii Banshchikov Aug. 31, 2021, 12:52 p.m. UTC | #6

On Mon, Aug 30, 2021 at 04:45:15PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 31, 2021 at 12:54:43AM +0400, Dmitrii Banshchikov wrote:

> > On Mon, Aug 30, 2021 at 12:45:45PM -0700, Alexei Starovoitov wrote:

> > > On Sun, Aug 29, 2021 at 10:36:07PM +0400, Dmitrii Banshchikov wrote:

> > > >  /*

> > > > - * # Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> > > > + *  Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021

> > > >   * *filter

> > > ...

> > > > - * -A LOCAL -s 10.32.0.0/11 -j FROMDC

> > > > - * -A LOCAL -s 10.144.0.0/12 -j FROMDC

> > > > - * -A LOCAL -s 10.160.0.0/12 -j FROMDC

> > > > - * -A LOCAL -s 10.0.0.0/12 -j FROMDC

> > > > - * -A LOCAL -s 10.248.0.0/24 -j FROMDC

> > > > - * -A LOCAL -s 10.232.0.0/16 -j FROMDC

> > > > - * -A LOCAL -s 10.1.146.131/32 -p udp -m udp --dport 161 -j ACCEPT

> > > > - * -A LOCAL -s 10.149.118.14/32 -p udp -m udp --dport 161 -j ACCEPT

> > > > - * -A LOCAL -p icmp -j ACCEPT

> > > > + * :INPUT ACCEPT [0:0]

> > > > + * :FORWARD ACCEPT [0:0]

> > > > + * :OUTPUT ACCEPT [0:0]

> > > > + * -A INPUT -s 1.1.1.1/32 -d 2.2.2.2/32 -j DROP

> > > > + * -A INPUT -s 2.2.0.0/16 -d 3.0.0.0/8 -j DROP

> > > > + * -A INPUT -p udp -m udp --sport 100 --dport 500 -j DROP

> > > >   * COMMIT

> > > >   */

> > > 

> > > Patch 10 adds this test, but then patch 12 removes most of it?

> > > Keep both?

> > 

> > Sorry, I missed it.

> > I decided that the large blob looks really ugly and switched to

> > the smaller one and forgot to cleanup the patchset.

> > 

> > > 

> > > Also hit this on my system with older glibc:

> > > 

> > > ../net/bpfilter/codegen.c: In function ‘codegen_push_subprog’:

> > > ../net/bpfilter/codegen.c:67:4: warning: implicit declaration of function ‘reallocarray’ [-Wimplicit-function-declaration]

> > >    67 |    reallocarray(codegen->subprogs, subprogs_max, sizeof(codegen->subprogs[0]));

> > >       |    ^~~~~~~~~~~~

> > > ../net/bpfilter/codegen.c:66:12: warning: assignment to ‘struct codegen_subprog_desc **’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]

> > >    66 |   subprogs =

> > >       |            ^

> > > 

> > > In libbpf we have libbpf_reallocarray() for this reason.

> > > 

> > > Could you provide an example of generated bpf program?

> > > And maybe add Documentation/bpf/bpfilter_design.rst ?

> > 

> > I will add documentation in the next iteration when

> > bpf_map_for_each() subprog will be introduced.

> > 

> > > 

> > > The tests don't build for me:

> > > $ cd selftests/bpf/bpfilter; make

> > > make: *** No rule to make target '-lelf', needed by '.../selftests/bpf/bpfilter/test_match'.  Stop.

> > 

> > libelf was added because libbpf depends on it.

> > Are you able to build libbpf?

> 

> make proceeds to build libbpf just fine, but then it stops with above message.

> I manually removed -lelf from Makefile. Then run make to see it fail linking

> and then manually copy pasted gcc command to build it with additional -lelf

> command line.

> fwiw make -v

> GNU Make 4.2.1


Will take a look on it. Thanks.

> 

> > > 

> > > The unit tests are great, btw. test_codegen is not end-to-end, right?

> > > Could you add a full test with iptable command line?

> > > or netns support is a prerequisite for it?

> > 

> > Yeah, as net namespaces aren't supported using iptables binary

> > will modify the root namespace. That is the reason why codegen

> > tests aren't implemented in the end-to-end fashion and rules are

> > represented by blobs.

> 

> I think when ifindex is no longer hardcoded the netns support

> doesn't have to be gating. The generic xdp attached to veth in netns

> should work to do end-to-end test. bpftiler would need to do a bit of magic

> to figure out the right ifindex. Or we can extend kernel with ifindex-less

> generic XDP.


Is it ok to add an external dependency to tests? The unit test
will need to execute iptables binary.


-- 

Dmitrii Banshchikov

Jamal Hadi Salim Aug. 31, 2021, 1:38 p.m. UTC | #7

On 2021-08-31 8:48 a.m., Dmitrii Banshchikov wrote:
> On Mon, Aug 30, 2021 at 09:56:18PM -0400, Jamal Hadi Salim wrote:

>> On 2021-08-29 2:35 p.m., Dmitrii Banshchikov wrote:



>>

>> Something is not clear from your email:

>> You seem to indicate that no traffic was running in test 1.

>> If so, why would 32 rulesets give different results than 1?

> 

> I mentioned the lower and upper bound values for bogo-ops on the

> machine. The lower bound is when there is traffic and no firewall

> at all. The upper bound is when there is no firewall and no

> traffic. Then the first test measures bogo-ops for two rule sets

> when there is traffic for either iptables, nft or bpfilter.

> 


Thanks, I totally misread that. I did look at stress-ng initially
to use it to stress the system and emulate some real world
setup (polluting cache etc) while testing but the variability of
the results was bad - so dropped it earlier. Maybe we should look
at it again.

cheers,
jamal

Alexei Starovoitov Aug. 31, 2021, 3:45 p.m. UTC | #8

On Tue, Aug 31, 2021 at 5:52 AM Dmitrii Banshchikov <me@ubique.spb.ru> wrote:
>

> Is it ok to add an external dependency to tests? The unit test

> will need to execute iptables binary.

Ideally not, but sometimes it's unavoidable.
iptables cmd is generally available.
selftests/bpf already have few tests that shell out to it.
They're not part of test_progs though.

[bpf-next,v2,00/13] bpfilter

Message

Comments