mbox series

[RESEND,v11,0/4] xdp: extend xdp_redirect_map with broadcast support

Message ID 20210513070447.1878448-1-liuhangbin@gmail.com
Headers show
Series xdp: extend xdp_redirect_map with broadcast support | expand

Message

Hangbin Liu May 13, 2021, 7:04 a.m. UTC
Hi,

This patchset is a new implementation for XDP multicast support based
on my previous 2 maps implementation[1]. The reason is that Daniel thinks
the exclude map implementation is missing proper bond support in XDP
context. And there is a plan to add native XDP bonding support. Adding a
exclude map in the helper also increases the complexity of verifier and has
drawback of performance.

The new implementation just add two new flags BPF_F_BROADCAST and
BPF_F_EXCLUDE_INGRESS to extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when do broadcasting.

The patchv10 link is here[2].

[1] https://lore.kernel.org/bpf/20210223125809.1376577-1-liuhangbin@gmail.com
[2] https://lore.kernel.org/bpf/20210423020019.2333192-1-liuhangbin@gmail.com

v11:
a) Use unlikely() when checking if this is for broadcast redirecting.
b) Fix a tracepoint NULL pointer issue Jesper found
c) Remove BPF_F_REDIR_MASK and just use OR flags to make the reader more
   clear about what's flags we are using
d) Add the performace number with multi veth interfaces in patch 01
   description.
e) remove some sleeps to reduce the testing time in patch04. Re-struct the
   test and make clear what flags we are testing.

v10: use READ/WRITE_ONCE when read/write map instead of xchg()
v9: Update patch 01 commit description
v8: use hlist_for_each_entry_rcu() when looping the devmap hash ojbs
v7: No need to free xdpf in dev_map_enqueue_clone() if xdpf_clone failed.
v6: Fix a skb leak in the error path for generic XDP
v5: Just walk the map directly to get interfaces as get_next_key() of devmap
    hash may restart looping from the first key if the device get removed.
    After update the performace has improved 10% compired with v4.
v4: Fix flags never cleared issue in patch 02. Update selftest to cover this.
v3: Rebase the code based on latest bpf-next
v2: fix flag renaming issue in patch 02


Hangbin Liu (3):
  xdp: extend xdp_redirect_map with broadcast support
  sample/bpf: add xdp_redirect_map_multi for redirect_map broadcast test
  selftests/bpf: add xdp_redirect_multi test

Jesper Dangaard Brouer (1):
  bpf: run devmap xdp_prog on flush instead of bulk enqueue

 include/linux/bpf.h                           |  20 ++
 include/linux/filter.h                        |  18 +-
 include/net/xdp.h                             |   1 +
 include/trace/events/xdp.h                    |   6 +-
 include/uapi/linux/bpf.h                      |  16 +-
 kernel/bpf/cpumap.c                           |   3 +-
 kernel/bpf/devmap.c                           | 306 +++++++++++++++---
 net/core/filter.c                             |  37 ++-
 net/core/xdp.c                                |  29 ++
 net/xdp/xskmap.c                              |   3 +-
 samples/bpf/Makefile                          |   3 +
 samples/bpf/xdp_redirect_map_multi_kern.c     |  88 +++++
 samples/bpf/xdp_redirect_map_multi_user.c     | 302 +++++++++++++++++
 tools/include/uapi/linux/bpf.h                |  16 +-
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../bpf/progs/xdp_redirect_multi_kern.c       |  94 ++++++
 .../selftests/bpf/test_xdp_redirect_multi.sh  | 204 ++++++++++++
 .../selftests/bpf/xdp_redirect_multi.c        | 226 +++++++++++++
 18 files changed, 1310 insertions(+), 65 deletions(-)
 create mode 100644 samples/bpf/xdp_redirect_map_multi_kern.c
 create mode 100644 samples/bpf/xdp_redirect_map_multi_user.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_redirect_multi.sh
 create mode 100644 tools/testing/selftests/bpf/xdp_redirect_multi.c

Comments

Jesper Dangaard Brouer May 14, 2021, 6:49 a.m. UTC | #1
On Thu, 13 May 2021 15:04:44 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> From: Jesper Dangaard Brouer <brouer@redhat.com>

> 

> This changes the devmap XDP program support to run the program when the

> bulk queue is flushed instead of before the frame is enqueued. This has

> a couple of benefits:

> 

> - It "sorts" the packets by destination devmap entry, and then runs the

>   same BPF program on all the packets in sequence. This ensures that we

>   keep the XDP program and destination device properties hot in I-cache.

> 

> - It makes the multicast implementation simpler because it can just

>   enqueue packets using bq_enqueue() without having to deal with the

>   devmap program at all.

> 

> The drawback is that if the devmap program drops the packet, the enqueue

> step is redundant. However, arguably this is mostly visible in a

> micro-benchmark, and with more mixed traffic the I-cache benefit should

> win out. The performance impact of just this patch is as follows:

> 

> Using 2 10Gb i40e NIC, redirecting one to another, or into a veth interface,

> which do XDP_DROP on veth peer. With xdp_redirect_map in sample/bpf, send

> pkts via pktgen cmd:

> ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64

> 

> There are about +/- 0.1M deviation for native testing, the performance

> improved for the base-case, but some drop back with xdp devmap prog attached.

> 

> Version          | Test                           | Generic | Native | Native + 2nd xdp_prog

> 5.12 rc4         | xdp_redirect_map   i40e->i40e  |    1.9M |   9.6M |  8.4M

> 5.12 rc4         | xdp_redirect_map   i40e->veth  |    1.7M |  11.7M |  9.8M

> 5.12 rc4 + patch | xdp_redirect_map   i40e->i40e  |    1.9M |   9.8M |  8.0M

> 5.12 rc4 + patch | xdp_redirect_map   i40e->veth  |    1.7M |  12.0M |  9.4M

> 

> When bq_xmit_all() is called from bq_enqueue(), another packet will

> always be enqueued immediately after, so clearing dev_rx, xdp_prog and

> flush_node in bq_xmit_all() is redundant. Move the clear to __dev_flush(),

> and only check them once in bq_enqueue() since they are all modified

> together.

> 

> This change also has the side effect of extending the lifetime of the

> RCU-protected xdp_prog that lives inside the devmap entries: Instead of

> just living for the duration of the XDP program invocation, the

> reference now lives all the way until the bq is flushed. This is safe

> because the bq flush happens at the end of the NAPI poll loop, so

> everything happens between a local_bh_disable()/local_bh_enable() pair.

> However, this is by no means obvious from looking at the call sites; in

> particular, some drivers have an additional rcu_read_lock() around only

> the XDP program invocation, which only confuses matters further.

> Cleaning this up will be done in a separate patch series.

> 

> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>

> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>


For the sake of good order

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer