[PATCHv3,iproute2-next,0/5] iproute2: add libbpf support

Message ID 20201029151146.3810859-1-haliu@redhat.com

Message

Hangbin Liu Oct. 29, 2020, 3:11 p.m. UTC
This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. FORCE_LIBBPF can be set to force configure to
abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency).
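
For example (illustrative invocations only; the exact variable handling is
defined by the new configure check, and the paths/values here are made up):

# ./configure                          # picks up the system libbpf if present
# LIBBPF_DIR=/opt/libbpf ./configure   # static link against a custom libbpf
# FORCE_LIBBPF=on ./configure          # fail if no suitable libbpf is found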

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).
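
As a reference, a minimal sketch of the two map-definition styles (the field
and macro names follow the usual iproute2 struct bpf_elf_map and libbpf BTF
conventions; the actual example files in this series may differ):

/* legacy iproute2-style definition (examples/bpf/legacy/) */
struct bpf_elf_map __section_maps map_sh = {
	.type		= BPF_MAP_TYPE_ARRAY,
	.size_key	= sizeof(uint32_t),
	.size_value	= sizeof(uint32_t),
	.pinning	= PIN_OBJECT_NS,
	.max_elem	= 1,
};

/* equivalent BTF-defined map (examples/bpf/) */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, uint32_t);
	__type(value, uint32_t);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
} map_sh SEC(".maps");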

Finally, thanks a lot to Toke for his help on this patch set.

v3:
a) Update configure to check for the function bpf_program__section_name()
   separately
b) Add a new function get_bpf_program__section_name() to choose whether to
   use bpf_program__title() or not (a sketch follows this list)
c) Test-build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33
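
A minimal sketch of what that wrapper could look like (HAVE_LIBBPF_SECTION_NAME
is an assumed name for the macro set by the new configure check):

static const char *get_bpf_program__section_name(const struct bpf_program *prog)
{
#ifdef HAVE_LIBBPF_SECTION_NAME
	return bpf_program__section_name(prog);	/* newer libbpf */
#else
	return bpf_program__title(prog, false);	/* older libbpf API */
#endif
}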

v2:
a) Remove the self-defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.


Here are the test results with the patched iproute2:

== setup env
# clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
# clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
# clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
# rm -rf /sys/fs/bpf/xdp/globals
# /root/iproute2/ip/ip link add type veth
# /root/iproute2/ip/ip link set veth0 up
# /root/iproute2/ip/ip link set veth1 up


== Load objs
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 4 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:21-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 5
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 8 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:23-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 10
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 12 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
12: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:25-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 15
# /root/iproute2/ip/ip link set veth0 xdp off


== Load objs again to make sure maps can be reused
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 16 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
16: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:27-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 20
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 20 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
20: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:29-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 25
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 24 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
24: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:31-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 30
# /root/iproute2/ip/ip link set veth0 xdp off
# rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals

== Testing if we can load new-style objects (using xdp-filter as an example)
# /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 28 tag e29eeda1489a6520 jited
# ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
# bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
# bpftool prog show
28: xdp  name xdpfilt_alw_all  tag e29eeda1489a6520  gpl
        loaded_at 2020-10-22T08:04:33-0400  uid 0
        xlated 2408B  jited 1405B  memlock 4096B  map_ids 9,5,7,8,6
        btf_id 35
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
# ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
# bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
# bpftool prog show
32: xdp  name xdpfilt_alw_ip  tag 2f2b9dbfb786a5a2  gpl
        loaded_at 2020-10-22T08:04:35-0400  uid 0
        xlated 1336B  jited 778B  memlock 4096B  map_ids 7,8,5
        btf_id 40
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 36 tag 18c1bb25084030bc jited
# ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
# bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
# bpftool prog show
36: xdp  name xdpfilt_alw_tcp  tag 18c1bb25084030bc  gpl
        loaded_at 2020-10-22T08:04:37-0400  uid 0
        xlated 1128B  jited 690B  memlock 4096B  map_ids 6,5
        btf_id 45
# /root/iproute2/ip/ip link set veth0 xdp off
# rm -rf /sys/fs/bpf/xdp/globals


== Load new BTF-defined maps
# /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 40 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc
# bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
40: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:39-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 50
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 44 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer
# bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
44: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:41-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 13
        btf_id 55
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 48 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer  map_sh
# bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
14: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
48: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:43-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 14
        btf_id 60
# /root/iproute2/ip/ip link set veth0 xdp off
# rm -rf /sys/fs/bpf/xdp/globals


== Test loading objs with tc
# /root/iproute2/tc/tc qdisc add dev veth0 ingress
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
# /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
# ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc

/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex  jmp_tc  map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc
# bpftool map show
15: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
16: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
17: prog_array  name jmp_ex  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
18: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 2  memlock 4096B
        owner_prog_type sched_cls  owner jited
19: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
52: sched_cls  name cls_loop  tag 3e98a40b04099d36  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 168B  jited 133B  memlock 4096B  map_ids 15
        btf_id 65
56: sched_cls  name cls_entry  tag 0fbb4d9310a6ee26  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 144B  jited 121B  memlock 4096B  map_ids 16
        btf_id 70
60: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 75
66: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 80
72: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 85
78: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 90
79: sched_cls  name cls_case2  tag ee218ff893dca823  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 336B  jited 218B  memlock 4096B  map_ids 19,18
        btf_id 90
80: sched_cls  name cls_exit  tag e78a58140deed387  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 288B  jited 177B  memlock 4096B  map_ids 19
        btf_id 90

I also ran the following upstream kselftests with the patched iproute2 and
all passed.

test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh

Hangbin Liu (5):
  configure: add check_libbpf() for later libbpf support
  lib: rename bpf.c to bpf_legacy.c
  lib: add libbpf support
  examples/bpf: move struct bpf_elf_map defined maps to legacy folder
  examples/bpf: add bpf examples with BTF defined maps

 configure                                |  94 +++++++
 examples/bpf/README                      |  18 +-
 examples/bpf/bpf_graft.c                 |  14 +-
 examples/bpf/bpf_map_in_map.c            |  37 ++-
 examples/bpf/bpf_shared.c                |  14 +-
 examples/bpf/{ => legacy}/bpf_cyclic.c   |   2 +-
 examples/bpf/legacy/bpf_graft.c          |  66 +++++
 examples/bpf/legacy/bpf_map_in_map.c     |  56 ++++
 examples/bpf/legacy/bpf_shared.c         |  53 ++++
 examples/bpf/{ => legacy}/bpf_tailcall.c |   2 +-
 include/bpf_api.h                        |  13 +
 include/bpf_util.h                       |  17 +-
 ip/ipvrf.c                               |  19 +-
 lib/Makefile                             |   6 +-
 lib/{bpf.c => bpf_legacy.c}              | 184 +++++++++++-
 lib/bpf_libbpf.c                         | 341 +++++++++++++++++++++++
 16 files changed, 888 insertions(+), 48 deletions(-)
 rename examples/bpf/{ => legacy}/bpf_cyclic.c (95%)
 create mode 100644 examples/bpf/legacy/bpf_graft.c
 create mode 100644 examples/bpf/legacy/bpf_map_in_map.c
 create mode 100644 examples/bpf/legacy/bpf_shared.c
 rename examples/bpf/{ => legacy}/bpf_tailcall.c (98%)
 rename lib/{bpf.c => bpf_legacy.c} (94%)
 create mode 100644 lib/bpf_libbpf.c

Comments

David Ahern Nov. 2, 2020, 3:41 p.m. UTC | #1
On 10/29/20 9:11 AM, Hangbin Liu wrote:
> diff --git a/ip/ipvrf.c b/ip/ipvrf.c
> index 33150ac2..afaf1de7 100644
> --- a/ip/ipvrf.c
> +++ b/ip/ipvrf.c
> @@ -28,8 +28,14 @@
>  #include "rt_names.h"
>  #include "utils.h"
>  #include "ip_common.h"
> +
>  #include "bpf_util.h"
>  
> +#ifdef HAVE_LIBBPF
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +#endif
> +
>  #define CGRP_PROC_FILE  "/cgroup.procs"
>  
>  static struct link_filter vrf_filter;
> @@ -256,8 +262,13 @@ static int prog_load(int idx)
>  		BPF_EXIT_INSN(),
>  	};
>  
> +#ifdef HAVE_LIBBPF
> +	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
> +				"GPL", 0, bpf_log_buf, sizeof(bpf_log_buf));
> +#else
>  	return bpf_prog_load_buf(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
>  			         "GPL", bpf_log_buf, sizeof(bpf_log_buf));
> +#endif
>  }
>  
>  static int vrf_configure_cgroup(const char *path, int ifindex)
> @@ -288,7 +299,11 @@ static int vrf_configure_cgroup(const char *path, int ifindex)
>  		goto out;
>  	}
>  
> +#ifdef HAVE_LIBBPF
> +	if (bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE, 0)) {
> +#else
>  	if (bpf_prog_attach_fd(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE)) {
> +#endif
>  		fprintf(stderr, "Failed to attach prog to cgroup: '%s'\n",
>  			strerror(errno));
>  		goto out;

I would prefer to have these #ifdef .. #endif checks consolidated in the
lib code. Create a bpf_compat file for these. e.g.,

int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
                     size_t size_insns, const char *license, char *log,
                     size_t size_log)
{
#ifdef HAVE_LIBBPF
	/* note: bpf_load_program() takes an insn count, not a byte size */
	return bpf_load_program(type, insns, size_insns / sizeof(struct bpf_insn),
				license, 0, log, size_log);
#else
	return bpf_prog_load_buf(type, insns, size_insns, license,
				 log, size_log);
#endif
}

Similarly for bpf_program_attach.
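
A sketch of that wrapper, following the same pattern (the signature here is
an assumption, mirroring the two ipvrf call sites above):

int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
{
#ifdef HAVE_LIBBPF
	/* the libbpf variant takes an extra flags argument */
	return bpf_prog_attach(prog_fd, target_fd, type, 0);
#else
	return bpf_prog_attach_fd(prog_fd, target_fd, type);
#endif
}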


I think even the includes can be done once in bpf_util.h with a single
+#ifdef HAVE_LIBBPF
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#endif
+

The iproute2_* functions added later in this patch can be in the compat
file as well.
David Ahern Nov. 2, 2020, 3:47 p.m. UTC | #2
On 10/29/20 9:11 AM, Hangbin Liu wrote:
> This series converts iproute2 to use libbpf for loading and attaching
> BPF programs when it is available. This means that iproute2 will
> correctly process BTF information and support the new-style BTF-defined
> maps, while keeping compatibility with the old internal map definition
> syntax.
> 
> This is achieved by checking for libbpf at './configure' time, and using
> it if available. By default the system libbpf will be used, but static
> linking against a custom libbpf version can be achieved by passing
> LIBBPF_DIR to configure. FORCE_LIBBPF can be set to force configure to
> abort if no suitable libbpf is found (useful for automatic packaging
> that wants to enforce the dependency).
> 
> The old iproute2 bpf code is kept and will be used if no suitable libbpf
> is available. When using libbpf, wrapper code ensures that iproute2 will
> still understand the old map definition format, including populating
> map-in-map and tail call maps before load.
> 
> The examples in bpf/examples are kept, and a separate set of examples
> are added with BTF-based map definitions for those examples where this
> is possible (libbpf doesn't currently support declaratively populating
> tail call maps).
> 
> At last, Thanks a lot for Toke's help on this patch set.
> 

In regards to comments from v2 of the series:

iproute2 is a stable, production package that requires minimal support
from external libraries. The external packages it does require are also
stable with few to no relevant changes.

bpf and libbpf on the other hand are under active development and
rapidly changing month over month. The git submodule approach has its
conveniences for rapid development but is inappropriate for a package
like iproute2 and will not be considered.

To explicitly state what I think should be obvious to any experienced
Linux user, iproute2 code should always compile and work *without
functionality loss* on LTS versions N and N-1 of well known OS’es with
LTS releases (e.g., Debian, Ubuntu, RHEL). Meaning iproute2 will compile
and work with the external dependencies as they exist in that OS version.

I believe there are more than enough established compatibility and
library version checks to find the middle ground to integrate new
features requiring new versions of libbpf while maintaining stability
and compatibility with older releases. The biannual releases of Ubuntu
and Fedora serve as testing grounds for integrating new features
requiring a newer version of libbpf while continuing to work with
released versions of libbpf. It appears Debian Bullseye will also fall
into this category.

Finally, bpf-based features in iproute2 will only be committed once
relevant support exists in a released version of libbpf (i.e., the github
version, not just commits to the in-kernel tree version). Patches can
and should be sent for review based on testing with the in-kernel tree
version of libbpf, but I will not commit them until the library has been
released.

Thanks for working on this, Hangbin. It is the right direction in the long term.
Hangbin Liu Nov. 3, 2020, 5:48 a.m. UTC | #3
On Mon, Nov 02, 2020 at 08:41:09AM -0700, David Ahern wrote:
> 
> I would prefer to have these #ifdef .. #endif checks consolidated in the
> lib code. Create a bpf_compat file for these. e.g.,
> 
> int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
>                      size_t size_insns, const char *license, char *log,
>                      size_t size_log)
> {
> #ifdef HAVE_LIBBPF
> 	/* note: bpf_load_program() takes an insn count, not a byte size */
> 	return bpf_load_program(type, insns, size_insns / sizeof(struct bpf_insn),
> 				license, 0, log, size_log);
> #else
> 	return bpf_prog_load_buf(type, insns, size_insns, license,
> 				 log, size_log);
> #endif
> }
> 
> Similarly for bpf_program_attach.

> 
> I think even the includes can be done once in bpf_util.h with a single
> +#ifdef HAVE_LIBBPF
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +#endif
> +
> 
> The iproute2_* functions added later in this patch can be in the compat
> file as well.

The iproute2_* functions need access to the static struct bpf_elf_ctx __ctx;
we would need to move struct bpf_elf_ctx to another header file if we add
the iproute2_* functions to a compat file. Do you still want this?

Thanks
Hangbin
Andrii Nakryiko Nov. 3, 2020, 6:58 a.m. UTC | #4
On Mon, Nov 2, 2020 at 7:47 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 10/29/20 9:11 AM, Hangbin Liu wrote:
> > This series converts iproute2 to use libbpf for loading and attaching
> > BPF programs when it is available. This means that iproute2 will
> > correctly process BTF information and support the new-style BTF-defined
> > maps, while keeping compatibility with the old internal map definition
> > syntax.
> >
> > This is achieved by checking for libbpf at './configure' time, and using
> > it if available. By default the system libbpf will be used, but static
> > linking against a custom libbpf version can be achieved by passing
> > LIBBPF_DIR to configure. FORCE_LIBBPF can be set to force configure to
> > abort if no suitable libbpf is found (useful for automatic packaging
> > that wants to enforce the dependency).
> >
> > The old iproute2 bpf code is kept and will be used if no suitable libbpf
> > is available. When using libbpf, wrapper code ensures that iproute2 will
> > still understand the old map definition format, including populating
> > map-in-map and tail call maps before load.
> >
> > The examples in bpf/examples are kept, and a separate set of examples
> > are added with BTF-based map definitions for those examples where this
> > is possible (libbpf doesn't currently support declaratively populating
> > tail call maps).
> >
> > At last, Thanks a lot for Toke's help on this patch set.
> >
>
> In regards to comments from v2 of the series:
>
> iproute2 is a stable, production package that requires minimal support
> from external libraries. The external packages it does require are also
> stable with few to no relevant changes.
>
> bpf and libbpf on the other hand are under active development and
> rapidly changing month over month. The git submodule approach has its
> conveniences for rapid development but is inappropriate for a package
> like iproute2 and will not be considered.

It's ok to not consider that, really. I'm trying to understand what's
so bad about the submodule approach, not convince you (anymore) to use
libbpf through submodule. And the submodule is not for rapid
development, it's mainly for guaranteed libbpf features and version,
and simplicity of iproute2 code when using libbpf.

But I don't think I got a real answer as to what's the exact reason
against the submodule. Like what "inappropriate" even means in this
case? Jesper's security argument so far was the only objective
criterion, as far as I can tell.

>
> To explicitly state what I think should be obvious to any experienced
> Linux user, iproute2 code should always compile and work *without
> functionality loss* on LTS versions N and N-1 of well known OS’es with
> LTS releases (e.g., Debian, Ubuntu, RHEL). Meaning iproute2 will compile
> and work with the external dependencies as they exist in that OS version.

I love the appeal to obviousness and "experienced Linux user" ;)

But I also see that using libbpf through submodule gives iproute2
exact control over which version of libbpf is being used. And that
does not depend at all on any specific Linux distribution, its
version, LTS vs non-LTS, etc. iproute2 will just work the same across
all of them. So matches your stated goals very directly and
explicitly.

>
> I believe there are more than enough established compatibility and
> library version checks to find the middle ground to integrate new
> features requiring new versions of libbpf while maintaining stability
> and compatibility with older releases. The biannual releases of Ubuntu
> and Fedora serve as testing grounds for integrating new features
> requiring a newer version of libbpf while continuing to work with
> released versions of libbpf. It appears Debian Bullseye will also fall
> into this category.

Beyond just more unnecessary complexity in iproute2 library to
accommodate older libbpf versions, users basically will need to pay
closer attention not just to which version of iproute2 they have, but
also which version of libbpf is installed on their system. Which is
ok, but an unnecessary burden, IMO. By controlling the libbpf version
through the submodule, it would be simple to say: "iproute2 vX uses
libbpf vY with features Z1, Z2, Z3". Then the user would just know
what to expect from iproute2 and its BPF support. And iproute2 code
base won't have to do as much feature detection and condition
compilation tricks.

That's what I don't understand: why settle for the lowest common
denominator of libbpf versions across a wide range of systems, when
you can take control and develop against a well-known version of
libbpf? I get the security upgrades angle (even if I don't rank it higher
than simplicity). But I don't get the idea behind a blanket statement
like "libbpf through submodule is inappropriate".

>
> Finally, bpf-based features in iproute2 will only be committed once
> relevant support exists in a released version of libbpf (ie., the github
> version, not just commits to the in-kernel tree version). Patches can
> and should be sent for review based on testing with the in-kernel tree
> version of libbpf, but I will not commit them until the library has been
> released.

Makes sense. And the submodule approach gives you a great deal of
control and flexibility in this case. For testing, it's easy to use
either Github or even in-kernel sources (with a bit of symlinking,
though). But for upstreaming the submodule should only reference a
released tag from the Github repo. Again, everything seems to work out,
no?

>
> Thanks for working on this, Hangbin. It is right direction in the long term.
Jiri Benc Nov. 3, 2020, 8:42 a.m. UTC | #5
On Mon, 2 Nov 2020 22:58:06 -0800, Andrii Nakryiko wrote:
> But I don't think I got a real answer as to what's the exact reason
> against the submodule. Like what "inappropriate" even means in this
> case? Jesper's security argument so far was the only objective
> criteria, as far as I can tell.

It's the fundamental objection. Distributions in general have the "no
bundled libraries" policy. It is sometimes annoying but it helps to
understand that the policy is not a whim of distros, it's coming from
years of experience with package maintenance for security and stability.

> But I also see that using libbpf through submodule gives iproute2
> exact control over which version of libbpf is being used. And that
> does not depend at all on any specific Linux distribution, its
> version, LTS vs non-LTS, etc. iproute2 will just work the same across
> all of them. So matches your stated goals very directly and
> explicitly.

If you take this route, the end result would be all dependencies for
all projects being included as submodules and bundled. At first
sight, this sounds easier for the developers. Why bother with dynamic
linking at all? Everything can be linked statically.

The result would be nightmare for both distros and users. No timely
security updates possible, critical bugs not being fixed in some
programs, etc. There is enough experience with this kind of setup to
conclude it is not the right way to go.

Yes, dynamic linking is initially more work for developers of both apps
and libraries. However, it pays off over time - there's no need to keep
track of security and other important fixes in the dependencies; it
comes for free from the distro work.

Btw, taking the bundling to the extreme, every app could bundle its own
well tested and compatible kernel version and be run in a VM. This
might sound far fetched but there were actual attempts to do that. It
didn't take off; I think part of the reason was that the Linux kernel
is very good at keeping its APIs stable.

And I'm convinced this is the way to go for libraries, too: put an
emphasis on API stability. Make it easy to get consumed and updated
under the hood. Everybody wins this way.

 Jiri
Daniel Borkmann Nov. 3, 2020, 8:46 a.m. UTC | #6
On 11/3/20 7:58 AM, Andrii Nakryiko wrote:
> On Mon, Nov 2, 2020 at 7:47 AM David Ahern <dsahern@gmail.com> wrote:
>> On 10/29/20 9:11 AM, Hangbin Liu wrote:
>>> This series converts iproute2 to use libbpf for loading and attaching
>>> BPF programs when it is available. This means that iproute2 will
>>> correctly process BTF information and support the new-style BTF-defined
>>> maps, while keeping compatibility with the old internal map definition
>>> syntax.
>>>
>>> This is achieved by checking for libbpf at './configure' time, and using
>>> it if available. By default the system libbpf will be used, but static
>>> linking against a custom libbpf version can be achieved by passing
>>> LIBBPF_DIR to configure. FORCE_LIBBPF can be set to force configure to
>>> abort if no suitable libbpf is found (useful for automatic packaging
>>> that wants to enforce the dependency).
>>>
>>> The old iproute2 bpf code is kept and will be used if no suitable libbpf
>>> is available. When using libbpf, wrapper code ensures that iproute2 will
>>> still understand the old map definition format, including populating
>>> map-in-map and tail call maps before load.
>>>
>>> The examples in bpf/examples are kept, and a separate set of examples
>>> are added with BTF-based map definitions for those examples where this
>>> is possible (libbpf doesn't currently support declaratively populating
>>> tail call maps).
>>>
>>> At last, Thanks a lot for Toke's help on this patch set.
>>
>> In regards to comments from v2 of the series:
>>
>> iproute2 is a stable, production package that requires minimal support
>> from external libraries. The external packages it does require are also
>> stable with few to no relevant changes.
>>
>> bpf and libbpf on the other hand are under active development and
>> rapidly changing month over month. The git submodule approach has its
>> conveniences for rapid development but is inappropriate for a package
>> like iproute2 and will not be considered.

I thought last time this discussion came up there was consensus that the
submodule could be an explicit opt in for the configure script at least?
David Ahern Nov. 3, 2020, 5:19 p.m. UTC | #7
On 11/2/20 10:48 PM, Hangbin Liu wrote:
> The iproute2_* functions need access to the static struct bpf_elf_ctx __ctx;
> we would need to move struct bpf_elf_ctx to another header file if we add
> the iproute2_* functions to a compat file. Do you still want this?

ok, leave it in legacy for now.
David Ahern Nov. 3, 2020, 5:35 p.m. UTC | #8
On 11/3/20 1:46 AM, Daniel Borkmann wrote:
> I thought last time this discussion came up there was consensus that the
> submodule could be an explicit opt in for the configure script at least?


I do not recall Stephen agreeing to that, and I certainly did not.
David Ahern Nov. 3, 2020, 5:45 p.m. UTC | #9
On 11/3/20 1:42 AM, Jiri Benc wrote:
> And I'm convinced this is the way to go for libraries, too: put an
> emphasis on API stability. Make it easy to get consumed and updated
> under the hood. Everybody wins this way.

exactly. Libraries should export well thought out, easy to use, stable
APIs. Maintainers do not need to be concerned about how the code is
consumed by projects.
Alexei Starovoitov Nov. 3, 2020, 5:47 p.m. UTC | #10
On Tue, Nov 3, 2020 at 9:36 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 11/3/20 1:46 AM, Daniel Borkmann wrote:
> > I thought last time this discussion came up there was consensus that the
> > submodule could be an explicit opt in for the configure script at least?
>
> I do not recall Stephen agreeing to that, and I certainly did not.

Daniel,

since David is deaf to technical arguments,
how about we fork iproute2 and maintain it separately?
Alexei Starovoitov Nov. 3, 2020, 5:48 p.m. UTC | #11
On Tue, Nov 3, 2020 at 12:42 AM Jiri Benc <jbenc@redhat.com> wrote:
> sight, this sounds easier for the developers. Why bother with dynamic
> linking at all? Everything can be linked statically.

That's exactly what some companies do.
Linking everything statically provides stronger security.
Stephen Hemminger Nov. 3, 2020, 6:23 p.m. UTC | #12
On Tue, 3 Nov 2020 09:47:00 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Nov 3, 2020 at 9:36 AM David Ahern <dsahern@gmail.com> wrote:
> >
> > On 11/3/20 1:46 AM, Daniel Borkmann wrote:
> > > I thought last time this discussion came up there was consensus that the
> > > submodule could be an explicit opt in for the configure script at least?
> >
> > I do not recall Stephen agreeing to that, and I certainly did not.
>
> Daniel,
>
> since David is deaf to technical arguments,
> how about we fork iproute2 and maintain it separately?

A submodule is not a practical, viable option.

Please come back when you are ready to use distro libbpf packages.
This seems like a microcosm of the Linux packaging problem that was discussed
around Kubernetes and "vendorization".
David Ahern Nov. 3, 2020, 10:32 p.m. UTC | #13
On 11/3/20 10:47 AM, Alexei Starovoitov wrote:
> since David is deaf to technical arguments,

It is not that I am "deaf to technical arguments"; you do not like my
response.

The scope of bpf in iproute2 is tiny - a few tc modules (and VRF but it
does not need libbpf) which is a small subset of the functionality and
commands within the package.

The configure script will allow you to use any libbpf version you wish.
Standard operating procedure for configuring a dependency within a package.
Alexei Starovoitov Nov. 3, 2020, 10:55 p.m. UTC | #14
On Tue, Nov 03, 2020 at 03:32:55PM -0700, David Ahern wrote:
> On 11/3/20 10:47 AM, Alexei Starovoitov wrote:
> > since David is deaf to technical arguments,
> It is not that I am "deaf to technical arguments"; you do not like my
> response.
> 
> The scope of bpf in iproute2 is tiny - a few tc modules (and VRF but it
> does not need libbpf) which is a small subset of the functionality and
> commands within the package.

When Hangbin sent this patch set I got excited that finally tc command
will start working with the latest bpf elf files.
Currently "tc" supports 4 year old files which caused plenty of pain to bpf users.
I got excited, but now I've realized that this patch set will make it worse.
The bpf support in "tc" command instead of being obviously old and obsolete
will be sort-of working with unpredictable delay between released kernel
and released iproute2 version. The iproute2 release that suppose to match kernel
release will be meaningless.
More so, the upgrade of shared libbpf.so can make older iproute2/tc to do 
something new and unpredictable.
The user experience will be awful. Not only the users won't know
what to expect out of 'tc' command they won't have a way to debug it.
All of it because iproute2 build will take system libbpf and link it
as shared library by default.
So I think iproute2 must not use libbpf. If I could remove bpf support
from iproute2 I would do so as well.
The current state of iproute2 is hurting the bpf ecosystem and the proposed
libbpf+iproute2 integration will make it worse.
David Ahern Nov. 4, 2020, 1:40 a.m. UTC | #15
On 11/3/20 3:55 PM, Alexei Starovoitov wrote:
> The bpf support in the "tc" command, instead of being obviously old and
> obsolete, will be sort-of working, with an unpredictable delay between the
> released kernel and the released iproute2 version. The iproute2 release that
> is supposed to match the kernel release will be meaningless.

iproute2, like all userspace commands, is written to an API and for well
written APIs the commands should be backward and forward compatible
across kernel versions. Kernel upgrades do not force an update of the
entire ecosystem. New userspace on old kernels should again just work.
New functionality in the new userspace will not, but detection of that
is a different problem and relies on kernel APIs doing proper data
validation.


> More so, an upgrade of the shared libbpf.so can make an older iproute2/tc do
> something new and unpredictable.

How so? If libbpf is written against kernel APIs and properly versioned,
it should just work. A new version of libbpf changes the .so version, so
old commands will not load it.
Hangbin Liu Nov. 4, 2020, 2:17 a.m. UTC | #16
On Tue, Nov 03, 2020 at 02:55:54PM -0800, Alexei Starovoitov wrote:
> > The scope of bpf in iproute2 is tiny - a few tc modules (and VRF but it
> > does not need libbpf) which is a small subset of the functionality and
> > commands within the package.
>
> When Hangbin sent this patch set I got excited that finally tc command
> will start working with the latest bpf elf files.
> Currently "tc" supports 4 year old files which caused plenty of pain to bpf users.
> I got excited, but now I've realized that this patch set will make it worse.
> The bpf support in the "tc" command, instead of being obviously old and
> obsolete, will be sort-of working, with an unpredictable delay between the
> released kernel and the released iproute2 version. The iproute2 release that
> is supposed to match the kernel release will be meaningless.
> More so, an upgrade of the shared libbpf.so can make an older iproute2/tc do
> something new and unpredictable.
> The user experience will be awful. Not only will users not know what to
> expect out of the 'tc' command, they won't have a way to debug it.
> All of it because the iproute2 build will take the system libbpf and link it
> as a shared library by default.
> So I think iproute2 must not use libbpf. If I could remove bpf support
> from iproute2 I would do so as well.
> The current state of iproute2 is hurting the bpf ecosystem and the proposed
> libbpf+iproute2 integration will make it worse.

Hi Guys,

Please take it easy. IMHO, it is always very hard to make a perfect solution.
From the development side, it's easier and we could get the latest features
by using libbpf as a submodule. But we need to take care of users, backward
compatibility, distro policies, etc.

I like using iproute2 to load bpf objs. But it's not standardized and is too
old to load the new BTF-defined objs. I think all of us would like to improve
it by using libbpf. But users and distros move slowly. Some users are still
using `ifconfig`. Distros have policies to link against the shared .so, etc.
We have to compromise on something.

Our purpose is to push users to use new features. As this patchset does,
push users to try libbpf instead of the legacy code. But this needs time.

Sorry if my words are confusing. I'm not a native speaker, but I hope we can
find a solution that all of us (we, users, distros) can accept instead of
breaking or giving up.

Thanks
Hangbin
Alexei Starovoitov Nov. 4, 2020, 2:45 a.m. UTC | #17
On Tue, Nov 03, 2020 at 06:40:44PM -0700, David Ahern wrote:
> On 11/3/20 3:55 PM, Alexei Starovoitov wrote:
> > The bpf support in the "tc" command, instead of being obviously old and
> > obsolete, will be sort-of working, with an unpredictable delay between the
> > released kernel and the released iproute2 version. The iproute2 release that
> > is supposed to match the kernel release will be meaningless.
> 
> iproute2, like all userspace commands, is written to an API and for well
> written APIs the commands should be backward and forward compatible
> across kernel versions. Kernel upgrades do not force an update of the
> entire ecosystem. New userspace on old kernels should again just work.
> New functionality in the new userpsace will not, but detection of that
> is a different problem and relies on kernel APIs doing proper data
> validation.

commands ?!
libbpf is not a library that translates user input into kernel syscalls.
It's not libmnl that is a wrapper for netlink.
It's not libelf either.
libbpf probes kernel features and does different things depending on what it found.
libbpf is the only library I know that is backward and forward compatible.
All other libraries are backwards compatible only.
iproute2 itself is backward compatible only as well.
New devlink feature in iproute2 won't do anything on the kernel that doesn't
have the support.
libbpf, on the other side, has to work on older kernels. New libbpf features
have to gradually degrade when possible.
The users can upgrade and downgrade libbpf version at any time.
They can upgrade and downgrade kernel while keeping libbpf version the same.
The users can upgrade llvm as well and libbpf has to expect the unexpected
and deal with all combinations.

> 
> > More so, an upgrade of the shared libbpf.so can make an older iproute2/tc do
> > something new and unpredictable.
> 
> How so? If libbpf is written against kernel APIs and properly versioned,
> it should just work. A new version of libbpf changes the .so version, so
> old commands will not load it.

Please point out where you see this happening in the patch set.
See tools/lib/bpf/README.rst to understand the versioning.
Alexei Starovoitov Nov. 4, 2020, 3:11 a.m. UTC | #18
On Wed, Nov 04, 2020 at 10:17:30AM +0800, Hangbin Liu wrote:
> On Tue, Nov 03, 2020 at 02:55:54PM -0800, Alexei Starovoitov wrote:
> > > The scope of bpf in iproute2 is tiny - a few tc modules (and VRF but it
> > > does not need libbpf) which is a small subset of the functionality and
> > > commands within the package.
> > 
> > When Hangbin sent this patch set I got excited that finally tc command
> > will start working with the latest bpf elf files.
> > Currently "tc" supports 4 year old files which caused plenty of pain to bpf users.
> > I got excited, but now I've realized that this patch set will make it worse.
> > The bpf support in the "tc" command, instead of being obviously old and
> > obsolete, will be sort-of working, with an unpredictable delay between the
> > released kernel and the released iproute2 version. The iproute2 release that
> > is supposed to match the kernel release will be meaningless.
> > More so, an upgrade of the shared libbpf.so can make an older iproute2/tc do
> > something new and unpredictable.
> > The user experience will be awful. Not only will users not know what to
> > expect out of the 'tc' command, they won't have a way to debug it.
> > All of it because the iproute2 build will take the system libbpf and link it
> > as a shared library by default.
> > So I think iproute2 must not use libbpf. If I could remove bpf support
> > from iproute2 I would do so as well.
> > The current state of iproute2 is hurting the bpf ecosystem and the proposed
> > libbpf+iproute2 integration will make it worse.
> 
> Hi Guys,
> 
> Please take it easy. IMHO, it is always very hard to make a perfect solution.
> From the development side, it's easier and we could get the latest features
> by using libbpf as a submodule. But we need to take care of users, backward
> compatibility, distro policies, etc.
> 
> I like using iproute2 to load bpf objs. But it's not standardized and is too
> old to load the new BTF-defined objs. I think all of us would like to improve
> it by using libbpf. But users and distros move slowly. Some users are still
> using `ifconfig`. Distros have policies to link against the shared .so, etc.
> We have to compromise on something.
> 
> Our purpose is to push users to use new features. As this patchset does,
> push users to try libbpf instead of the legacy code. But this needs time.

My problem with iproute2 picking random libbpf is unpredictability.
Such a roll of dice gives no confidence to users on what is expected to work.
bpf_hello_world.o will load, but that's it.
What is going to work with this or that version of "tc" command? No one knows.
The user will do 'tc -V'. Does the version mean anything from a bpf loading pov?
It doesn't. The user will do "ldd `which tc`" and then what?
Such bpf support in "tc" is worse than the current one.
At least the current one is predictably old.

There are alternatives though.
Forking the whole iproute2 because of "tc" is pointless, of course.
My 'proposal' was a fire starter because people are too stubborn to
realize that their long-term beliefs could be incorrect until the fire is burning.
"bpftool prog load" can load any kind of elf. It cannot operate on qdiscs
and shouldn't do qdisc manipulations, but maybe we can combine them into a pipe
of some sort. Like "bpftool prog load file.o | tc filter ... bpf pipe"
I think that would be better long term. It will be predictable.

When we release a new version of libbpf it goes through rigorous testing.
bpftool gets a lot of test coverage as well.
iproute2 with shared libbpf will get nothing. It's the same random roll of dice.
New libbpf may or may not break iproute2. That's an awful user experience.
So iproute2 has to use a git submodule with a particular libbpf sha.
Then the libbpf release process can incorporate proper testing of the
libbpf and iproute2 combination.
Or iproute2 should stay as-is with obsolete bpf support.

A few years from now the situation could be different and shared libbpf would
be the most appropriate choice. But that day is not today.
Hangbin Liu Nov. 4, 2020, 8:22 a.m. UTC | #19
On Mon, Nov 02, 2020 at 08:41:09AM -0700, David Ahern wrote:
> On 10/29/20 9:11 AM, Hangbin Liu wrote:
> > diff --git a/ip/ipvrf.c b/ip/ipvrf.c
> > index 33150ac2..afaf1de7 100644
> > --- a/ip/ipvrf.c
> > +++ b/ip/ipvrf.c
> > @@ -28,8 +28,14 @@
> >  #include "rt_names.h"
> >  #include "utils.h"
> >  #include "ip_common.h"
> > +
> >  #include "bpf_util.h"
> >  
> > +#ifdef HAVE_LIBBPF
> > +#include <bpf/bpf.h>
> > +#include <bpf/libbpf.h>
> > +#endif
> > +
> >  #define CGRP_PROC_FILE  "/cgroup.procs"
> >  
> >  static struct link_filter vrf_filter;
> > @@ -256,8 +262,13 @@ static int prog_load(int idx)
> >  		BPF_EXIT_INSN(),
> >  	};
> >  
> > +#ifdef HAVE_LIBBPF
> > +	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
> > +				"GPL", 0, bpf_log_buf, sizeof(bpf_log_buf));
> > +#else
> >  	return bpf_prog_load_buf(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
> >  			         "GPL", bpf_log_buf, sizeof(bpf_log_buf));
> > +#endif
> >  }
> >  
> >  static int vrf_configure_cgroup(const char *path, int ifindex)
> > @@ -288,7 +299,11 @@ static int vrf_configure_cgroup(const char *path, int ifindex)
> >  		goto out;
> >  	}
> >  
> > +#ifdef HAVE_LIBBPF
> > +	if (bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE, 0)) {
> > +#else
> >  	if (bpf_prog_attach_fd(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE)) {
> > +#endif
> >  		fprintf(stderr, "Failed to attach prog to cgroup: '%s'\n",
> >  			strerror(errno));
> >  		goto out;
> 
> I would prefer to have these #ifdef .. #endif checks consolidated in the
> lib code. Create a bpf_compat file for these. e.g.,
> 
> int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
>                      size_t size_insns, const char *license, char *log,
>                      size_t size_log)
> {
> #ifdef HAVE_LIBBPF
> 	/* note: bpf_load_program() takes an insn count, not a byte size */
> 	return bpf_load_program(type, insns, size_insns / sizeof(struct bpf_insn),
> 				license, 0, log, size_log);
> #else
> 	return bpf_prog_load_buf(type, insns, size_insns, license,
> 				 log, size_log);
> #endif
> }
> 
> Similarly for bpf_program_attach.
> 
> I think even the includes can be done once in bpf_util.h with a single
> +#ifdef HAVE_LIBBPF
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +#endif
> +

Oh, I just found why I didn't include libbpf.h in bpf_legacy.c.
The reason is that there are more function conflicts, e.g.
bpf_obj_get, bpf_obj_pin, bpf_prog_attach.

If we move this #ifdef HAVE_LIBBPF to bpf_legacy.c, we need to rename
them all. With the current patch, we keep all the legacy functions in
bpf_legacy and don't mix them with libbpf.h. What do you think?

Thanks
Hangbin
Jiri Benc Nov. 4, 2020, 9:28 a.m. UTC | #20
On Tue, 3 Nov 2020 18:45:59 -0800, Alexei Starovoitov wrote:
> libbpf is the only library I know that is backward and forward compatible.

This is great to hear. It means there will be no problem with iproute2
using the system libbpf. As libbpf is both backward and forward
compatible, iproute2 will just work with whatever version it is used
with.

> All other libraries are backwards compatible only.

Backward compatibility would be enough for iproute2 but forward
compatibility does not hurt, of course.

> The users can upgrade and downgrade libbpf version at any time.
> They can upgrade and downgrade kernel while keeping libbpf version the same.
> The users can upgrade llvm as well and libbpf has to expect the unexpected
> and deal with all combinations.


This actually goes beyond what would be needed for iproute2 dynamically
linked against system libbpf.

> > How so? If libbpf is written against kernel APIs and properly versioned,
> > it should just work. A new version of libbpf changes the .so version, so
> > old commands will not load it.
>
> Please point out where you see this happening in the patch set.
> See tools/lib/bpf/README.rst to understand the versioning.

If the iproute2 binaries are linked against a symbol of a newer version than
is available in the system libbpf (which should not really happen
unless the system is broken), the dynamic linker will refuse to load
it. If the binary is linked against an old version of a particular
symbol, that old version will be used, if it's still provided by the
library. Otherwise, it will not load. I don't see a problem here?

The only problem would be if a particular function changed its
semantics while retaining ABI. But since libbpf is backward and forward
compatible, this should not happen.

 Jiri
Jiri Benc Nov. 4, 2020, 10:01 a.m. UTC | #21
On Tue, 3 Nov 2020 19:11:45 -0800, Alexei Starovoitov wrote:
> When we release new version of libbpf it goes through rigorous testing.
> bpftool gets a lot of test coverage as well.
> iproute2 with shared libbpf will get nothing. It's the same random roll of dice.

"Random roll of dice" would be true only if libbpf did incredibly bad
job in keeping backward compatibility. In my experience it is not the
case. Sure, a bug in retaining the compatibility may occasionally
appear; after all, any software tends to contain bugs in various
places. You are right that such a bug may not be caught by your testing.

I also believe that if there is a bug in backward compatibility
reported by someone, it will be fixed (if possible). So this is really
just a matter of testing, not a fundamental problem of ABI
compatibility.

Let the distros worry about the testing. Upstream may test (and
even recommend!) certain combinations of iproute2 + libbpf, such as the
latest of both at the time of testing. If distros want to use a
different combination, they can and should do their own testing. If
their testing reveals a bug in backward compatibility and a patch to
fix it is accepted, everything will work smoothly for the distro users.

Non-distro users (or small distros) may just rely on the upstream
tested combination of iproute2 + libbpf.

> Few years from now the situation could be different and shared libbpf would
> be the most appropriate choice. But that day is not today.

Interestingly, the major compatibility problems we had were with llvm
updates. After an llvm update while keeping the same kernel version, llvm
started to emit code that the verifier did not accept. Meaning a bpf
program that was previously accepted by the kernel was rejected after
recompilation. This was solved by adding translation code to libbpf
(which nicely demonstrates that libbpf indeed cares about backward
compatibility).

Now, with dynamically linked libbpf, a single package update was able
to solve the problem for everything on the system, including users' own
programs. All that was needed was making the llvm package force update
the libbpf package (which rpm can do easily with its Conflicts
dependency).

So, at least for us, there was so far no disadvantage (and no problem)
with dynamic linking and a quite substantial advantage.

 Jiri
Daniel Borkmann Nov. 4, 2020, 10:21 a.m. UTC | #22
On 11/4/20 4:11 AM, Alexei Starovoitov wrote:
> On Wed, Nov 04, 2020 at 10:17:30AM +0800, Hangbin Liu wrote:
>> On Tue, Nov 03, 2020 at 02:55:54PM -0800, Alexei Starovoitov wrote:
>>>> The scope of bpf in iproute2 is tiny - a few tc modules (and VRF but it
>>>> does not need libbpf) which is a small subset of the functionality and
>>>> commands within the package.
>>>
>>> When Hangbin sent this patch set I got excited that finally tc command
>>> will start working with the latest bpf elf files.
>>> Currently "tc" supports 4 year old files which caused plenty of pain to bpf users.
>>> I got excited, but now I've realized that this patch set will make it worse.
>>> The bpf support in "tc" command instead of being obviously old and obsolete
>>> will be sort-of working with unpredictable delay between released kernel
>>> and released iproute2 version. The iproute2 release that suppose to match kernel
>>> release will be meaningless.
>>> More so, the upgrade of shared libbpf.so can make older iproute2/tc to do
>>> something new and unpredictable.
>>> The user experience will be awful. Not only the users won't know
>>> what to expect out of 'tc' command they won't have a way to debug it.
>>> All of it because iproute2 build will take system libbpf and link it
>>> as shared library by default.
>>> So I think iproute2 must not use libbpf. If I could remove bpf support
>>> from iproute2 I would do so as well.
>>> The current state of iproute2 is hurting bpf ecosystem and proposed
>>> libbpf+iproute2 integration will make it worse.
>>
>> Please take it easy. IMHO, it always very hard to make a perfect solution.
>> From development side, it's easier and could get latest features by using
>> libbpf as submodule. But we need to take care of users, backward
>> compatibility, distros policy etc.
>>
>> I like using iproute2 to load bpf objs. But it's not standardized and too old
>> to load the new BTF defined objs. I think all of us like to improve it by
>> using libbpf. But users and distros are slowly. Some user are still using
>> `ifconfig`. Distros have policies to link the shared .so, etc. We have to
>> compromise on something.
>>
>> Our purpose is to push the user to use new features. As this patchset
>> does, push users to try libbpf instead of legacy code. But this need time.
> 
> My problem with iproute2 picking random libbpf is unpredictability.
> Such roll of dice gives no confidence to users on what is expected to work.
> bpf_hello_world.o will load, but that's it.
> What is going to work with this or that version of "tc" command? No one knows.
> The user will do 'tc -V'. Does version mean anything from bpf loading pov?
> It's not. The user will do "ldd `which tc`" and then what?
> Such bpf support in "tc" is worse than the current one.
> At least the current one is predictably old.

User experience will be crappy and predictability worse, agree on that. For libbpf
it's the same as with the rest of the iproute2 code in that features are developed along
with the kernel. Distros so far are more or less used to upgrading iproute2 along
with new kernel releases, though it's not the first time that some major ones have
been shipping old iproute2 for several releases until we pinged them to finally
get their act together and upgrade. With libbpf dynamically linked it's one more
moving target, and it's not clear whether distros will upgrade with the same cadence
as iproute2 (or even add libbpf as a dependency to their packaging). The only option
users might have, if they were to rely on iproute2 and wanted predictability, is
to ship their stuff via container with the current libbpf approach, which is probably
not the goal of this set.

> There are alternatives though.
> Forking the whole iproute2 because of "tc" is pointless, of course.
> My 'proposal' was a fire starter because people are too stubborn to
> realize that their long term believes could be incorrect until the fire is burning.
> "bpftool prog load" can load any kind of elf. It cannot operate on qdiscs
> and shouldn't do qdisc manipulations, but may be we can combine them into pipe
> of some sort. Like "bpftool prog load file.o | tc filter ... bpf pipe"
> I think that would be better long term. It will be predictable.

We've been thinking about 'bpftool prog load' as well given we build it right out of
the kernel tree and bpftool + libbpf are predictable since they are both built out
of the same git tree and with the latest features. I don't think it needs a pipe; it
would be enough to just specify where the loaded progs should be pinned in bpf fs,
and tc/ip(xdp) already has the option to pick the fd of the entry point from a pinned
file. But it should be doable as well to just pass the fd via pipe to avoid potential
cleanup later. Either way I think it makes sense to do 'bpftool prog load' regardless
since it's generic and useful also for other (non-networking) prog types that can
be attached elsewhere in the system.

> When we release new version of libbpf it goes through rigorous testing.
> bpftool gets a lot of test coverage as well.
> iproute2 with shared libbpf will get nothing. It's the same random roll of dice.
> New libbpf may or may not break iproute2. That's awful user experience.
> So iproute2 has to use git submodule with particular libbpf sha.

Alternatively, you have a uapi sync script already in order to not rely on the
system headers installed by the distro and to copy the latest uapi ones from the
kernel tree. It could as well be extended to have a similar fixed built-in situation
as with the current lib/bpf.c in iproute2. Back in the days when developing lib/bpf.c,
it was explicitly done as a built-in for iproute2 so that it doesn't take years for
users to actually get to the point where they can realistically make use of it. If
we were to extend the internal lib/bpf.c to a similar feature state as libbpf today,
how is that different in the bigger picture compared to a sync or submodule... so far
no one complained about lib/bpf.c.

> Then libbpf release process can incorporate proper testing of libbpf
> and iproute2 combination.
> Or iproute2 should stay as-is with obsolete bpf support.
> 
> Few years from now the situation could be different and shared libbpf would
> be the most appropriate choice. But that day is not today.

Yep, for libbpf to be in the same situation as libelf or libmnl, feature
development would have to pretty much come to a stop, so that even minor or
exotic distros get to the point where they ship the same libbpf version as
the major distros, at which point users can start to rely on the base
feature set for developing programs against it.

Thanks,
Daniel
Toke Høiland-Jørgensen Nov. 4, 2020, 11:20 a.m. UTC | #23
Daniel Borkmann <daniel@iogearbox.net> writes:

> Back in the days when developing lib/bpf.c, it was explicitly done as
> built-in for iproute2 so that it doesn't take years for users to
> actually get to the point where they can realistically make use of it.
> If we were to extend the internal lib/bpf.c to similar feature state
> as libbpf today, how is that different in the bigger picture compared
> to sync or submodule... so far noone complained about lib/bpf.c.

Except that this whole effort started because lib/bpf.c is slowly
bitrotting into oblivion? If all the tools are dynamically linked
against libbpf, that's only one package the distros have to keep
up-to-date instead of a whole list of tools. How does that make things
*worse*?

-Toke
Hangbin Liu Nov. 4, 2020, 11:40 a.m. UTC | #24
On Wed, Nov 04, 2020 at 12:09:15PM +0100, Toke Høiland-Jørgensen wrote:
> > +usage()
> > +{
> > +       cat <<EOF
> > +Usage: $0 [OPTIONS]
> > +  -h | --help                  Show this usage info
> > +  --no-libbpf                  build the package without libbpf
> > +  --libbpf-dir=DIR             build the package with self defined libbpf dir
> > +EOF
> > +       exit $1
> > +}
> 
> This would be the only command line arg that configure takes; all other
> options are passed via the environment. I think we should be consistent
> here; and since converting the whole configure script is probably out of
> scope for this patch, why not just use the existing FORCE_LIBBPF
> variable?

Yes, converting the whole configure script should be split out as a separate
patch.
> 
> I.e., FORCE_LIBBPF=on will fail if not libbpf is present,
> FORCE_LIBBPF=off will disable libbpf entirely, and if the variable is
> unset, libbpf will be used if found?

I like this one, with only one variable. I will check how to re-organize the
script.
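Something along these lines, perhaps (a rough shell sketch of the tri-state
check; the function and variable names here are invented, not from the
actual script):

check_libbpf()
{
	# FORCE_LIBBPF=off disables libbpf entirely
	if [ "$FORCE_LIBBPF" = "off" ]; then
		echo "no"
		return
	fi

	if ${PKG_CONFIG:-pkg-config} --exists libbpf; then
		echo "HAVE_LIBBPF:=y" >>$CONFIG
		echo "yes"
	elif [ "$FORCE_LIBBPF" = "on" ]; then
		# FORCE_LIBBPF=on makes a missing libbpf fatal
		echo "FORCE_LIBBPF set, but no usable libbpf found" >&2
		exit 1
	else
		echo "no"
	fi
}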

> 
> Alternatively, keep them as two separate variables (FORCE_LIBBPF and
> DISABLE_LIBBPF?). I don't have any strong preference as to which of
> those is best, but I think they'd both be more consistent with the
> existing configure script logic...

Please tell me if others have any other ideas.

Thanks
Hangbin
Daniel Borkmann Nov. 4, 2020, 1:12 p.m. UTC | #25
On 11/4/20 12:20 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
> 
>> Back in the days when developing lib/bpf.c, it was explicitly done as
>> built-in for iproute2 so that it doesn't take years for users to
>> actually get to the point where they can realistically make use of it.
>> If we were to extend the internal lib/bpf.c to similar feature state
>> as libbpf today, how is that different in the bigger picture compared
>> to sync or submodule... so far noone complained about lib/bpf.c.
> 
> Except that this whole effort started because lib/bpf.c is slowly
> bitrotting into oblivion? If all the tools are dynamically linked
> against libbpf, that's only one package the distros have to keep
> up-to-date instead of a whole list of tools. How does that make things
> *worse*?

It sounds good in theory if that would all work out as expected, but reality
differs unfortunately. Today on vast majority of distros you are able to use
iproute2's BPF loader via lib/bpf.c given it's a fixed built-in, even if
it has been bitrotting for a while now in terms of features (BTF), but the base
functionality that is in there can be used, and it is used in the wild today. If libbpf is
dynamically linked to iproute2, then I - as a user - am left with continuing
to assume that the current lib/bpf.c is the /only/ base that is really /guaranteed/
to be available as a loader across distros, but iproute2 + libbpf may not be
(it may be the case for RHEL but potentially not others). So from user PoV
I might be sticking to the current lib/bpf.c that iproute2 ships instead of
converting code over until even major distros catch up in maybe 2 years from now
(that is in fact how long it took Canonical to get bpftool included, not kidding).
If we would have done lib/bpf.c as a dynamic library back then, we wouldn't be
where we are today since users might be able to start consuming BPF functionality
just now, don't you agree? This was an explicit design choice back then for exactly
this reason. If we extend lib/bpf.c or import libbpf one way or another then there
is consistency across distros and users would be able to consume it in a predictable
way starting from next major releases. And you could start making this assumption
on all major distros in say, 3 months from now. The discussion is somehow focused
on the PoV of /a/ distro which is all nice and good, but the ones consuming the
loader shipping software /across/ distros are users writing BPF progs, all I'm
trying to say is that the _user experience_ should be the focus of this discussion
and right now we're trying hard making it rather painful for them to consume it.

Cheers,
Daniel
Jakub Kicinski Nov. 4, 2020, 7:17 p.m. UTC | #26
On Wed, 4 Nov 2020 14:12:47 +0100 Daniel Borkmann wrote:
> If we would have done lib/bpf.c as a dynamic library back then, we wouldn't be
> where we are today since users might be able to start consuming BPF functionality
> just now, don't you agree? This was an explicit design choice back then for exactly
> this reason. If we extend lib/bpf.c or import libbpf one way or another then there
> is consistency across distros and users would be able to consume it in a predictable
> way starting from next major releases. And you could start making this assumption
> on all major distros in say, 3 months from now. The discussion is somehow focused
> on the PoV of /a/ distro which is all nice and good, but the ones consuming the
> loader shipping software /across/ distros are users writing BPF progs, all I'm
> trying to say is that the _user experience_ should be the focus of this discussion
> and right now we're trying hard making it rather painful for them to consume it.

IIUC you're saying that we cannot depend on libbpf updates from distro.
Isn't that a pretty bad experience for all users who would like to link
against it? There are 4 components (kernel, lib, tools, compiler) that all
need to be kept up to date for optimal user experience. Cutting corners
with one of them leads nowhere medium term IMHO.

Unless what you guys are saying is that libbpf is _not_ supposed to be
backward compatible from the user side, and must be used as a submodule.
But then why bother defining ABI versions, or building it as an .so at all?

I'm also confused by the testing argument. Surely the solution is to
add unit / system tests for iproute2. Distros will rebuild packages
when dependencies change and retest. If we have 0 tests, it doesn't matter
what update strategy there is.
Andrii Nakryiko Nov. 4, 2020, 8:43 p.m. UTC | #27
On Wed, Nov 4, 2020 at 11:17 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 4 Nov 2020 14:12:47 +0100 Daniel Borkmann wrote:
> > If we would have done lib/bpf.c as a dynamic library back then, we wouldn't be
> > where we are today since users might be able to start consuming BPF functionality
> > just now, don't you agree? This was an explicit design choice back then for exactly
> > this reason. If we extend lib/bpf.c or import libbpf one way or another then there
> > is consistency across distros and users would be able to consume it in a predictable
> > way starting from next major releases. And you could start making this assumption
> > on all major distros in say, 3 months from now. The discussion is somehow focused
> > on the PoV of /a/ distro which is all nice and good, but the ones consuming the
> > loader shipping software /across/ distros are users writing BPF progs, all I'm
> > trying to say is that the _user experience_ should be the focus of this discussion
> > and right now we're trying hard making it rather painful for them to consume it.

This! Thanks, Daniel, for stating it very explicitly. Earlier I
mentioned iproute2 code simplification if using submodules, but that's
just a nice by-product, not the goal, so I'll just ignore that. I'll
try to emphasize the end user experience though.

What users writing BPF programs can expect from iproute2 in terms of
available BPF features is what matters. And by not enforcing a
specific minimal libbpf version, the iproute2 version doesn't matter all
that much, because the libbpf version that iproute2 ends up linking
against might be very old.

There was a lot of talk about API stability and backwards
compatibility. Libbpf has had a stable API and ABI for at least 1.5
years now and is very conscious about that when adding or extending
new APIs. That's not even a factor in me arguing for submodules. I'll
give a few specific examples of libbpf API not changing at all, but
how end user experience gets tremendously better.

Some of the most important APIs of libbpf are, arguably,
bpf_object__open() and bpf_object__load(). They accept a BPF ELF file,
do some preprocessing and in the end load BPF instructions into the
kernel for verification. But while API doesn't change across libbpf
versions, BPF-side code features supported changes quite a lot.

1. BTF sanitization. Newer versions of clang would emit a richer set
of BTF type information. Old kernels might not support BTF at all (but
otherwise would work just fine), or might not support some specific
newer additions to BTF. If someone was to use the latest Clang, but
outdated libbpf and old kernel, they would have a bad time, because
their BPF program would fail due to the kernel being strict about BTF.
But new libbpf would "sanitize" BTF, according to supported features
of the kernel, or just drop BTF altogether, if the kernel is that old.

If iproute2's latest version doesn't imply the latest libbpf version,
there is a high chance that the user's BPF program will fail to load.
Which requires users to be **aware** of all these complications, and
care about specific Clang versions and subsets of BTF that get
generated. With the latest libbpf all that goes away.

2. bpf_probe_read_user() falling back to bpf_probe_read(). Newer
kernels warn if a BPF application isn't using a proper _kernel() or
_user() variant of bpf_probe_read(), and eventually will just stop
supporting generic bpf_probe_read(). So what this means is that end
users would need to compile to variants of their BPF application, one
for older kernels with bpf_probe_read(), another with
bpf_probe_read_kernel()/bpf_probe_read_user(). That's a massive pain
in the butt. But newer libbpf versions provide a completely
transparent fallback from _user()/_kernel() variants to generic one,
if the kernel doesn't support new variants. So the instruction to
users becomes simple: always use
bpf_probe_read_user()/bpf_probe_read_kernel().
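As a BPF-side illustration of that instruction (a sketch, not taken from this
thread; the hook point and buffer size are arbitrary, and it assumes an x86
build, e.g. clang -O2 -g -target bpf -D__TARGET_ARCH_x86):

#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* one source works on both old and new kernels: a recent libbpf
 * transparently rewrites the _user() call to plain bpf_probe_read()
 * when the kernel lacks the newer helper */
SEC("kprobe/do_sys_openat2")
int trace_open(struct pt_regs *ctx)
{
	const char *pathname = (const char *)PT_REGS_PARM2(ctx);
	char fname[64];

	bpf_probe_read_user_str(fname, sizeof(fname), pathname);
	return 0;
}

char _license[] SEC("license") = "GPL";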

But with iproute2 not enforcing new enough versions of libbpf, all
that goes out of the window and puts the burden back on end users.

3. Another feature (and far from being the last of this kind in
libbpf) is full support for individual *non-always-inlined*
functions in BPF code, which was added recently. This allows to
structure BPF code better, get better instruction cache use and for
newer kernels even get significant speed ups of BPF code verification.
This is purely a libbpf feature, no API was changed. Further, the
kernel understands the difference between global and static functions
in BPF code and optimizes verification, if possible. Libbpf takes care
of falling back to static functions for old kernels that are not yet
aware of global functions. All that is completely transparent and
works reliably without users having to deal with three variants of
doing helper functions in their BPF code.

And again, if, when using iproute2, the user doesn't know which
version of libbpf will be used, they have to assume the worst
(__always_inline) or maintain 2 or 3 different copies of their code.
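For reference, the kind of BPF code this enables looks roughly like the
sketch below (section name and filter logic chosen arbitrarily, not from
this thread):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* a plain, non-inlined helper: newer kernels can verify it once as a
 * global function; per the description above, libbpf falls back to the
 * static-function treatment for older kernels */
__noinline int port_allowed(__u16 port)
{
	return port != 22;
}

SEC("classifier")
int cls_main(struct __sk_buff *skb)
{
	/* 0 == TC_ACT_OK, 2 == TC_ACT_SHOT */
	return port_allowed(skb->local_port) ? 0 : 2;
}

char _license[] SEC("license") = "GPL";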

And there are more conveniences like that significantly simplifying
BPF end users by hiding differences of kernel versions, clang
versions, etc.

Submodule is a way that I know of to make this better for end users.
If there are other ways to pull this off with shared library use, I'm
all for it, it will save the security angle that distros are arguing
for. E.g., if distributions will always have the latest libbpf
available almost as soon as it's cut upstream *and* new iproute2
versions enforce the latest libbpf when they are packaged/released,
then this might work equivalently for end users. If Linux distros
would be willing to do this faithfully and promptly, I have no
objections whatsoever. Because all that matters is BPF end user
experience, as Daniel explained above.

>
> IIUC you're saying that we cannot depend on libbpf updates from distro.

As I tried to explain above, a big part of libbpf is the BPF loader,
which, while not changing the library API, does get more and more advanced
features with newer versions. So yeah, you can totally use older
versions of libbpf, but you need to be aware of all the kernel + clang
+ BPF code features interactions, which newer libbpfs often
transparently alleviate for the user.

So if someone has some old BPF code not using anything fancy, they
might not care all that much, probably.


> Isn't that a pretty bad experience for all users who would like to link
> against it? There are 4 components (kernel, lib, tools, compiler) all
> need to be kept up to date for optimal user experience. Cutting corners
> with one of them leads nowhere medium term IMHO.
>
> Unless what you guys are saying is that libbpf is _not_ supposed to be
> backward compatible from the user side, and must be used a submodule.
> But then why bother defining ABI versions, or build it as an .so at all.

That's not what anyone is saying; I hope we established in this
thread that libbpf does provide a stable API and ABI, with backwards
and forward compatibility, and takes it very seriously. User BPF
programs just tend to grow in complexity and features used and newer
libbpf versions are sometimes a requirement to utilize all that
effectively.

>
> I'm also confused by the testing argument. Surely the solution is to
> add unit / system tests for iproute2. Distros will rebuild packages
> when dependencies change and retest. If we have 0 tests doesn't matter
> what update strategy there is.

Tests are good, but I'm a bit sceptical about the surface area that
could be tested. A compiled BPF program (ELF file) is an input to BPF
loader APIs, and that compiled BPF program can be arbitrarily complex,
using a variety of different kernel/libbpf features. So a single
non-changing API accepts an infinite variety of inputs. selftests/bpf
mandates that each new kernel and libbpf feature gets a test; I'm
wondering if the iproute2 test suite would be able to keep up with this.
And then again, some features are not supposed to work on older libbpf
versions, so not clear how iproute2 would test that. But regardless,
more testing is always better, so I hope this won't discourage testing
per se.
Edward Cree Nov. 4, 2020, 9:15 p.m. UTC | #28
On 04/11/2020 03:11, Alexei Starovoitov wrote:
> The user will do 'tc -V'. Does version mean anything from bpf loading pov?
> It's not. The user will do "ldd `which tc`" and then what?
Is it beyond the wit of man for 'tc -V' to output something about
 libbpf version?
Other libraries seem to solve these problems all the time, I
 haven't seen anyone explain what makes libbpf so special that it
 has to be different.
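(For what it's worth, a sketch of what such output could look like;
libbpf_major_version()/libbpf_minor_version() only appeared in later libbpf
releases, so for the versions discussed here this is hypothetical:)

#include <stdio.h>
#include <bpf/libbpf.h>

/* print the linked libbpf version next to the iproute2 one */
static void print_tc_version(const char *iproute2_version)
{
	printf("tc utility, iproute2-%s, libbpf %u.%u\n",
	       iproute2_version, libbpf_major_version(),
	       libbpf_minor_version());
}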

-ed
Alexei Starovoitov Nov. 4, 2020, 10:10 p.m. UTC | #29
On Wed, Nov 4, 2020 at 1:16 PM Edward Cree <ecree@solarflare.com> wrote:
>
> On 04/11/2020 03:11, Alexei Starovoitov wrote:
> > The user will do 'tc -V'. Does version mean anything from bpf loading pov?
> > It's not. The user will do "ldd `which tc`" and then what?
> Is it beyond the wit of man for 'tc -V' to output somethingabout
>  libbpf version?
> Other libraries seem to solve these problems all the time, I
>  haven't seen anyone explain what makes libbpf so special that it
>  has to be different.

slow vger? Please see Daniel's and Andrii's detailed explanations.

libbpf is not your traditional library.
Looking through the installed libraries on my devserver in /lib64/ directory
I think the closest is libbfd.so
Then think why gdb always statically links it.
Toke Høiland-Jørgensen Nov. 4, 2020, 10:24 p.m. UTC | #30
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> Some of the most important APIs of libbpf are, arguably,
> bpf_object__open() and bpf_object__load(). They accept a BPF ELF file,
> do some preprocessing and in the end load BPF instructions into the
> kernel for verification. But while API doesn't change across libbpf
> versions, BPF-side code features supported changes quite a lot.

Yes, which means that nothing has to change in iproute2 *at all* to get
this; not the version, not even a rebuild: just update the system
libbpf, and you'll automatically gain all these features. How is that an
argument for *not* linking dynamically? It's a user *benefit* to not
have to care about the iproute2 version, but only have to care about
keeping libbpf up to date.

I mean, if iproute2 had started out by linking dynamically against
libbpf (setting aside the fact that libbpf didn't exist back then), we
wouldn't even be having this conversation: In that case its support for
new features in the BPF format would just automatically have kept up
along with the rest of the system as the library got upgraded...

-Toke
Toke Høiland-Jørgensen Nov. 4, 2020, 10:35 p.m. UTC | #31
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Wed, Nov 4, 2020 at 1:16 PM Edward Cree <ecree@solarflare.com> wrote:
>>
>> On 04/11/2020 03:11, Alexei Starovoitov wrote:
>> > The user will do 'tc -V'. Does version mean anything from bpf loading pov?
>> > It's not. The user will do "ldd `which tc`" and then what?
>> Is it beyond the wit of man for 'tc -V' to output somethingabout
>>  libbpf version?
>> Other libraries seem to solve these problems all the time, I
>>  haven't seen anyone explain what makes libbpf so special that it
>>  has to be different.
>
> slow vger? Please see Daniel and Andrii detailed explanations.
>
> libbpf is not your traditional library.
> Looking through the installed libraries on my devserver in /lib64/ directory
> I think the closest is libbfd.so
> Then think why gdb always statically links it.

The distinguishing feature is the tool, not the library. For a tool that
intimately depends on detailed behaviour, sure it makes sense to statically
link to know exactly which version you have. But for BPF, that is
bpftool, not iproute2.

For iproute2, libbpf serves a very simple function: load a BPF program
from an object file and turn it into an fd that can be attached. For
that, dynamic linking is the right thing to do so library upgrades can
bring in new support without touching the tool itself.

Daniel's example from upthread illustrates it:

bpftool prog load | tc attach

i.e., decoupling load from attach. Which is *exactly* what dynamic
linking in iproute2 would mean, except using ld(1) instead of a pipe!

-Toke
Edward Cree Nov. 4, 2020, 11:05 p.m. UTC | #32
On 04/11/2020 22:10, Alexei Starovoitov wrote:
> On Wed, Nov 4, 2020 at 1:16 PM Edward Cree <ecree@solarflare.com> wrote:
>> On 04/11/2020 03:11, Alexei Starovoitov wrote:
>>> The user will do 'tc -V'. Does version mean anything from bpf loading pov?
>>> It's not. The user will do "ldd `which tc`" and then what?
>> Is it beyond the wit of man for 'tc -V' to output somethingabout
>>  libbpf version?
>> Other libraries seem to solve these problems all the time, I
>>  haven't seen anyone explain what makes libbpf so special that it
>>  has to be different.
> slow vger? Please see Daniel and Andrii detailed explanations.
Nah, I've seen that subthread (vger is fine).  I felt that subthread
 was missing this point about -V, which is why I replied where it was
 brought up.
Daniel and Andrii have only explained why users will want to have an
 up-to-date libbpf; they (and you) haven't connected it to any
 argument about why static linking is the way to achieve that.
> libbpf is not your traditional library.
This has only been asserted, not explained.
I'm fully willing to entertain the possibility that libbpf is indeed
 special.  But if you want to win people over, you'll need to
 explain *why* it's special.
"Look at bfd and think why" is not enough, be more explicit.

AIUI the API between iproute2 and libbpf isn't changing, all that's
 happening is that libbpf is gaining new capabilities in things that
 are totally transparent to iproute2 (e.g. BTF fixups).  So the
 reasonable thing for users to expect is "I need new BPF features,
 I'll upgrade my libbpf", and with dynamic linking that works fine
 whether they upgrade iproute2 too or not.
This narrative is, on the face of it, just as plausible as "I'm
 getting an error from iproute2, I'll upgrade that".  And if distros
 decide that that's a common enough mistake to matter, then they can
 make the newer iproute2 package depend on a newer libbpf package,
 and apt or yum or whatever will automagically DTRT.
Whereas if you tightly couple them from the start, distros can't
 then go the other way if it turns out you made the wrong choice.
 (What if someone can't use the latest iproute2 release because it
 has a regression bug that breaks their use-case, but they need the
 latest libbpf for one of your shiny new features?)

Don't get me wrong, I'd love a world in which static linking was the
 norm and we all rebuilt our binaries locally every time we upgraded
 a piece.  But that's not the world we live in, and consistency
 *within* a distro matters too...

-ed
David Ahern Nov. 5, 2020, 2:33 a.m. UTC | #33
On 11/4/20 1:22 AM, Hangbin Liu wrote:
> If we move this #ifdef HAVE_LIBBPF to bpf_legacy.c, we need to rename
> them all. With current patch, we limit all the legacy functions in bpf_legacy
> and doesn't mix them with libbpf.h. What do you think?

Let's rename the conflicting functions with a prefix -- like legacy_. In fact,
those iproute2_ function names could use the legacy_ prefix as well.
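A sketch of what that could look like in bpf_legacy.c (the legacy_ name
follows the suggestion above; the body is illustrative, not the actual
iproute2 helper):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* prefix helpers whose names collide with libbpf symbols
 * (bpf_obj_get, bpf_obj_pin, bpf_prog_attach, ...) */
int legacy_bpf_obj_get(const char *pathname)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (__u64)(unsigned long)pathname;

	return syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}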
David Ahern Nov. 5, 2020, 2:39 a.m. UTC | #34
On 11/4/20 2:28 AM, Jiri Benc wrote:
> On Tue, 3 Nov 2020 18:45:59 -0800, Alexei Starovoitov wrote:
>> libbpf is the only library I know that is backward and forward compatible.
> 
> This is great to hear. It means there will be no problem with iproute2
> using the system libbpf. As libbpf is both backward and forward
> compatible, iproute2 will just work with whatever version it is used
> with.

That is how I read that as well. The bpf team is making sure libbpf is a
stable, robust front-end to kernel APIs. That stability is what controls
the user experience. With the due diligence in testing, packages using
libbpf can have confidence that using a libbpf API is not going to
change release over release regardless of the kernel version installed
(i.e., as kernel versions go newer from an OS start point - typical
scenario for a distribution).


> 
> The only problem would be if a particular function changed its
> semantics while retaining ABI. But since libbpf is backward and forward
> compatible, this should not happen.

exactly.

Then, if libbpf needs to change something that affects users, it bumps
the soname version.
David Ahern Nov. 5, 2020, 3:19 a.m. UTC | #35
On 11/4/20 3:21 AM, Daniel Borkmann wrote:
> 
>> Then libbpf release process can incorporate proper testing of libbpf
>> and iproute2 combination.
>> Or iproute2 should stay as-is with obsolete bpf support.
>>
>> Few years from now the situation could be different and shared libbpf
>> would
>> be the most appropriate choice. But that day is not today.
> 
> Yep, for libbpf to be in same situation as libelf or libmnl basically
> feature
> development would have to pretty much come to a stop so that even minor
> or exotic
> distros get to a point where they ship same libbpf version as major
> distros where
> then users can start to rely on the base feature set for developing
> programs
> against it.

User experience keeps getting brought up, but I also keep reading the
stance that BPF users cannot expect a consistent experience unless they
are constantly chasing the latest and greatest versions of *ALL* S/W related
to BPF. That is not a realistic expectation for users. Distributions exist
for a reason. They solve real packaging problems.

As libbpf and bpf in general reach a broader audience, the requirements
to use, deploy and even try out BPF features need to be more user
friendly, and that starts with maintainers of the BPF code and how they
approach extensions and features. Telling libbpf consumers to make
libbpf a submodule of their project and update the reference point every
time a new release comes out is not user friendly.

Similarly, it is not realistic or user friendly to *require* general
Linux users to constantly chase latest versions of llvm, clang, dwarves,
bcc, bpftool, libbpf, (I am sure I am missing more), and, by extension
of what you want here, iproute2 just to upgrade their production kernel
to say v5.10, the next LTS, or to see what relevant new ebpf features
exist in the new kernel. As a specific example, BTF extensions are added
in a way that is all or nothing. Meaning, if you want to compile kernel
version X with CONFIG_DEBUG_INFO_BTF enabled, you must update your toolchain.
Sure, you are using the latest LTS of $distro, and it worked fine with
kernel version X-1 last week, but now compile fails completely unless
the pahole version is updated. Horrible user experience. Again, just an
example and one I brought up in July. I am sure there are more.

Linux APIs are about stability and consistency. Commands and libraries
that work on v5.9 should work exactly the same on v5.10, 5.11, 5.12, ...
*IF* I want a new feature (kernel, bpf or libbpf), then the requirement
to upgrade is justified. But if I am just updating my kernel, or
updating my compiler, or updating iproute2 because I want to try out
some new nexthop feature, I should not be cornered into an all or
nothing scheme.
David Ahern Nov. 5, 2020, 3:48 a.m. UTC | #36
On 11/4/20 1:43 PM, Andrii Nakryiko wrote:
> 
> What users writing BPF programs can expect from iproute2 in terms of
> available BPF features is what matters. And by not enforcing a
> specific minimal libbpf version, iproute2 version doesn't matter all
> that much, because libbpf version that iproute2 ends up linking
> against might be very old.
> 
> There was a lot of talk about API stability and backwards
> compatibility. Libbpf has had a stable API and ABI for at least 1.5
> years now and is very conscious about that when adding or extending
> new APIs. That's not even a factor in me arguing for submodules. I'll
> give a few specific examples of libbpf API not changing at all, but
> how end user experience gets tremendously better.
> 
> Some of the most important APIs of libbpf are, arguably,
> bpf_object__open() and bpf_object__load(). They accept a BPF ELF file,
> do some preprocessing and in the end load BPF instructions into the
> kernel for verification. But while API doesn't change across libbpf
> versions, BPF-side code features supported changes quite a lot.
> 
> 1. BTF sanitization. Newer versions of clang would emit a richer set
> of BTF type information. Old kernels might not support BTF at all (but
> otherwise would work just fine), or might not support some specific
> newer additions to BTF. If someone was to use the latest Clang, but
> outdated libbpf and old kernel, they would have a bad time, because
> their BPF program would fail due to the kernel being strict about BTF.
> But new libbpf would "sanitize" BTF, according to supported features
> of the kernel, or just drop BTF altogether, if the kernel is that old.
> 

In my experience, compilers are the least likely thing to change in a typical
Linux development environment. BPF should not be forcing new versions
(see my last response).

> 
> 2. bpf_probe_read_user() falling back to bpf_probe_read(). Newer
> kernels warn if a BPF application isn't using a proper _kernel() or
> _user() variant of bpf_probe_read(), and eventually will just stop
> supporting generic bpf_probe_read(). So what this means is that end
> users would need to compile to variants of their BPF application, one
> for older kernels with bpf_probe_read(), another with
> bpf_probe_read_kernel()/bpf_probe_read_user(). That's a massive pain
> in the butt. But newer libbpf versions provide a completely
> transparent fallback from _user()/_kernel() variants to generic one,
> if the kernel doesn't support new variants. So the instruction to
> users becomes simple: always use
> bpf_probe_read_user()/bpf_probe_read_kernel().
> 

I vaguely recall a thread about having the BPF system call return user-
friendly messages, but that was shot down. I take this example to mean
the solution is to have libbpf handle the quirks and various changes,
which means that libbpf now takes on the burden of constant
updates to handle quirks. extack has been very successful at making
networking configuration mistakes more user friendly. Other kernel
features should be using the same kind of extension.
Hangbin Liu Nov. 5, 2020, 7:51 a.m. UTC | #37
On Wed, Nov 04, 2020 at 07:33:40PM -0700, David Ahern wrote:
> On 11/4/20 1:22 AM, Hangbin Liu wrote:
> > If we move this #ifdef HAVE_LIBBPF to bpf_legacy.c, we need to rename
> > them all. With current patch, we limit all the legacy functions in bpf_legacy
> > and doesn't mix them with libbpf.h. What do you think?
> 
> Let's rename conflicts with a prefix -- like legacy. In fact, those
> iproute2_ functions names could use the legacy_ prefix as well.
> 

Sorry, when trying to rename the functions I just found another issue.
Even if we fix the conflicts right now, what if libbpf adds new functions
and we get another conflict in the future? There are too many bpf functions
in bpf_legacy.c, which raises the risk of naming conflicts.

With bpf_libbpf.c, there are fewer functions and less risk of naming
conflicts. So I think it may be better not to include libbpf.h in bpf_legacy.c.
What do you think?

Thanks
Hangbin
Jamal Hadi Salim Nov. 5, 2020, 2:05 p.m. UTC | #38
On 2020-11-04 10:19 p.m., David Ahern wrote:

[..]
> 
> User experience keeps getting brought up, but I also keep reading the
> stance that BPF users can not expect a consistent experience unless they
> are constantly chasing latest greatest versions of *ALL* S/W related to
> BPF. That is not a realistic expectation for users. Distributions exist
> for a reason. They solve real packaging problems.
> 
> As libbpf and bpf in general reach a broader audience, the requirements
> to use, deploy and even tryout BPF features needs to be more user
> friendly and that starts with maintainers of the BPF code and how they
> approach extensions and features. Telling libbpf consumers to make
> libbpf a submodule of their project and update the reference point every
> time a new release comes out is not user friendly.
> 
> Similarly, it is not realistic or user friendly to *require* general
> Linux users to constantly chase latest versions of llvm, clang, dwarves,
> bcc, bpftool, libbpf, (I am sure I am missing more), and, by extension
> of what you want here, iproute2 just to upgrade their production kernel
> to say v5.10, the next LTS, or to see what relevant new ebpf features
> exists in the new kernel. As a specific example BTF extensions are added
> in a way that is all or nothing. Meaning, you want to compile kernel
> version X with CONFIG_DEBUG_INFO_BTF enabled, update your toolchain.
> Sure, you are using the latest LTS of $distro, and it worked fine with
> kernel version X-1 last week, but now compile fails completely unless
> the pahole version is updated. Horrible user experience. Again, just an
> example and one I brought up in July. I am sure there more.
> 

2cents feedback from a dabbler in ebpf on user experience:

What David described above *has held me back*.
Over time it seems things have gotten better with libbpf
(although a few times I find myself copying includes from the
latest iproute into libbpf). I ended up just doing static links.
The idea of upgrading clang/llvm every 2 months when I revisit ebpf is
the most painful. At times code that used to compile just fine
earlier doesn't anymore. There's a minor issue of needing to install
kernel headers every time I want to run something in samples, etc.,
but I am probably lacking knowledge on how to ease the pain in that
regard.

I find the loader and associated tooling in iproute2/tc to be quite
stable (not shiny, but works every time).
And for that reason I often find myself sticking to just tc instead
of toying with other areas.
Slight tangent:
One thing that would help libbpf adoption is to include an examples/
directory. Put a bunch of sample apps for tc, probes, xdp etc.
And have them compile outside of the kernel. Maybe useful Makefiles
that people can cut'n'paste from. Every time you add a new feature
put some sample code in the examples.

cheers,
jamal
David Ahern Nov. 5, 2020, 3:25 p.m. UTC | #39
On 11/5/20 12:51 AM, Hangbin Liu wrote:
> On Wed, Nov 04, 2020 at 07:33:40PM -0700, David Ahern wrote:
>> On 11/4/20 1:22 AM, Hangbin Liu wrote:
>>> If we move this #ifdef HAVE_LIBBPF to bpf_legacy.c, we need to rename
>>> them all. With current patch, we limit all the legacy functions in bpf_legacy
>>> and doesn't mix them with libbpf.h. What do you think?
>>
>> Let's rename conflicts with a prefix -- like legacy. In fact, those
>> iproute2_ functions names could use the legacy_ prefix as well.
>>
> 
> Sorry, when trying to rename the functions. I just found another issue.
> Even we fix the conflicts right now. What if libbpf add new functions
> and we got another conflict in future? There are too much bpf functions
> in bpf_legacy.c which would have more risks for naming conflicts..
> 
> With bpf_libbpf.c, there are less functions and has less risk for naming
> conflicts. So I think it maybe better to not include libbpf.h in bpf_legacy.c.
> What do you think?
> 
>

Is there a way to sort the code such that bpf_legacy.c is not used when
libbpf is enabled and bpf_libbpf.c is not compiled when libbpf is disabled?
Toke Høiland-Jørgensen Nov. 5, 2020, 3:57 p.m. UTC | #40
David Ahern <dsahern@gmail.com> writes:

> On 11/5/20 12:51 AM, Hangbin Liu wrote:
>> On Wed, Nov 04, 2020 at 07:33:40PM -0700, David Ahern wrote:
>>> On 11/4/20 1:22 AM, Hangbin Liu wrote:
>>>> If we move this #ifdef HAVE_LIBBPF to bpf_legacy.c, we need to rename
>>>> them all. With current patch, we limit all the legacy functions in bpf_legacy
>>>> and doesn't mix them with libbpf.h. What do you think?
>>>
>>> Let's rename conflicts with a prefix -- like legacy. In fact, those
>>> iproute2_ functions names could use the legacy_ prefix as well.
>>>
>> 
>> Sorry, when trying to rename the functions. I just found another issue.
>> Even we fix the conflicts right now. What if libbpf add new functions
>> and we got another conflict in future? There are too much bpf functions
>> in bpf_legacy.c which would have more risks for naming conflicts..
>> 
>> With bpf_libbpf.c, there are less functions and has less risk for naming
>> conflicts. So I think it maybe better to not include libbpf.h in bpf_legacy.c.
>> What do you think?
>> 
>>
>
> Is there a way to sort the code such that bpf_legacy.c is not used when
> libbpf is enabled and bpf_libbpf.c is not compiled when libbpf is disabled.

That's basically what we were going for, i.e.:

git mv lib/bpf.c lib/bpf_legacy.c
git add lib/bpf_libbpf.c

and then adding ifdefs to bpf_legacy.c and only including the other if
libbpf support is enabled.

I guess we could split it further into lib/bpf_{libbpf,legacy,glue}.c
and have the two former ones be completely devoid of ifdefs and
conditionally included based on whether or not libbpf support is
enabled?
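A rough sketch of the conditional part in lib/Makefile (variable names
assumed, not from the posted series):

UTILOBJ += bpf_glue.o

ifeq ($(HAVE_LIBBPF),y)
UTILOBJ += bpf_libbpf.o
else
UTILOBJ += bpf_legacy.o
endif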

-Toke
David Ahern Nov. 5, 2020, 4:02 p.m. UTC | #41
On 11/5/20 8:57 AM, Toke Høiland-Jørgensen wrote:
> I guess we could split it further into lib/bpf_{libbpf,legacy,glue}.c
> and have the two former ones be completely devoid of ifdefs and
> conditionally included based on whether or not libbpf support is
> enabled?

that sounds reasonable.
Andrii Nakryiko Nov. 5, 2020, 8:14 p.m. UTC | #42
On Wed, Nov 4, 2020 at 2:24 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > Some of the most important APIs of libbpf are, arguably,
> > bpf_object__open() and bpf_object__load(). They accept a BPF ELF file,
> > do some preprocessing and in the end load BPF instructions into the
> > kernel for verification. But while API doesn't change across libbpf
> > versions, BPF-side code features supported changes quite a lot.
>
> Yes, which means that nothing has to change in iproute2 *at all* to get
> this; not the version, not even a rebuild: just update the system
> libbpf, and you'll automatically gain all these features. How is that an
> argument for *not* linking dynamically? It's a user *benefit* to not
> have to care about the iproute2 version, but only have to care about
> keeping libbpf up to date.
>
> I mean, if iproute2 had started out by linking dynamically against
> libbpf (setting aside the fact that libbpf didn't exist back then), we
> wouldn't even be having this conversation: In that case its support for
> new features in the BPF format would just automatically have kept up
> along with the rest of the system as the library got upgraded...
>

I think it's a difference in perspective.

You are seeing iproute2 as an explicit proxy to libbpf. Users should
be aware of the fact that iproute2 just uses libbpf to load whatever
BPF ELF file the user provides. At that point the iproute2 version almost
doesn't matter. Users providing a BPF application (that relies on
iproute2 to load it) should still be very conscious of the libbpf
version and depend on it explicitly.

I saw it differently. For me, the fact that iproute2 is using libbpf
is an implementation detail. A user developing a BPF application is
providing a BPF ELF file that follows a de facto BPF "spec" (all those
SEC() conventions, global variables, map references, etc). Yes, that
"spec" is being driven by libbpf currently, but libbpf is not the only
library that supports it. Go BPF library is trying to keep up and
support most of the same features. So in that sense, iproute2 is
another BPF loader, just like Go library and libbpf library. The fact
that it defers to libbpf should be not important to the end user. With
that view, if a user tested their BPF program with a specific iproute2
version, it should be enough.

But clearly that's not the view that most people on this thread hold
and prefer end users to know and care about libbpf versioning
explicitly. That's fine.

But can we at least make sure that when libbpf is integrated with
iproute2, it specifies the latest libbpf (v0.2) as a dependency?


> -Toke

>
Andrii Nakryiko Nov. 5, 2020, 8:19 p.m. UTC | #43
On Wed, Nov 4, 2020 at 3:05 PM Edward Cree <ecree@solarflare.com> wrote:
>
> On 04/11/2020 22:10, Alexei Starovoitov wrote:
> > On Wed, Nov 4, 2020 at 1:16 PM Edward Cree <ecree@solarflare.com> wrote:
> >> On 04/11/2020 03:11, Alexei Starovoitov wrote:
> >>> The user will do 'tc -V'. Does version mean anything from bpf loading pov?
> >>> It's not. The user will do "ldd `which tc`" and then what?
> >> Is it beyond the wit of man for 'tc -V' to output somethingabout
> >>  libbpf version?
> >> Other libraries seem to solve these problems all the time, I
> >>  haven't seen anyone explain what makes libbpf so special that it
> >>  has to be different.
> > slow vger? Please see Daniel and Andrii detailed explanations.
> Nah, I've seen that subthread(vger is fine).  I felt that subthread
>  was missing this point about -V which is why I replied where it was
>  brought up.
> Daniel and Andrii have only explained why users will want to have an
>  up-to-date libbpf, they (and you) haven't connected it to any
>  argument about why static linking is the way to achieve that.

I'll just quote myself here for your convenience.

  Submodule is a way that I know of to make this better for end users.
  If there are other ways to pull this off with shared library use, I'm
  all for it, it will save the security angle that distros are arguing
  for. E.g., if distributions will always have the latest libbpf
  available almost as soon as it's cut upstream *and* new iproute2
  versions enforce the latest libbpf when they are packaged/released,
  then this might work equivalently for end users. If Linux distros
  would be willing to do this faithfully and promptly, I have no
  objections whatsoever. Because all that matters is BPF end user
  experience, as Daniel explained above.

No one replied to that, unfortunately.


> > libbpf is not your traditional library.
> This has only been asserted, not explained.
> I'm fully willing to entertain the possibility that libbpf is indeed
>  special.  But if you want to win people over, you'll need to
>  explain *why* it's special.
> "Look at bfd and think why" is not enough, be more explicit.
>
> AIUI the API between iproute2 and libbpf isn't changing, all that's
>  happening is that libbpf is gaining new capabilities in things that
>  are totally transparent to iproute2 (e.g. BTF fixups).  So the
>  reasonable thing for users to expect is "I need new BPF features,
>  I'll upgrade my libbpf", and with dynamic linking that works fine
>  whether they upgrade iproute2 too or not.
> This narrative is, on the face of it, just as plausible as "I'm
>  getting an error from iproute2, I'll upgrade that".  And if distros
>  decide that that's a common enough mistake to matter, then they can
>  make the newer iproute2 package depend on a newer libbpf package,
>  and apt or yum or whatever will automagically DTRT.
> Whereas if you tightly couple them from the start, distros can't
>  then go the other way if it turns out you made the wrong choice.
>  (What if someone can't use the latest iproute2 release because it
>  has a regression bug that breaks their use-case, but they need the
>  latest libbpf for one of your shiny new features?)
>
> Don't get me wrong, I'd love a world in which static linking was the
>  norm and we all rebuilt our binaries locally every time we upgraded
>  a piece.  But that's not the world we live in, and consistency
>  *within* a distro matters too...
>
> -ed
Andrii Nakryiko Nov. 5, 2020, 8:45 p.m. UTC | #44
On Wed, Nov 4, 2020 at 7:19 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 11/4/20 3:21 AM, Daniel Borkmann wrote:
> >
> >> Then libbpf release process can incorporate proper testing of libbpf
> >> and iproute2 combination.
> >> Or iproute2 should stay as-is with obsolete bpf support.
> >>
> >> Few years from now the situation could be different and shared libbpf
> >> would
> >> be the most appropriate choice. But that day is not today.
> >
> > Yep, for libbpf to be in same situation as libelf or libmnl basically
> > feature
> > development would have to pretty much come to a stop so that even minor
> > or exotic
> > distros get to a point where they ship same libbpf version as major
> > distros where
> > then users can start to rely on the base feature set for developing
> > programs
> > against it.
>
> User experience keeps getting brought up, but I also keep reading the
> stance that BPF users can not expect a consistent experience unless they
> are constantly chasing latest greatest versions of *ALL* S/W related to

That's not true. If you need new functionality like BTF, CO-RE,
function-by-function verification, etc., then yes, you have to update
kernel, compiler, libbpf, sometimes pahole. But if you have a BPF
application that doesn't use or need any of the newer features, it
will keep working just fine with the old kernel, old libbpf, and old
compiler.

Life is a bit more nuanced, of course. Sometimes a Clang update will
cause a shift in code generation patterns and you'd need either kernel
update (to get improved verifier logic) and/or libbpf update (to
compensate for either kernel or Clang change). Or update Clang again
to get a fixed version. That's life, bugs and problems are real.

If you care about using BTF-powered features, yes, you might need to
update pahole to get basic BTF, or get new BTF funcs needed for
fentry/fexit, or soon you'll need v1.19 if you want kernel module
BTFs. If you don't care about BTF, don't set CONFIG_DEBUG_INFO_BTF=y
and you won't even need pahole. For kernel module BTFs, you can't
request module BTF generation, unless you have a recent enough pahole.
I'm not sure how this can be handled better.

But if you have a plain old boring BPF program using
BPF_MAP_ARRAY/BPF_MAP_HASH, no global variables, you attach it to old
and stable BPF hooks like tracepoint, kprobe, etc., then it will work
with pretty much every version of libbpf, clang, and kernel. Don't
pass '-g' to Clang and BTF won't be generated at all, so you won't
even need BTF sanitization at all. And so on.

The problem is that users do want those new features, because those
allow to do new things or do existing things better/easier/faster. So
then we do ask to upgrade regularly to provide adequate support. But
it's like complaining that you need to update the Java VM, compiler, and Java
standard library when you want to use some new functionality.

> BPF. That is not a realistic expectation for users. Distributions exist
> for a reason. They solve real packaging problems.
>
> As libbpf and bpf in general reach a broader audience, the requirements
> to use, deploy and even tryout BPF features needs to be more user
> friendly and that starts with maintainers of the BPF code and how they
> approach extensions and features. Telling libbpf consumers to make
> libbpf a submodule of their project and update the reference point every
> time a new release comes out is not user friendly.

I have every right to ask for this, if I believe it's a better way
to go. Users have the right to refuse. But also, iproute2 is not
exactly an end user in this situation; it is part of the BPF
ecosystem. So I think it's reasonable to have a healthy discussion
about the best way to facilitate BPF end-users.

>
> Similarly, it is not realistic or user friendly to *require* general
> Linux users to constantly chase latest versions of llvm, clang, dwarves,
> bcc, bpftool, libbpf, (I am sure I am missing more), and, by extension
> of what you want here, iproute2 just to upgrade their production kernel
> to say v5.10, the next LTS, or to see what relevant new ebpf features
> exists in the new kernel. As a specific example BTF extensions are added
> in a way that is all or nothing. Meaning, you want to compile kernel
> version X with CONFIG_DEBUG_INFO_BTF enabled, update your toolchain.
> Sure, you are using the latest LTS of $distro, and it worked fine with
> kernel version X-1 last week, but now compile fails completely unless
> the pahole version is updated. Horrible user experience. Again, just an
> example and one I brought up in July. I am sure there more.
>
> Linux APIs are about stability and consistency. Commands and libraries
> that work on v5.9 should work exactly the same on v5.10, 5.11, 5.12, ...
> *IF* I want a new feature (kernel, bpf or libbpf), then the requirement
> to upgrade is justified. But if I am just updating my kernel, or
> updating my compiler, or updating iproute2 because I want to try out
> some new nexthop feature, I should not be cornered into an all or
> nothing scheme.
Andrii Nakryiko Nov. 5, 2020, 8:53 p.m. UTC | #45
On Wed, Nov 4, 2020 at 7:48 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 11/4/20 1:43 PM, Andrii Nakryiko wrote:
> >
> > What users writing BPF programs can expect from iproute2 in terms of
> > available BPF features is what matters. And by not enforcing a
> > specific minimal libbpf version, iproute2 version doesn't matter all
> > that much, because libbpf version that iproute2 ends up linking
> > against might be very old.
> >
> > There was a lot of talk about API stability and backwards
> > compatibility. Libbpf has had a stable API and ABI for at least 1.5
> > years now and is very conscious about that when adding or extending
> > new APIs. That's not even a factor in me arguing for submodules. I'll
> > give a few specific examples of libbpf API not changing at all, but
> > how end user experience gets tremendously better.
> >
> > Some of the most important APIs of libbpf are, arguably,
> > bpf_object__open() and bpf_object__load(). They accept a BPF ELF file,
> > do some preprocessing and in the end load BPF instructions into the
> > kernel for verification. But while API doesn't change across libbpf
> > versions, BPF-side code features supported changes quite a lot.
> >
> > 1. BTF sanitization. Newer versions of clang would emit a richer set
> > of BTF type information. Old kernels might not support BTF at all (but
> > otherwise would work just fine), or might not support some specific
> > newer additions to BTF. If someone was to use the latest Clang, but
> > outdated libbpf and old kernel, they would have a bad time, because
> > their BPF program would fail due to the kernel being strict about BTF.
> > But new libbpf would "sanitize" BTF, according to supported features
> > of the kernel, or just drop BTF altogether, if the kernel is that old.
> >
>
> In my experience, compilers are the least likely change in a typical
> Linux development environment. BPF should not be forcing new versions
> (see me last response).
>

"My experience" and "typical" don't generalize well, I'd rather not
draw any specific conclusions from that. But as I replied to your last
response: if you have a BPF application that doesn't use BPF CO-RE and
doesn't need BTF, you'll most probably be just fine with older Clang
(<v10), no one is forcing anything.

We do recommend using the latest Clang, so that you have fewer
workarounds to deal with, of course. And you get all the shiny BTF
built-ins. And some of the problematic code patterns are no longer
generated by newer Clangs, so that you as a BPF developer have a less
painful development and debugging process.

> >
> > 2. bpf_probe_read_user() falling back to bpf_probe_read(). Newer
> > kernels warn if a BPF application isn't using a proper _kernel() or
> > _user() variant of bpf_probe_read(), and eventually will just stop
> > supporting generic bpf_probe_read(). So what this means is that end
> > users would need to compile two variants of their BPF application, one
> > for older kernels with bpf_probe_read(), another with
> > bpf_probe_read_kernel()/bpf_probe_read_user(). That's a massive pain
> > in the butt. But newer libbpf versions provide a completely
> > transparent fallback from the _user()/_kernel() variants to the generic
> > one, if the kernel doesn't support the new variants. So the instruction to
> > users becomes simple: always use
> > bpf_probe_read_user()/bpf_probe_read_kernel().
> >
>
> I vaguely recall a thread about having the BPF system call return
> user-friendly messages, but that was shot down. I take this example to
> mean the solution is to have libbpf handle the quirks and various
> changes, which means that libbpf now takes on the burden of needing
> constant updates to handle quirks. extack has been very successful at
> making networking configuration mistakes more user friendly. Other
> kernel features should be using the same kind of extension.

I don't think this is relevant for this discussion at all. But yes,
libbpf tries to alleviate as much pain as possible. And no, extack
won't help with that in general, only with some error reporting,
potentially.
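
To illustrate the fallback described above with a hedged sketch (probe
target, arch define, and buffer size are arbitrary choices, not from
this thread): BPF code can unconditionally use the _user() variant and
let libbpf rewrite it to plain bpf_probe_read() at load time on kernels
that predate the split:

/* build: clang -O2 -target bpf -D__TARGET_ARCH_x86 -c probe_user.c */
#include <linux/ptrace.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("kprobe/do_sys_open")
int trace_open(struct pt_regs *ctx)
{
	/* second argument of do_sys_open() is a user-space filename pointer */
	const char *uptr = (const char *)PT_REGS_PARM2(ctx);
	char fname[64];

	/* always the _user() variant; libbpf transparently falls back to
	 * plain bpf_probe_read() if the kernel doesn't have it */
	bpf_probe_read_user(fname, sizeof(fname), uptr);
	return 0;
}

char _license[] SEC("license") = "GPL";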
Andrii Nakryiko Nov. 5, 2020, 9:01 p.m. UTC | #46
On Thu, Nov 5, 2020 at 6:05 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2020-11-04 10:19 p.m., David Ahern wrote:
>
> [..]
> >
> > User experience keeps getting brought up, but I also keep reading the
> > stance that BPF users can not expect a consistent experience unless they
> > are constantly chasing latest greatest versions of *ALL* S/W related to
> > BPF. That is not a realistic expectation for users. Distributions exist
> > for a reason. They solve real packaging problems.
> >
> > As libbpf and bpf in general reach a broader audience, the requirements
> > to use, deploy and even try out BPF features need to be more user
> > friendly, and that starts with maintainers of the BPF code and how they
> > approach extensions and features. Telling libbpf consumers to make
> > libbpf a submodule of their project and update the reference point every
> > time a new release comes out is not user friendly.
> >
> > Similarly, it is not realistic or user friendly to *require* general
> > Linux users to constantly chase latest versions of llvm, clang, dwarves,
> > bcc, bpftool, libbpf, (I am sure I am missing more), and, by extension
> > of what you want here, iproute2 just to upgrade their production kernel
> > to say v5.10, the next LTS, or to see what relevant new ebpf features
> > exist in the new kernel. As a specific example BTF extensions are added
> > in a way that is all or nothing. Meaning, you want to compile kernel
> > version X with CONFIG_DEBUG_INFO_BTF enabled, update your toolchain.
> > Sure, you are using the latest LTS of $distro, and it worked fine with
> > kernel version X-1 last week, but now compile fails completely unless
> > the pahole version is updated. Horrible user experience. Again, just an
> > example and one I brought up in July. I am sure there are more.
> >
>
> 2cents feedback from a dabbler in ebpf on user experience:
>
> What David described above *has held me back*.
> Over time it seems things have gotten better with libbpf
> (although a few times i find myself copying includes from the
> latest iproute into libbpf). I ended up just doing static links.
> The idea of upgrading clang/llvm every 2 months i revisit ebpf is
> the most painful. At times code that used to compile just fine
> earlier doesnt anymore. There's a minor issue of requiring i install

Do you have a specific example of something that stopped compiling?
I'm not saying that can't happen, but we definitely try hard to avoid
any regressions. I might be forgetting something, but I don't recall
the situation when something would stop compiling just due to newer
libbpf.

> kernel headers every time i want to run something in samples, etc
> but i am probably lacking knowledge on how to ease the pain in that
> regard.
>
> I find the loader and associated tooling in iproute2/tc to be quite
> stable (not shiny but works everytime).
> And for that reason i often find myself sticking to just tc instead
> of toying with other areas.

That's the part that others on this thread mentioned is bit rotting?
Doesn't seem like everyone is happy about that, though. Stopping any
development definitely makes things stable by definition. BPF and
libbpf try to be stable while not stagnating, which is harder than
just stopping any development, unfortunately.

> Slight tangent:
> One thing that would help libbpf adoption is to include an examples/
> directory. Put a bunch of sample apps for tc, probes, xdp etc.
> And have them compile outside of the kernel. Maybe useful Makefiles
> that people can cutnpaste from. Every time you add a new feature
> put some sample code in the examples.

That's what tools/testing/selftests/bpf in kernel source are for. It's
not the greatest showcase of examples, but all the new features have a
test demonstrating its usage. I do agree about having simple Makefiles
and we do have that at [0]. I'm also about to do another sample repo
with a lot of things pre-setup, for tinkering and using that as a
bootstrap for BPF development with libbpf.

  [0] https://github.com/iovisor/bcc/tree/master/libbpf-tools

>
> cheers,
> jamal
Hangbin Liu Nov. 6, 2020, 12:41 a.m. UTC | #47
On Thu, Nov 05, 2020 at 08:25:12AM -0700, David Ahern wrote:
> > Sorry, when trying to rename the functions I just found another issue.
> > Even if we fix the conflicts right now, what if libbpf adds new functions
> > and we get another conflict in the future? There are too many bpf functions
> > in bpf_legacy.c, which carries a higher risk of naming conflicts.
> >
> > With bpf_libbpf.c, there are fewer functions and less risk of naming
> > conflicts. So I think it may be better to not include libbpf.h in bpf_legacy.c.
> > What do you think?
> > 
> >
> 
> Is there a way to sort the code such that bpf_legacy.c is not used when
> libbpf is enabled and bpf_libbpf.c is not compiled when libbpf is disabled?
> 

That's what the current code does. In lib/Makefile we only compile bpf_libbpf.o
when libbpf is enabled.

ifeq ($(HAVE_LIBBPF),y)
UTILOBJ += bpf_libbpf.o
endif

But the bpf code in ipvrf.c is special, as it calls both legacy code and
libbpf code. If we put it in bpf_legacy.c, then bpf_legacy.c gets polluted
by libbpf.h. If we put it in bpf_libbpf.c, then ipvrf can't be built when
libbpf is disabled, since bpf_libbpf.o isn't compiled at all.

I haven't figured out a better solution.

Thanks
Hangbin
Hangbin Liu Nov. 6, 2020, 12:56 a.m. UTC | #48
On Thu, Nov 05, 2020 at 09:02:15AM -0700, David Ahern wrote:
> On 11/5/20 8:57 AM, Toke Høiland-Jørgensen wrote:
> > I guess we could split it further into lib/bpf_{libbpf,legacy,glue}.c
> > and have the two former ones be completely devoid of ifdefs and
> > conditionally included based on whether or not libbpf support is
> > enabled?
> 
> that sounds reasonable.
> 

OK, I will do this.
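
For illustration, a rough sketch of the shape this could take (all names
here are hypothetical, just to show the idea of keeping the ifdefs in a
single glue file):

/* lib/bpf_glue.c - sketch: the only file aware of both backends;
 * bpf_legacy.c and bpf_libbpf.c themselves stay free of ifdefs. */
#include "bpf_util.h"

#ifdef HAVE_LIBBPF
int load_bpf_object(struct bpf_cfg_in *cfg);	/* from bpf_libbpf.c */
#else
int bpf_legacy_load(struct bpf_cfg_in *cfg);	/* from bpf_legacy.c */
#endif

int bpf_load_common(struct bpf_cfg_in *cfg)
{
#ifdef HAVE_LIBBPF
	return load_bpf_object(cfg);
#else
	return bpf_legacy_load(cfg);
#endif
}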

Thanks
Hangbin
Jiri Benc Nov. 6, 2020, 8:44 a.m. UTC | #49
On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> I'll just quote myself here for your convenience.

Sorry, I missed your original email for some reason.

>   Submodule is a way that I know of to make this better for end users.
>   If there are other ways to pull this off with shared library use, I'm
>   all for it, it will save the security angle that distros are arguing
>   for. E.g., if distributions will always have the latest libbpf
>   available almost as soon as it's cut upstream *and* new iproute2
>   versions enforce the latest libbpf when they are packaged/released,
>   then this might work equivalently for end users. If Linux distros
>   would be willing to do this faithfully and promptly, I have no
>   objections whatsoever. Because all that matters is BPF end user
>   experience, as Daniel explained above.

That's basically what we already do, for both Fedora and RHEL.

Of course, it follows the distro release cycle, i.e. no version
upgrades - or very limited ones - during the lifetime of a particular
release. But that would not be different if libbpf was bundled in
individual projects.

 Jiri
Jiri Benc Nov. 6, 2020, 9 a.m. UTC | #50
On Thu, 5 Nov 2020 12:45:39 -0800, Andrii Nakryiko wrote:
> That's not true. If you need new functionality like BTF, CO-RE,
> function-by-function verification, etc., then yes, you have to update
> kernel, compiler, libbpf, sometimes pahole. But if you have a BPF
> application that doesn't use and need any of the newer features, it
> will keep working just fine with the old kernel, old libbpf, and old
> compiler.

I'm fine with this.

It doesn't work that well in practice, though: we've found ourselves
chasing problems caused by llvm updates (problems for older bpf
programs, not new ones), problems on non-x86_64 caused by kernel
updates, etc. It can be attributed to living on the edge and it should
stabilize over time, hopefully. But it's still what the users are
experiencing and it's probably what David is referring to. I expect it
to smooth itself out over time.

Add to that the fact that something that is in fact a new feature is
perceived as a bug fix by some users. For example, a perfectly valid
and simple C program, not using anything shiny but a basic simple loop,
compiles just fine but is rejected by the kernel. A newer kernel and a
newer compiler and a newer libbpf and a newer pahole will cause the
same program to be accepted. Now, the user does not see that, for this,
a whole load of BTF functionality had to be added and all those mentioned
projects enhanced with substantial code. All they see is that their simple
hello world test program did not work and now it does.

I'm not saying I have a solution, nor am I saying you should do
something about it. Just trying to explain the perception.

 Jiri
Jamal Hadi Salim Nov. 6, 2020, 3:27 p.m. UTC | #51
On 2020-11-05 4:01 p.m., Andrii Nakryiko wrote:
> On Thu, Nov 5, 2020 at 6:05 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>
>> On 2020-11-04 10:19 p.m., David Ahern wrote:
>>
>> [..]

[..]

>> 2cents feedback from a dabbler in ebpf on user experience:
>>
>> What David described above *has held me back*.
>> Over time it seems things have gotten better with libbpf
>> (although a few times i find myself copying includes from the
>> latest iproute into libbpf). I ended up just doing static links.
>> The idea of upgrading clang/llvm every 2 months i revisit ebpf is
>> the most painful. At times code that used to compile just fine
>> earlier doesnt anymore. There's a minor issue of requiring i install
>
> Do you have a specific example of something that stopped compiling?
> I'm not saying that can't happen, but we definitely try hard to avoid
> any regressions. I might be forgetting something, but I don't recall
> the situation when something would stop compiling just due to newer
> libbpf.
>

Unfortunately the ecosystem is more than libbpf; sometimes it is
the kernel code that is being exercised by libbpf that is problematic.
This may sound unfair to libbpf but it is hard to separate the two for
someone who is dabbling like me.

The last issue iirc had to do with one of the tcp notifier
variants either in samples or selftests (both user space and kernel).
I can go back and look at the details.
The fix, more than half the time, was needing to upgrade
clang/llvm. At one point i think it required that i had to grab
the latest and greatest git version. I think the machine i have
right now has version 11. The first time i found out about these
clang upgrades was trying to go from 8->9 or maybe it was 9->10.
Somewhere along there also was discovery that something that
compiled under earlier version wasnt compiling under newer version.

>> kernel headers every time i want to run something in samples, etc
>> but i am probably lacking knowledge on how to ease the pain in that
>> regard.
>>
>> I find the loader and associated tooling in iproute2/tc to be quite
>> stable (not shiny but works everytime).
>> And for that reason i often find myself sticking to just tc instead
>> of toying with other areas.
>
> That's the part that others on this thread mentioned is bit rotting?

Yes. Reason is i dont have to deal with new discoveries of things
that require some upgrade or copying etc.
I should be clear on the "it is the ecosystem": this is not just because
of user space code but also the simplicity of writing the tc kernel code
and loading it with tc tooling and then have a separate user tool for
control.
Lately i started linking the control tool with static libbpf instead.

Bpftool seems improved last time i tried to load something in XDP. I
like the load-map-then-attach-program approach that bpftool gets
out of libbpf. I dont think that feature is possible with tc tooling.

However, I am still loading with tc and xdp with ip because of old
habits and what i consider to be a very simple workflow.

> Doesn't seem like everyone is happy about that, though. Stopping any
> development definitely makes things stable by definition. BPF and
> libbpf try to be stable while not stagnating, which is harder than
> just stopping any development, unfortunately.
>

I am for moving to libbpf. I think it is a bad idea to have multiple
loaders for example. Note: I am not a demanding user, but there
are a few useful features that i feel i need that are missing in
iproute2 version. e.g, one thing i was playing with about a month
ago was some TOCTOU issue in the kernel code and getting
the bpf_lock integrated into the tc code proved challenging.
I ended up rewriting the code to work around the tooling.

The challenge - when making changes in the name of progress - is to
not burden a user like myself with a complex workflow but still give
me the features i need.

>> Slight tangent:
>> One thing that would help libbpf adoption is to include an examples/
>> directory. Put a bunch of sample apps for tc, probes, xdp etc.
>> And have them compile outside of the kernel. Maybe useful Makefiles
>> that people can cutnpaste from. Every time you add a new feature
>> put some sample code in the examples.
>
> That's what tools/testing/selftests/bpf in kernel source are for. It's
> not the greatest showcase of examples, but all the new features have a
> test demonstrating its usage. I do agree about having simple Makefiles
> and we do have that at [0]. I'm also about to do another sample repo
> with a lot of things pre-setup, for tinkering and using that as a
> bootstrap for BPF development with libbpf.
>
>    [0] https://github.com/iovisor/bcc/tree/master/libbpf-tools

I pull that tree regularly.
selftests is good for aggregating things developers submit and
then have the robots test.
For better usability, it has to be something that is standalone that
would work out of the box with libbpf.
selftests and samples are not what i would consider for the
faint-hearted.
It may look easy to you because you eat this stuff for
breakfast but consider all those masses you want to be part of this.
They dont have the skills and people with average skills dont
have the patience.

This again comes back to "the ecosystem" - just getting libbpf to get
things stable for userland is not enough. Maybe have part of the libbpf
testing also to copy things from selftests.

cheers,
jamal
Andrii Nakryiko Nov. 6, 2020, 8:57 p.m. UTC | #52
On Fri, Nov 6, 2020 at 12:44 AM Jiri Benc <jbenc@redhat.com> wrote:
>
> On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> > I'll just quote myself here for your convenience.
>
> Sorry, I missed your original email for some reason.
>
> >   Submodule is a way that I know of to make this better for end users.
> >   If there are other ways to pull this off with shared library use, I'm
> >   all for it, it will save the security angle that distros are arguing
> >   for. E.g., if distributions will always have the latest libbpf
> >   available almost as soon as it's cut upstream *and* new iproute2
> >   versions enforce the latest libbpf when they are packaged/released,
> >   then this might work equivalently for end users. If Linux distros
> >   would be willing to do this faithfully and promptly, I have no
> >   objections whatsoever. Because all that matters is BPF end user
> >   experience, as Daniel explained above.
>
> That's basically what we already do, for both Fedora and RHEL.
>
> Of course, it follows the distro release cycle, i.e. no version
> upgrades - or very limited ones - during lifetime of a particular
> release. But that would not be different if libbpf was bundled in
> individual projects.

Alright. Hopefully this would be sufficient in practice.

>
>  Jiri
>
Alexei Starovoitov Nov. 6, 2020, 9:04 p.m. UTC | #53
On Fri, Nov 6, 2020 at 12:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Nov 6, 2020 at 12:44 AM Jiri Benc <jbenc@redhat.com> wrote:
> >
> > On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> > > I'll just quote myself here for your convenience.
> >
> > Sorry, I missed your original email for some reason.
> >
> > >   Submodule is a way that I know of to make this better for end users.
> > >   If there are other ways to pull this off with shared library use, I'm
> > >   all for it, it will save the security angle that distros are arguing
> > >   for. E.g., if distributions will always have the latest libbpf
> > >   available almost as soon as it's cut upstream *and* new iproute2
> > >   versions enforce the latest libbpf when they are packaged/released,
> > >   then this might work equivalently for end users. If Linux distros
> > >   would be willing to do this faithfully and promptly, I have no
> > >   objections whatsoever. Because all that matters is BPF end user
> > >   experience, as Daniel explained above.
> >
> > That's basically what we already do, for both Fedora and RHEL.
> >
> > Of course, it follows the distro release cycle, i.e. no version
> > upgrades - or very limited ones - during lifetime of a particular
> > release. But that would not be different if libbpf was bundled in
> > individual projects.
>
> Alright. Hopefully this would be sufficient in practice.

I think bumping the minimal version of libbpf with every iproute2 release
is necessary as well.
Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
and so on.
This way at least some correlation between iproute2 and libbpf will be
established.
Otherwise it's a mess of versions and functionality from the user's point of view.
Andrii Nakryiko Nov. 6, 2020, 9:07 p.m. UTC | #54
On Fri, Nov 6, 2020 at 1:00 AM Jiri Benc <jbenc@redhat.com> wrote:
>
> On Thu, 5 Nov 2020 12:45:39 -0800, Andrii Nakryiko wrote:
> > That's not true. If you need new functionality like BTF, CO-RE,
> > function-by-function verification, etc., then yes, you have to update
> > kernel, compiler, libbpf, sometimes pahole. But if you have a BPF
> > application that doesn't use and need any of the newer features, it
> > will keep working just fine with the old kernel, old libbpf, and old
> > compiler.
>
> I'm fine with this.
>
> It doesn't work that well in practice, we've found ourselves chasing
> problems caused by llvm update (problems for older bpf programs, not
> new ones), problems on non-x86_64 caused by kernel updates, etc. It can
> be attributed to living on the edge and it should stabilize over time,
> hopefully. But it's still what the users are experiencing and it's
> probably what David is referring to. I expect it to smooth itself over
> time.

It's definitely going to be better over time, of course. I honestly
can't remember many cases where working applications stopped working
with newer kernels. I only remember cases when Clang changed the code
generation patterns. Also there were a few too-permissive checks fixed
in later kernels, which could break apps if they relied on buggy
logic. That did happen, I think.

But anyway, I bet people just got a "something like that happened in
the past" flag in their head, but won't be able to recall specific
details anymore. My point is that we (BPF developers) don't take these
things lightly, so I'd just like to avoid the perception that we don't
care about this. Because we do, despite it sometimes being painful.
But there are layers upon layers of abstraction and it's not all
always under our control, so things might break.

>
> Add to that the fact that something that is in fact a new feature is
> perceived as a bug fix by some users. For example, a perfectly valid
> and simple C program, not using anything shiny but a basic simple loop,
> compiles just fine but is rejected by the kernel. A newer kernel and a
> newer compiler and a newer libbpf and a newer pahole will cause the
> same program to be accepted. Now, the user does not see that for this,
> a new load of BTF functionality had to be added and all those mentioned
> projects enhanced with substantial code. All they see is their simple
> hello world test program did not work and now it does.

Right. The unavoidable truth is that anyone using BPF has to have at
least a surface-level idea of what the BPF verifier is and what (and
sometimes how) it checks. It also gets better over time, so much so that
some simpler applications will just work perfectly from the
first version of the written code.

But let's also not lose perspective here. There aren't many examples
of practical static verification of program safety and termination,
right? It's tricky, especially when making it also practical for a
wide variety of use cases.
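
(To make the "simple loop" example concrete, a sketch not taken from
the thread: pre-5.3 verifiers rejected back-edges outright, so even a
trivial bounded loop like the one below had to be fully unrolled, while
newer kernels with bounded-loop support accept it with no source
change.)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int sum_bytes(struct xdp_md *ctx)
{
	int i, total = 0;

	/* needs #pragma unroll on pre-5.3 verifiers;
	 * accepted as-is once bounded-loop support landed */
	for (i = 0; i < 64; i++)
		total += i;

	return total > 0 ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";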

>
> I'm not saying I have a solution nor I'm saying you should do something
> about it. Just trying to explain the perception.

Thanks for that, it's a good perspective. Hopefully my explanation
also makes sense ;)

>
>  Jiri
>
Andrii Nakryiko Nov. 6, 2020, 9:25 p.m. UTC | #55
On Fri, Nov 6, 2020 at 7:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2020-11-05 4:01 p.m., Andrii Nakryiko wrote:
> > On Thu, Nov 5, 2020 at 6:05 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >>
> >> On 2020-11-04 10:19 p.m., David Ahern wrote:
> >>
> >> [..]
>
> [..]
>
> >> 2cents feedback from a dabbler in ebpf on user experience:
> >>
> >> What David described above *has held me back*.
> >> Over time it seems things have gotten better with libbpf
> >> (although a few times i find myself copying includes from the
> >> latest iproute into libbpf). I ended up just doing static links.
> >> The idea of upgrading clang/llvm every 2 months i revisit ebpf is
> >> the most painful. At times code that used to compile just fine
> >> earlier doesnt anymore. There's a minor issue of requiring i install
> >
> > Do you have a specific example of something that stopped compiling?
> > I'm not saying that can't happen, but we definitely try hard to avoid
> > any regressions. I might be forgetting something, but I don't recall
> > the situation when something would stop compiling just due to newer
> > libbpf.
> >
>
> Unfortunately the ecosystem is more than libbpf; sometimes it is
> the kernel code that is being exercised by libbpf that is problematic.
> This may sound unfair to libbpf but it is hard to separate the two for
> someone who is dabbling like me.

I get that. Clang is also part of the ecosystem, along with the kernel,
pahole, etc. It's a lot of moving parts and we strive to keep them all
working well together. It's not 100% smooth all the time, but that's
at least the goal.

>
> The last issue iirc had to do with one of the tcp notifier
> variants either in samples or selftests (both user space and kernel).
> I can go back and look at the details.
> The fix, more than half the time, was needing to upgrade
> clang/llvm. At one point i think it required that i had to grab
> the latest and greatest git version. I think the machine i have
> right now has version 11. The first time i found out about these
> clang upgrades was trying to go from 8->9 or maybe it was 9->10.
> Somewhere along there also was discovery that something that
> compiled under earlier version wasnt compiling under newer version.

So with the kernel's samples/bpf and selftests/bpf, we do quite often
expect the latest Clang, because it's not just examples, but also a
live set of tests. So to not accumulate too much cruft, we do update
those (sometimes, not all the time) with the assumption of the latest
features in Clang, libbpf, pahole, and kernel. That's reality, and we
set those expectations quite explicitly a while ago. But that's not the
expectation for user applications outside of the kernel tree. Just
wanted to make this clear.

>
> >> kernel headers every time i want to run something in samples, etc
> >> but i am probably lacking knowledge on how to ease the pain in that
> >> regard.
> >>
> >> I find the loader and associated tooling in iproute2/tc to be quite
> >> stable (not shiny but works everytime).
> >> And for that reason i often find myself sticking to just tc instead
> >> of toying with other areas.
> >
> > That's the part that others on this thread mentioned is bit rotting?
>
> Yes. Reason is i dont have to deal with new discoveries of things
> that require some upgrade or copying etc.
> I should be clear on the "it is the ecosystem": this is not just because
> of user space code but also the simplicity of writing the tc kernel code
> and loading it with tc tooling and then have a separate user tool for
> control.
> Lately i started linking the control tool with static libbpf instead.

There are also two broad categories of BPF applications: networking
and the rest (tracing, now security, etc). Networking historically
dealt with well-defined data structures (ip headers, tcp headers, etc)
and didn't need to know much about the ever-changing nature of kernel
memory layouts. That used to be, arguably, the simpler use case from a
BPF standpoint.

Tracing, on the other hand, was always challenging. The only viable
option was BCC's approach of bundling a compiler, expecting
kernel headers, etc. We started changing that with BPF CO-RE to make a
traditional pre-compiled model viable. That obviously required changes
in all parts of the ecosystem. So tracing BPF apps went from
impossible, to hard, to constantly evolving, and we are right now in a
somewhat mixed evolving/stabilizing stage. Bleeding edge. As Jiri
said, it's to be expected that there would be rough corners. But the
choice is either to live dangerously or wait for a few years for
things to completely settle. Pick your poison ;)
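
(A hedged sketch of what CO-RE changes for the tracing case; it assumes
a vmlinux.h generated from the running kernel's BTF via bpftool, and the
probe target is arbitrary. Field offsets become relocations resolved at
load time, so one compiled object runs across kernel versions:)

/* vmlinux.h: bpftool btf dump file /sys/kernel/btf/vmlinux format c */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/do_unlinkat")
int probe_unlink(void *ctx)
{
	struct task_struct *task = (struct task_struct *)bpf_get_current_task();
	pid_t ppid;

	/* real_parent/tgid offsets are CO-RE relocations, fixed up
	 * against the running kernel's BTF when the object is loaded */
	ppid = BPF_CORE_READ(task, real_parent, tgid);
	bpf_printk("parent tgid: %d", ppid);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";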

>
> Bpftool seems improved last time i tried to load something in XDP. I
> like the load-map-then-attach-program approach that bpftool gets
> out of libbpf. I dont think that feature is possible with tc tooling.
>
> However, I am still loading with tc and xdp with ip because of old
> habits and what i consider to be a very simple workflow.
>
> > Doesn't seem like everyone is happy about that, though. Stopping any
> > development definitely makes things stable by definition. BPF and
> > libbpf try to be stable while not stagnating, which is harder than
> > just stopping any development, unfortunately.
> >
>
> I am for moving to libbpf. I think it is a bad idea to have multiple
> loaders for example. Note: I am not a demanding user, but there
> are a few useful features that i feel i need that are missing in
> iproute2 version. e.g, one thing i was playing with about a month
> ago was some TOCTOU issue in the kernel code and getting
> the bpf_lock integrated into the tc code proved challenging.
> I ended up rewriting the code to work around the tooling.

Right, bpf_lock relies on BTF, that's probably why.

>
> The challenge - when making changes in the name of progress - is to
> not burden a user like myself with a complex workflow but still give
> me the features i need.

This takes time and work, and can't be done perfectly overnight.
That's all. But the thing is: we are working towards it, non-stop.

>
> >> Slight tangent:
> >> One thing that would help libbpf adoption is to include an examples/
> >> directory. Put a bunch of sample apps for tc, probes, xdp etc.
> >> And have them compile outside of the kernel. Maybe useful Makefiles
> >> that people can cutnpaste from. Every time you add a new feature
> >> put some sample code in the examples.
> >
> > That's what tools/testing/selftests/bpf in kernel source are for. It's
> > not the greatest showcase of examples, but all the new features have a
> > test demonstrating its usage. I do agree about having simple Makefiles
> > and we do have that at [0]. I'm also about to do another sample repo
> > with a lot of things pre-setup, for tinkering and using that as a
> > bootstrap for BPF development with libbpf.
> >
> >    [0] https://github.com/iovisor/bcc/tree/master/libbpf-tools
>
> I pull that tree regularly.
> selftests is good for aggregating things developers submit and
> then have the robots test.
> For better usability, it has to be something that is standalone that
> would work out of the box with libbpf.

It's not yet ready for wider announcement, but give this a try:

https://github.com/anakryiko/libbpf-bootstrap

Should make it easier to play with libbpf and BPF.

> selftests and samples are not what i would consider for the
> faint-hearted.
> It may look easy to you because you eat this stuff for
> breakfast but consider all those masses you want to be part of this.
> They dont have the skills and people with average skills dont
> have the patience.

I acknowledged from the get-go that selftests/bpf are not the best
source of examples, just that's what we've got. It takes contributions
from lots of people to maintain a decent, nice and clean, easy to use
set of realistic examples. It's unrealistic, IMO, to expect a bunch of
core BPF developers to both develop core technology actively and
provide great educational resources (however unfortunate that is).

>
> This again comes back to "the ecosystem" - just getting libbpf to get
> things stable for userland is not enough. Maybe have part of the libbpf
> testing also to copy things from selftests.
>
> cheers,
> jamal
Stephen Hemminger Nov. 6, 2020, 11:25 p.m. UTC | #56
On Fri, 6 Nov 2020 13:04:16 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Fri, Nov 6, 2020 at 12:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Nov 6, 2020 at 12:44 AM Jiri Benc <jbenc@redhat.com> wrote:
> > >
> > > On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> > > > I'll just quote myself here for your convenience.
> > >
> > > Sorry, I missed your original email for some reason.
> > >
> > > >   Submodule is a way that I know of to make this better for end users.
> > > >   If there are other ways to pull this off with shared library use, I'm
> > > >   all for it, it will save the security angle that distros are arguing
> > > >   for. E.g., if distributions will always have the latest libbpf
> > > >   available almost as soon as it's cut upstream *and* new iproute2
> > > >   versions enforce the latest libbpf when they are packaged/released,
> > > >   then this might work equivalently for end users. If Linux distros
> > > >   would be willing to do this faithfully and promptly, I have no
> > > >   objections whatsoever. Because all that matters is BPF end user
> > > >   experience, as Daniel explained above.
> > >
> > > That's basically what we already do, for both Fedora and RHEL.
> > >
> > > Of course, it follows the distro release cycle, i.e. no version
> > > upgrades - or very limited ones - during lifetime of a particular
> > > release. But that would not be different if libbpf was bundled in
> > > individual projects.
> >
> > Alright. Hopefully this would be sufficient in practice.
>
> I think bumping the minimal version of libbpf with every iproute2 release
> is necessary as well.
> Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
> and so on.
> This way at least some correlation between iproute2 and libbpf will be
> established.
> Otherwise it's a mess of versions and functionality from user point of view.

As long as iproute2 6.0 and libbpf 0.11.0 continue to work on older kernels
(like the oldest living LTS, 4.19, in 2023?), then it is fine.

Just don't want libbpf to cause visible breakage for users.
Andrii Nakryiko Nov. 6, 2020, 11:30 p.m. UTC | #57
On Fri, Nov 6, 2020 at 3:25 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 6 Nov 2020 13:04:16 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Fri, Nov 6, 2020 at 12:58 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Nov 6, 2020 at 12:44 AM Jiri Benc <jbenc@redhat.com> wrote:
> > > >
> > > > On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> > > > > I'll just quote myself here for your convenience.
> > > >
> > > > Sorry, I missed your original email for some reason.
> > > >
> > > > >   Submodule is a way that I know of to make this better for end users.
> > > > >   If there are other ways to pull this off with shared library use, I'm
> > > > >   all for it, it will save the security angle that distros are arguing
> > > > >   for. E.g., if distributions will always have the latest libbpf
> > > > >   available almost as soon as it's cut upstream *and* new iproute2
> > > > >   versions enforce the latest libbpf when they are packaged/released,
> > > > >   then this might work equivalently for end users. If Linux distros
> > > > >   would be willing to do this faithfully and promptly, I have no
> > > > >   objections whatsoever. Because all that matters is BPF end user
> > > > >   experience, as Daniel explained above.
> > > >
> > > > That's basically what we already do, for both Fedora and RHEL.
> > > >
> > > > Of course, it follows the distro release cycle, i.e. no version
> > > > upgrades - or very limited ones - during lifetime of a particular
> > > > release. But that would not be different if libbpf was bundled in
> > > > individual projects.
> > >
> > > Alright. Hopefully this would be sufficient in practice.
> >
> > I think bumping the minimal version of libbpf with every iproute2 release
> > is necessary as well.
> > Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
> > and so on.
> > This way at least some correlation between iproute2 and libbpf will be
> > established.
> > Otherwise it's a mess of versions and functionality from user point of view.
>
> As long as iproute2 6.0 and libbpf 0.11.0 continues to work on older kernel
> (like oldest living LTS 4.19 in 2023?); then it is fine.
>
> Just don't want libbpf to cause visible breakage for users.

libbpf CI validates a bunch of selftests on 4.9 kernel, see [0]. It
should work on even older ones. Not all BPF programs would load and be
verified successfully, but libbpf itself should work regardless.

  [0] https://travis-ci.com/github/libbpf/libbpf/jobs/429362146
David Ahern Nov. 6, 2020, 11:38 p.m. UTC | #58
On 11/6/20 4:25 PM, Stephen Hemminger wrote:
>>
>> I think bumping the minimal version of libbpf with every iproute2 release
>> is necessary as well.
>> Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
>> and so on.
>> This way at least some correlation between iproute2 and libbpf will be
>> established.
>> Otherwise it's a mess of versions and functionality from user point of view.

If existing bpf features in iproute2 work fine with version 0.1.0, what
is the justification for an arbitrary requirement for iproute2 to force
users to bump libbpf versions just to use iproute2 from v5.11?
Stephen Hemminger Nov. 7, 2020, 12:41 a.m. UTC | #59
On Fri, 6 Nov 2020 15:30:38 -0800
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Fri, Nov 6, 2020 at 3:25 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Fri, 6 Nov 2020 13:04:16 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Fri, Nov 6, 2020 at 12:58 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 6, 2020 at 12:44 AM Jiri Benc <jbenc@redhat.com> wrote:
> > > > >
> > > > > On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> > > > > > I'll just quote myself here for your convenience.
> > > > >
> > > > > Sorry, I missed your original email for some reason.
> > > > >
> > > > > >   Submodule is a way that I know of to make this better for end users.
> > > > > >   If there are other ways to pull this off with shared library use, I'm
> > > > > >   all for it, it will save the security angle that distros are arguing
> > > > > >   for. E.g., if distributions will always have the latest libbpf
> > > > > >   available almost as soon as it's cut upstream *and* new iproute2
> > > > > >   versions enforce the latest libbpf when they are packaged/released,
> > > > > >   then this might work equivalently for end users. If Linux distros
> > > > > >   would be willing to do this faithfully and promptly, I have no
> > > > > >   objections whatsoever. Because all that matters is BPF end user
> > > > > >   experience, as Daniel explained above.
> > > > >
> > > > > That's basically what we already do, for both Fedora and RHEL.
> > > > >
> > > > > Of course, it follows the distro release cycle, i.e. no version
> > > > > upgrades - or very limited ones - during lifetime of a particular
> > > > > release. But that would not be different if libbpf was bundled in
> > > > > individual projects.
> > > >
> > > > Alright. Hopefully this would be sufficient in practice.
> > >
> > > I think bumping the minimal version of libbpf with every iproute2 release
> > > is necessary as well.
> > > Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
> > > and so on.
> > > This way at least some correlation between iproute2 and libbpf will be
> > > established.
> > > Otherwise it's a mess of versions and functionality from user point of view.
> >
> > As long as iproute2 6.0 and libbpf 0.11.0 continues to work on older kernel
> > (like oldest living LTS 4.19 in 2023?); then it is fine.
> >
> > Just don't want libbpf to cause visible breakage for users.
>
> libbpf CI validates a bunch of selftests on 4.9 kernel, see [0]. It
> should work on even older ones. Not all BPF programs would load and be
> verified successfully, but libbpf itself should work regardless.
>
>   [0] https://travis-ci.com/github/libbpf/libbpf/jobs/429362146

Look at the dates in my note; are you willing to promise that compatibility
in future versions?
Andrii Nakryiko Nov. 7, 2020, 1:07 a.m. UTC | #60
On Fri, Nov 6, 2020 at 4:41 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 6 Nov 2020 15:30:38 -0800
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Fri, Nov 6, 2020 at 3:25 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Fri, 6 Nov 2020 13:04:16 -0800
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > On Fri, Nov 6, 2020 at 12:58 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Fri, Nov 6, 2020 at 12:44 AM Jiri Benc <jbenc@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, 5 Nov 2020 12:19:00 -0800, Andrii Nakryiko wrote:
> > > > > > > I'll just quote myself here for your convenience.
> > > > > >
> > > > > > Sorry, I missed your original email for some reason.
> > > > > >
> > > > > > >   Submodule is a way that I know of to make this better for end users.
> > > > > > >   If there are other ways to pull this off with shared library use, I'm
> > > > > > >   all for it, it will save the security angle that distros are arguing
> > > > > > >   for. E.g., if distributions will always have the latest libbpf
> > > > > > >   available almost as soon as it's cut upstream *and* new iproute2
> > > > > > >   versions enforce the latest libbpf when they are packaged/released,
> > > > > > >   then this might work equivalently for end users. If Linux distros
> > > > > > >   would be willing to do this faithfully and promptly, I have no
> > > > > > >   objections whatsoever. Because all that matters is BPF end user
> > > > > > >   experience, as Daniel explained above.
> > > > > >
> > > > > > That's basically what we already do, for both Fedora and RHEL.
> > > > > >
> > > > > > Of course, it follows the distro release cycle, i.e. no version
> > > > > > upgrades - or very limited ones - during lifetime of a particular
> > > > > > release. But that would not be different if libbpf was bundled in
> > > > > > individual projects.
> > > > >
> > > > > Alright. Hopefully this would be sufficient in practice.
> > > >
> > > > I think bumping the minimal version of libbpf with every iproute2 release
> > > > is necessary as well.
> > > > Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
> > > > and so on.
> > > > This way at least some correlation between iproute2 and libbpf will be
> > > > established.
> > > > Otherwise it's a mess of versions and functionality from user point of view.
> > >
> > > As long as iproute2 6.0 and libbpf 0.11.0 continues to work on older kernel
> > > (like oldest living LTS 4.19 in 2023?); then it is fine.
> > >
> > > Just don't want libbpf to cause visible breakage for users.
> >
> > libbpf CI validates a bunch of selftests on 4.9 kernel, see [0]. It
> > should work on even older ones. Not all BPF programs would load and be
> > verified successfully, but libbpf itself should work regardless.
> >
> >   [0] https://travis-ci.com/github/libbpf/libbpf/jobs/429362146
>
> Look at the dates in my note, are you willing to promise that compatibility
> in future versions.
>

I don't understand why, after so many emails in this thread, it's still
not clear that backwards compatibility is in libbpf's DNA. And no one
can even point out where and when exactly libbpf even had a problem
with backwards compatibility in the first place! Yet, all of this
insinuation of libbpf API instability...

So for the last time (hopefully): yes!

We managed to do that for at least the last 2 years, why would we suddenly
break this?
Alexei Starovoitov Nov. 9, 2020, 1:45 a.m. UTC | #61
On Fri, Nov 06, 2020 at 04:38:13PM -0700, David Ahern wrote:
> On 11/6/20 4:25 PM, Stephen Hemminger wrote:
> >>
> >> I think bumping the minimal version of libbpf with every iproute2 release
> >> is necessary as well.
> >> Today iproute2-next should require 0.2.0. The cycle after it should be 0.3.0
> >> and so on.
> >> This way at least some correlation between iproute2 and libbpf will be
> >> established.
> >> Otherwise it's a mess of versions and functionality from user point of view.
>
> If existing bpf features in iproute2 work fine with version 0.1.0, what
> is the justification for an arbitrary requirement for iproute2 to force
> users to bump libbpf versions just to use iproute2 from v5.11?

I don't understand why on one side you're pointing out existing quirkiness
with bpf usability while at the same time arguing to make it _less_ user
friendly, when myself, Daniel, and Andrii explained in detail what libbpf
does and how it affects user experience?

The analogy of libbpf in iproute2 and libbfd in gdb is that both libraries
perform a large percentage of the functionality compared to the rest of the
tool. When a library is dynamically linked, it makes the user experience
unpredictable. My guess is that libbfd is ~50% of what gdb is doing. What
will the users say if gdb suddenly behaves differently (supports fewer or
more elf files) because libbfd.so got upgraded in the background? In the
case of tc+libbpf, the breakdown of functionality is heavily skewed towards
libbpf. The amount of logic iproute2 code will do to perform a "tc filter
... bpf ..." command is 10% iproute2 / 90% libbpf. Issuing a few netlink
calls to attach a bpf prog to a qdisc is trivial compared to what libbpf is
doing with an elf file. There is a linker inside libbpf. It will separate
different functions inside the elf file. It will relocate code and adjust
instructions before sending them to the kernel. libbpf is not a wrapper.
It's a mini compiler: CO-RE logic, function relocation, dynamic kernel
feature probing, etc. When users use a command line tool (like iproute2 or
bpftool) they are interfacing with the tool. It's not unix-like to demand
that users check the version of a shared library and adjust their
expectations. The UI is the command line. Its version is a promise of
features. iproute2 of a certain version in one distro should behave the
same as iproute2 in another distro. By not doing a git submodule, that
promise is broken. Hence my preference is to use a fixed libbpf sha for
every iproute2 release. The other alternative is to lag iproute2/libbpf one
release behind. Hence, repeating what I said earlier: today iproute2-next
should require 0.2.0. The iproute2 in the next cycle _must_ bump the
minimum libbpf version to 0.3.0. Not bumping the minimum version brings us
back to square one and an unpredictable user experience. The users are
jumping through enough hoops when they develop bpf programs. We have to
make it simpler and easier. Using libbpf in iproute2
can improve the user experience, but only if it's predictable.
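
To illustrate that breakdown with a sketch (error handling trimmed; these
are the stable libbpf object APIs of this era): the tool-side code is
essentially the few lines below, while everything behind
bpf_object__load() - BTF sanitization, relocation, feature probing - is
libbpf's job:

#include <bpf/libbpf.h>

int load_prog_fd(const char *path, const char *sec)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open_file(path, NULL);
	if (libbpf_get_error(obj))
		return -1;

	/* sanitize BTF, relocate code, probe kernel features, load */
	if (bpf_object__load(obj)) {
		bpf_object__close(obj);
		return -1;
	}

	prog = bpf_object__find_program_by_title(obj, sec);
	return prog ? bpf_program__fd(prog) : -1;
}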
Hangbin Liu Nov. 9, 2020, 7:07 a.m. UTC | #62
This series converts iproute2 to use libbpf for loading and attaching
BPF programs when it is available. This means that iproute2 will
correctly process BTF information and support the new-style BTF-defined
maps, while keeping compatibility with the old internal map definition
syntax.

This is achieved by checking for libbpf at './configure' time, and using
it if available. By default the system libbpf will be used, but static
linking against a custom libbpf version can be achieved by passing
LIBBPF_DIR to configure. LIBBPF_FORCE can be set to 'on' to force configure
to abort if no suitable libbpf is found (useful for automatic packaging
that wants to enforce the dependency), or to 'off' to disable the libbpf
check and build iproute2 with the legacy bpf code.
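
(For reference, the detection boils down to a compile/link test along
these lines; a sketch, not the exact configure code. If the snippet
below links against -lbpf, a libbpf new enough to provide
bpf_program__section_name() is present; otherwise the wrapper falls
back to bpf_program__title():)

#include <bpf/libbpf.h>

int main(void)
{
	/* link test only; never meant to be executed */
	return bpf_program__section_name((const struct bpf_program *)0) != 0;
}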

The old iproute2 bpf code is kept and will be used if no suitable libbpf
is available. When using libbpf, wrapper code ensures that iproute2 will
still understand the old map definition format, including populating
map-in-map and tail call maps before load.

The examples in bpf/examples are kept, and a separate set of examples
are added with BTF-based map definitions for those examples where this
is possible (libbpf doesn't currently support declaratively populating
tail call maps).
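
(As an illustration of the BTF-defined map syntax those examples use, a
sketch of how a simple pinned array like map_sh can be declared
new-style; not necessarily the exact example code:)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
	__uint(pinning, LIBBPF_PIN_BY_NAME);	/* reused via bpffs pinning */
} map_sh SEC(".maps");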

Finally, thanks a lot to Toke for his help on this patch set.

v4:
a) Add the variable LIBBPF_FORCE to control whether to build iproute2
   with libbpf or not.
b) Add new file bpf_glue.c for mixed libbpf/legacy bpf calls.
c) Fix some build issues and a shell compatibility error.

v3:
a) Update configure to check function bpf_program__section_name() separately
b) Add a new function get_bpf_program__section_name() to choose whether to
use bpf_program__title() or not.
c) Test build the patch on Fedora 33 with libbpf-0.1.0-1.fc33 and
   libbpf-devel-0.1.0-1.fc33

v2:
a) Remove self defined IS_ERR_OR_NULL and use libbpf_get_error() instead.
b) Add ipvrf with libbpf support.


Here are the test results with patched iproute2:

== setup env
# clang -O2 -Wall -g -target bpf -c bpf_graft.c -o btf_graft.o
# clang -O2 -Wall -g -target bpf -c bpf_map_in_map.c -o btf_map_in_map.o
# clang -O2 -Wall -g -target bpf -c bpf_shared.c -o btf_shared.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_cyclic.c -o bpf_cyclic.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_graft.c -o bpf_graft.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_map_in_map.c -o bpf_map_in_map.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_shared.c -o bpf_shared.o
# clang -O2 -Wall -g -target bpf -c legacy/bpf_tailcall.c -o bpf_tailcall.o
# rm -rf /sys/fs/bpf/xdp/globals
# /root/iproute2/ip/ip link add type veth
# /root/iproute2/ip/ip link set veth0 up
# /root/iproute2/ip/ip link set veth1 up


== Load objs
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 4 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
4: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:21-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 5
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 8 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
8: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:23-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 10
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 12 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
12: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:25-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 15
# /root/iproute2/ip/ip link set veth0 xdp off


== Load objs again to make sure maps could be reused
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 16 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
16: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:27-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 20
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 20 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
20: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:29-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 3
        btf_id 25
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj bpf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 24 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef:
map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc  map_inner  map_outer
# bpftool map show
1: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
2: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
3: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
4: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
24: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:31-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 4
        btf_id 30
# /root/iproute2/ip/ip link set veth0 xdp off
# rm -rf /sys/fs/bpf/xdp/7a1422e90cd81478f97bc33fbd7782bcb3b868ef /sys/fs/bpf/xdp/globals
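
Note: the split above between the per-object hash directory and "globals"
comes from the "pinning" field in the legacy struct bpf_elf_map definitions
(the map_inner/map_outer pairing is likewise wired up through the legacy
id/inner_id/inner_idx fields). A minimal sketch of the two pinning variants,
loosely modeled on the legacy examples; field values here are illustrative,
not copied from the objects under test:

    #include <linux/bpf.h>
    #include "bpf_elf.h" /* iproute2's struct bpf_elf_map and PIN_* values */

    #define __section(x) __attribute__((section(x), used))

    /* PIN_OBJECT_NS: pinned as /sys/fs/bpf/xdp/<object hash>/map_sh */
    struct bpf_elf_map __section("maps") map_sh = {
            .type       = BPF_MAP_TYPE_ARRAY,
            .size_key   = sizeof(__u32),
            .size_value = sizeof(__u32),
            .max_elem   = 1,
            .pinning    = PIN_OBJECT_NS,
    };

    /* PIN_GLOBAL_NS: pinned as /sys/fs/bpf/xdp/globals/jmp_tc */
    struct bpf_elf_map __section("maps") jmp_tc = {
            .type       = BPF_MAP_TYPE_PROG_ARRAY,
            .size_key   = sizeof(__u32),
            .size_value = sizeof(__u32),
            .max_elem   = 1,
            .pinning    = PIN_GLOBAL_NS,
    };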

== Testing if we can load new-style objects (using xdp-filter as an example)
# /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_all.o sec xdp_filter
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 28 tag e29eeda1489a6520 jited
# ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
# bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
# bpftool prog show
28: xdp  name xdpfilt_alw_all  tag e29eeda1489a6520  gpl
        loaded_at 2020-10-22T08:04:33-0400  uid 0
        xlated 2408B  jited 1405B  memlock 4096B  map_ids 9,5,7,8,6
        btf_id 35
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_ip.o sec xdp_filter
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 32 tag 2f2b9dbfb786a5a2 jited
# ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
# bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
# bpftool prog show
32: xdp  name xdpfilt_alw_ip  tag 2f2b9dbfb786a5a2  gpl
        loaded_at 2020-10-22T08:04:35-0400  uid 0
        xlated 1336B  jited 778B  memlock 4096B  map_ids 7,8,5
        btf_id 40
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj /usr/lib64/bpf/xdpfilt_alw_tcp.o sec xdp_filter
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 36 tag 18c1bb25084030bc jited
# ls /sys/fs/bpf/xdp/globals
filter_ethernet  filter_ipv4  filter_ipv6  filter_ports  xdp_stats_map
# bpftool map show
5: percpu_array  name xdp_stats_map  flags 0x0
        key 4B  value 16B  max_entries 5  memlock 4096B
        btf_id 35
6: percpu_array  name filter_ports  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1576960B
        btf_id 35
7: percpu_hash  name filter_ipv4  flags 0x0
        key 4B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
8: percpu_hash  name filter_ipv6  flags 0x0
        key 16B  value 8B  max_entries 10000  memlock 1142784B
        btf_id 35
9: percpu_hash  name filter_ethernet  flags 0x0
        key 6B  value 8B  max_entries 10000  memlock 1064960B
        btf_id 35
# bpftool prog show
36: xdp  name xdpfilt_alw_tcp  tag 18c1bb25084030bc  gpl
        loaded_at 2020-10-22T08:04:37-0400  uid 0
        xlated 1128B  jited 690B  memlock 4096B  map_ids 6,5
        btf_id 45
# /root/iproute2/ip/ip link set veth0 xdp off
# rm -rf /sys/fs/bpf/xdp/globals


== Load new BTF-defined maps
# /root/iproute2/ip/ip link set veth0 xdp obj btf_graft.o sec aaa
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 40 tag 3056d2382e53f27c jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc
# bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
40: xdp  name cls_aaa  tag 3056d2382e53f27c  gpl
        loaded_at 2020-10-22T08:04:39-0400  uid 0
        xlated 80B  jited 71B  memlock 4096B
        btf_id 50
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj btf_map_in_map.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 44 tag 4420e72b2a601ed7 jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer
# bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
44: xdp  name imain  tag 4420e72b2a601ed7  gpl
        loaded_at 2020-10-22T08:04:41-0400  uid 0
        xlated 336B  jited 193B  memlock 4096B  map_ids 13
        btf_id 55
# /root/iproute2/ip/ip link set veth0 xdp off
# /root/iproute2/ip/ip link set veth0 xdp obj btf_shared.o sec ingress
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
    prog/xdp id 48 tag 9cbab549c3af3eab jited
# ls /sys/fs/bpf/xdp/globals
jmp_tc  map_outer  map_sh
# bpftool map show
10: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
11: array  name map_inner  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
13: array_of_maps  name map_outer  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
14: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
48: xdp  name imain  tag 9cbab549c3af3eab  gpl
        loaded_at 2020-10-22T08:04:43-0400  uid 0
        xlated 224B  jited 139B  memlock 4096B  map_ids 14
        btf_id 60
# /root/iproute2/ip/ip link set veth0 xdp off
# rm -rf /sys/fs/bpf/xdp/globals
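
Note: the BTF-defined objects above drop struct bpf_elf_map in favor of the
declarative map syntax that libbpf parses from the ".maps" ELF section. A
minimal sketch of such a definition (values are illustrative and assume the
headers of a libbpf-devel install; LIBBPF_PIN_BY_NAME makes libbpf pin the
map by name under the loader's pin root, which is how jmp_tc ends up in
/sys/fs/bpf/xdp/globals above):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
            __uint(max_entries, 1);
            __uint(key_size, sizeof(__u32));
            __uint(value_size, sizeof(__u32));
            __uint(pinning, LIBBPF_PIN_BY_NAME);
    } jmp_tc SEC(".maps");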


== Test loading objs with tc
# /root/iproute2/tc/tc qdisc add dev veth0 ingress
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_cyclic.o sec 0xabccba/0
# /root/iproute2/tc/tc filter add dev veth0 parent ffff: bpf obj bpf_graft.o
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/0
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 42/1
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec 43/0
# /root/iproute2/tc/tc filter add dev veth0 ingress bpf da obj bpf_tailcall.o sec classifier
# /root/iproute2/ip/ip link show veth0
5: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:e6:fa:2b:4e:1f brd ff:ff:ff:ff:ff:ff
# ls /sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d /sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f /sys/fs/bpf/xdp/globals
/sys/fs/bpf/xdp/37e88cb3b9646b2ea5f99ab31069ad88db06e73d:
jmp_tc

/sys/fs/bpf/xdp/fc68fe3e96378a0cba284ea6acbe17e898d8b11f:
jmp_ex  jmp_tc  map_sh

/sys/fs/bpf/xdp/globals:
jmp_tc
# bpftool map show
15: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
16: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
17: prog_array  name jmp_ex  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
        owner_prog_type sched_cls  owner jited
18: prog_array  name jmp_tc  flags 0x0
        key 4B  value 4B  max_entries 2  memlock 4096B
        owner_prog_type sched_cls  owner jited
19: array  name map_sh  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B
# bpftool prog show
52: sched_cls  name cls_loop  tag 3e98a40b04099d36  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 168B  jited 133B  memlock 4096B  map_ids 15
        btf_id 65
56: sched_cls  name cls_entry  tag 0fbb4d9310a6ee26  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 144B  jited 121B  memlock 4096B  map_ids 16
        btf_id 70
60: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 75
66: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 80
72: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 85
78: sched_cls  name cls_case1  tag e06a3bd62293d65d  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 328B  jited 216B  memlock 4096B  map_ids 19,17
        btf_id 90
79: sched_cls  name cls_case2  tag ee218ff893dca823  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 336B  jited 218B  memlock 4096B  map_ids 19,18
        btf_id 90
80: sched_cls  name cls_exit  tag e78a58140deed387  gpl
        loaded_at 2020-10-22T08:04:45-0400  uid 0
        xlated 288B  jited 177B  memlock 4096B  map_ids 19
        btf_id 90
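
Note: the "42/0"-style section names in the tc commands above are the legacy
iproute2 convention for populating tail-call maps at load time: a program in
ELF section "<id>/<slot>" is placed into slot <slot> of the prog_array whose
legacy map definition carries .id = <id> (bpf_cyclic.o's "0xabccba/0" is the
same convention with a hex id). A minimal, illustrative sketch using the
__section_tail() helper from iproute2's bpf_api.h:

    #include <linux/bpf.h>
    #include "bpf_api.h" /* iproute2 helpers: __section_tail() et al. */

    /* Wired into slot 0 of the prog_array whose legacy .id field is 42;
     * the loader resolves this from the "42/0" ELF section name. */
    __section_tail(42, 0) int cls_case1(struct __sk_buff *skb)
    {
            return 0; /* illustrative body */
    }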

I also ran the following upstream kselftests with the patched iproute2, and
all of them passed.

test_lwt_ip_encap.sh
test_xdp_redirect.sh
test_tc_redirect.sh
test_xdp_meta.sh
test_xdp_veth.sh
test_xdp_vlan.sh

Hangbin Liu (5):
  configure: add check_libbpf() for later libbpf support
  lib: rename bpf.c to bpf_legacy.c
  lib: add libbpf support
  examples/bpf: move struct bpf_elf_map defined maps to legacy folder
  examples/bpf: add bpf examples with BTF defined maps

 configure                                | 108 +++++++
 examples/bpf/README                      |  18 +-
 examples/bpf/bpf_graft.c                 |  14 +-
 examples/bpf/bpf_map_in_map.c            |  37 ++-
 examples/bpf/bpf_shared.c                |  14 +-
 examples/bpf/{ => legacy}/bpf_cyclic.c   |   2 +-
 examples/bpf/legacy/bpf_graft.c          |  66 +++++
 examples/bpf/legacy/bpf_map_in_map.c     |  56 ++++
 examples/bpf/legacy/bpf_shared.c         |  53 ++++
 examples/bpf/{ => legacy}/bpf_tailcall.c |   2 +-
 include/bpf_api.h                        |  13 +
 include/bpf_util.h                       |  21 +-
 ip/ipvrf.c                               |   6 +-
 lib/Makefile                             |   6 +-
 lib/bpf_glue.c                           |  35 +++
 lib/{bpf.c => bpf_legacy.c}              | 193 ++++++++++++-
 lib/bpf_libbpf.c                         | 353 +++++++++++++++++++++++
 17 files changed, 939 insertions(+), 58 deletions(-)
 rename examples/bpf/{ => legacy}/bpf_cyclic.c (95%)
 create mode 100644 examples/bpf/legacy/bpf_graft.c
 create mode 100644 examples/bpf/legacy/bpf_map_in_map.c
 create mode 100644 examples/bpf/legacy/bpf_shared.c
 rename examples/bpf/{ => legacy}/bpf_tailcall.c (98%)
 create mode 100644 lib/bpf_glue.c
 rename lib/{bpf.c => bpf_legacy.c} (94%)
 create mode 100644 lib/bpf_libbpf.c
David Ahern Nov. 10, 2020, 4:09 a.m. UTC | #63
On 11/8/20 6:45 PM, Alexei Starovoitov wrote:
> 
> I don't understand why on one side you're pointing out existing quirkiness with
> bpf usability while at the same time arguing to make it _less_ user friendly

I believe you have confused my comments with others. My comments have
focused on one aspect: The insistence by BPF maintainers that all code
bases and users constantly chase latest and greatest versions of
relevant S/W to use BPF - though I believe a lot of the tool chasing
stems from BTF. I am fairly certain I have been consistent in that theme
within this thread.

> when myself, Daniel, Andrii explained in detail what libbpf does and how it
> affects user experience?
> 
> The analogy of libbpf in iproute2 and libbfd in gdb is that both libraries

Your gdb / libbfd analogy misses the mark - by a lot. That analogy is
relevant for bpftool, not iproute2.

iproute2 can leverage libbpf for 3 or 4 tc modules and a few xdp hooks.
That is it, and it is a tiny percentage of the functionality in the package.
Edward Cree Nov. 10, 2020, 12:47 p.m. UTC | #64
On 05/11/2020 14:05, Jamal Hadi Salim wrote:
> On 2020-11-04 10:19 p.m., David Ahern wrote:
>
> [..]
>> Similarly, it is not realistic or user friendly to *require* general
>> Linux users to constantly chase latest versions of llvm, clang, dwarves,
>> bcc, bpftool, libbpf, (I am sure I am missing more)
>
> 2cents feedback from a dabbler in ebpf on user experience:
>
> What David described above *has held me back*.
If we're doing 2¢... I gave up on trying to keep ebpf_asm abreast
 of all the latest BPF and BTF features quite some time ago, since
 there was rarely any documentation and the specifications for BPF
 elves were basically "whatever latest clang does".
The bpf developers seem to have taken the position that since
 they're in control of clang, libbpf and the kernel, they can make
 their changes across all three and not bother with the specs that
 would allow other toolchains to interoperate.  As a result of
 which, that belief has now become true — while ebpf_asm will
 still work for what it always did (simple XDP programs), it is
 unlikely ever to gain CO-RE support so is no longer a live
 alternative to clang for BPF in general.
Of course the bpf developers are well within their rights to not
 care about that.  But I think it illustrates why having to
 interoperate with systems outside their control and mix-and-match
 versioning of various components provides external discipline that
 is sorely needed if the BPF ecosystem is to remain healthy.
That is why I am opposed to iproute2 'vendoring' libbpf.

-ed
Alexei Starovoitov Nov. 11, 2020, 12:47 a.m. UTC | #65
On Mon, Nov 09, 2020 at 09:09:44PM -0700, David Ahern wrote:
> On 11/8/20 6:45 PM, Alexei Starovoitov wrote:
> > 
> > I don't understand why on one side you're pointing out existing quirkiness with
> > bpf usability while at the same time arguing to make it _less_ user friendly
> 
> I believe you have confused my comments with others. My comments have
> focused on one aspect: The insistence by BPF maintainers that all code
> bases and users constantly chase latest and greatest versions of
> relevant S/W to use BPF

yes, because we care about user experience while you're still insisting
on making it horrible.
With a random pick of libbpf.so we would have no choice but to actively tell
users to avoid using tc, because sooner or later they will be pissed. I'd
rather warn them ahead of time.

> - though I believe a lot of the tool chasing
> stems from BTF. I am fairly certain I have been consistent in that theme
> within this thread.

Right. A lot of features added in the last couple years depend on BTF:
static vs global linking, bpf_spin_lock, function by function verification, etc

> > when myself, Daniel, Andrii explained in detail what libbpf does and how it
> > affects user experience?
> > 
> > The analogy of libbpf in iproute2 and libbfd in gdb is that both libraries
> 
> Your gdb / libbfd analogy misses the mark - by a lot. That analogy is
> relevant for bpftool, not iproute2.
> 
> iproute2 can leverage libbpf for 3 or 4 tc modules and a few xdp hooks.
> That is it, and it is a tiny percentage of the functionality in the package.

cat tools/lib/bpf/*.[hc]|wc -l
23950
cat iproute2/tc/*.[hc]|wc -l
29542

The point is that for these few tc commands the amount of logic in libbpf vs tc is 90/10.

Let's play out how libbpf+tc is going to get developed moving forward if
libbpf is a random version. Say there is a patch for libbpf that makes the
iproute2 experience better. bpf maintainers would have no choice but to reject
it, since we don't add features/apis to libbpf if there is no active user.
Adding a new libbpf api that iproute2 a few years from now may or may not
take advantage of makes little sense.
Alexei Starovoitov Nov. 11, 2020, 12:53 a.m. UTC | #66
On Tue, Nov 10, 2020 at 12:47:28PM +0000, Edward Cree wrote:
> On 05/11/2020 14:05, Jamal Hadi Salim wrote:
> > On 2020-11-04 10:19 p.m., David Ahern wrote:
> >
> > [..]
> >> Similarly, it is not realistic or user friendly to *require* general
> >> Linux users to constantly chase latest versions of llvm, clang, dwarves,
> >> bcc, bpftool, libbpf, (I am sure I am missing more)
> >
> > 2cents feedback from a dabbler in ebpf on user experience:
> >
> > What David described above *has held me back*.
> If we're doing 2¢... I gave up on trying to keep ebpf_asm abreast
>  of all the latest BPF and BTF features quite some time ago, since
>  there was rarely any documentation and the specifications for BPF
>  elves were basically "whatever latest clang does".
> The bpf developers seem to have taken the position that since
>  they're in control of clang, libbpf and the kernel, they can make
>  their changes across all three and not bother with the specs that
>  would allow other toolchains to interoperate.  As a result of
>  which, that belief has now become true — while ebpf_asm will
>  still work for what it always did (simple XDP programs), it is
>  unlikely ever to gain CO-RE support so is no longer a live
>  alternative to clang for BPF in general.
> Of course the bpf developers are well within their rights to not
>  care about that.  But I think it illustrates why having to
>  interoperate with systems outside their control and mix-and-match
>  versioning of various components provides external discipline that
>  is sorely needed if the BPF ecosystem is to remain healthy.

I think thriving public bpf projects, startups and established companies
that are obviously outside of the control of the few people that argue here
would disagree with your assessment.
Toke Høiland-Jørgensen Nov. 11, 2020, 11:02 a.m. UTC | #67
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Nov 09, 2020 at 09:09:44PM -0700, David Ahern wrote:
>> On 11/8/20 6:45 PM, Alexei Starovoitov wrote:
>> > 
>> > I don't understand why on one side you're pointing out existing quirkiness with
>> > bpf usability while at the same time arguing to make it _less_ user friendly
>> 
>> I believe you have confused my comments with others. My comments have
>> focused on one aspect: The insistence by BPF maintainers that all code
>> bases and users constantly chase latest and greatest versions of
>> relevant S/W to use BPF
>
> yes, because we care about user experience while you're still insisting
> on making it horrible.
> With a random pick of libbpf.so we would have no choice but to actively tell
> users to avoid using tc, because sooner or later they will be pissed. I'd
> rather warn them ahead of time.

Could we *please* stop with this "my way or the highway" extortion? It's
incredibly rude, and it's not helping the discussion.

>> - though I believe a lot of the tool chasing
>> stems from BTF. I am fairly certain I have been consistent in that theme
>> within this thread.
>
> Right. A lot of features added in the last couple years depend on BTF:
> static vs global linking, bpf_spin_lock, function by function verification, etc
>
>> > when myself, Daniel, Andrii explained in detail what libbpf does and how it
>> > affects user experience?
>> > 
>> > The analogy of libbpf in iproute2 and libbfd in gdb is that both libraries
>> 
>> Your gdb / libbfd analogy misses the mark - by a lot. That analogy is
>> relevant for bpftool, not iproute2.
>> 
>> iproute2 can leverage libbpf for 3 or 4 tc modules and a few xdp hooks.
>> That is it, and it is a tiny percentage of the functionality in the package.
>
> cat tools/lib/bpf/*.[hc]|wc -l
> 23950
> cat iproute2/tc/*.[hc]|wc -l
> 29542
>
> The point is that for these few tc commands the amount of logic in libbpf vs tc is 90/10.
>
> Let's play out how libbpf+tc is going to get developed moving forward if
> libbpf is a random version. Say there is a patch for libbpf that makes the
> iproute2 experience better. bpf maintainers would have no choice but to reject
> it, since we don't add features/apis to libbpf if there is no active user.
> Adding a new libbpf api that iproute2 a few years from now may or may not
> take advantage of makes little sense.

What? No one has said that iproute2 would never use any new features,
just that they would be added conditionally on a compatibility check
with libbpf (like the check for bpf_program__section_name() in the
current patch series).
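
As a concrete illustration, a minimal sketch of how such a conditional
wrapper around the renamed API could look; the wrapper function and the
HAVE_LIBBPF_SECTION_NAME macro are illustrative names, standing in for
whatever ./configure defines once it detects the newer API:

    #include <bpf/libbpf.h>

    const char *get_bpf_program__section_name(const struct bpf_program *prog)
    {
    #ifdef HAVE_LIBBPF_SECTION_NAME
            /* newer libbpf, where bpf_program__title() was renamed */
            return bpf_program__section_name(prog);
    #else
            return bpf_program__title(prog, false);
    #endif
    }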

Besides, for the entire history of BPF support in iproute2 so far, the
benefit has come from all the features that libbpf has just started
automatically supporting on load (BTF, etc), so users would have
benefited from automatic library updates had it *not* been vendored in.

-Toke
Edward Cree Nov. 11, 2020, 11:31 a.m. UTC | #68
On 11/11/2020 00:53, Alexei Starovoitov wrote:
> On Tue, Nov 10, 2020 at 12:47:28PM +0000, Edward Cree wrote:
>> But I think it illustrates why having to
>>  interoperate with systems outside their control and mix-and-match
>>  versioning of various components provides external discipline that
>>  is sorely needed if the BPF ecosystem is to remain healthy.
> 
> I think thriving public bpf projects, startups and established companies
> that are obviously outside of the control of the few people that argue here
> would disagree with your assessment.

Correct me if I'm wrong, but aren't those bpf projects and companies
 _things that are written in BPF_, rather than alternative toolchain
 components for compiling, loading and otherwise wrangling BPF once
 it's been written?
It is the latter that I am saying is needed in order to keep BPF
 infrastructure development "honest", rather than treating the clang
 frontend as The API and all layers below it as undocumented internal
 implementation details.
In a healthy ecosystem, it should be possible to use a compiler,
 assembler, linker and loader developed separately by four projects
 unrelated to each other and to the kernel and runtime.  Thanks to
 well-specified ABIs and file formats, in the C ecosystem this is
 actually possible, despite the existence of some projects that
 bundle together multiple components.
In the BPF ecosystem, instead, it seems like the only toolchain
 anyone cares to support is latest clang + latest libbpf, and if you
 try to replace any component of the toolchain with something else,
 the spec you have to program against is "Go and read the LLVM
 source code, figure out what it does, and copy that".
That is not sustainable in the long term.

-ed
Daniel Borkmann Nov. 11, 2020, 3:06 p.m. UTC | #69
On 11/11/20 12:02 PM, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> On Mon, Nov 09, 2020 at 09:09:44PM -0700, David Ahern wrote:
>>> On 11/8/20 6:45 PM, Alexei Starovoitov wrote:
>>>>
>>>> I don't understand why on one side you're pointing out existing quirkiness with
>>>> bpf usability while at the same time arguing to make it _less_ user friendly
>>>
>>> I believe you have confused my comments with others. My comments have
>>> focused on one aspect: The insistence by BPF maintainers that all code
>>> bases and users constantly chase latest and greatest versions of
>>> relevant S/W to use BPF
>>
>> yes, because we care about user experience while you're still insisting
>> on making it horrible.
>> With a random pick of libbpf.so we would have no choice but to actively tell
>> users to avoid using tc, because sooner or later they will be pissed. I'd
>> rather warn them ahead of time.
> 
> Could we *please* stop with this "my way or the highway" extortion? It's
> incredibly rude, and it's not helping the discussion.
> 
>>> - though I believe a lot of the tool chasing
>>> stems from BTF. I am fairly certain I have been consistent in that theme
>>> within this thread.
>>
>> Right. A lot of features added in the last couple years depend on BTF:
>> static vs global linking, bpf_spin_lock, function by function verification, etc
>>
>>>> when myself, Daniel, Andrii explained in detail what libbpf does and how it
>>>> affects user experience?
>>>>
>>>> The analogy of libbpf in iproute2 and libbfd in gdb is that both libraries
>>>
>>> Your gdb / libbfd analogy misses the mark - by a lot. That analogy is
>>> relevant for bpftool, not iproute2.
>>>
>>> iproute2 can leverage libbpf for 3 or 4 tc modules and a few xdp hooks.
>>> That is it, and it is a tiny percentage of the functionality in the package.
>>
>> cat tools/lib/bpf/*.[hc]|wc -l
>> 23950
>> cat iproute2/tc/*.[hc]|wc -l
>> 29542
>>
>> The point is that for these few tc commands the amount of logic in libbpf vs tc is 90/10.
>>
>> Let's play out how libbpf+tc is going to get developed moving forward if
>> libbpf is a random version. Say there is a patch for libbpf that makes the
>> iproute2 experience better. bpf maintainers would have no choice but to reject
>> it, since we don't add features/apis to libbpf if there is no active user.
>> Adding a new libbpf api that iproute2 a few years from now may or may not
>> take advantage of makes little sense.
> 
> What? No one has said that iproute2 would never use any new features,
> just that they would be added conditionally on a compatibility check
> with libbpf (like the check for bpf_program__section_name() in the
> current patch series).
> 
> Besides, for the entire history of BPF support in iproute2 so far, the
> benefit has come from all the features that libbpf has just started
> automatically supporting on load (BTF, etc), so users would have
> benefited from automatic library updates had it *not* been vendored in.

Not really. What you imply here is that we're living in a perfect world and that
all distros follow suit and i) add libbpf dependency to their official iproute2
package, ii) upgrade iproute2 package along with new kernel releases and iii)
upgrade libbpf along with it so that users are able to develop BPF programs against
the feature set that the kernel offers (as intended). These are a lot of moving parts
to get right, and as I pointed out earlier in the conversation, it took major distros
2 years to get their act together to officially include bpftool as a package -
I'm not making this up, and this sort of pace is simply not sustainable. It's also
not clear whether distros will get point iii) correct. It's not about compatibility,
but rather about __users__ of the loader being able to __benefit__ of the latest
features their distro kernel ships from BPF (& libbpf) side just as they do with
iproute2 extensions. For the integrated lib/bpf.c in iproute2 this was never an
issue and for multiple years in the earlier days it was much further ahead than
libbpf which was only tracing-focused before we decided to put focus on the latter
as a more general loader instead. But if you ever want to start a deprecation process
of the lib/bpf.c then users should not need to worry whether iproute2 was even linked
to libbpf in the first place, they should be able to have a guarantee that it's
__generally available__ as with lib/bpf.c, otherwise they'll always just assume
the latter as the minimal available base when writing code against iproute2 loader.
Hypothetically speaking, if Hangbin would have presented patches here to extend the
existing lib/bpf.c to the point that it's feature complete (compared to libbpf),
we wouldn't even have this whole discussion here.

Thanks,
Daniel
David Ahern Nov. 11, 2020, 4:33 p.m. UTC | #70
On 11/11/20 8:06 AM, Daniel Borkmann wrote:
> 
> Not really. What you imply here is that we're living in a perfect world
> and that all distros follow suit and i) add libbpf dependency to their
> official iproute2 package, ii) upgrade iproute2 package along with new
> kernel releases and iii) upgrade libbpf along with it so that users are
> able to develop BPF programs against the feature set that the kernel
> offers (as intended). These are a lot of moving parts to get right, and
> as I pointed out earlier in the conversation, it took major distros
> 2 years to get their act together to officially include bpftool as a
> package -

Yes, there are a lot of moving parts and that puts a huge burden on
distributions. The trend that related s/w is outdated 2-3 months after a
release can be taken as a sign that bpf is not stable and ready for
distributions to take on and support.

bpftool is only 3 years old (Oct 2017 is the first kernel commit). You cannot
expect distributions to chase every whim from kernel developers, so
bpftool needed to evolve and prove its usefulness. It has now, so really
the disappointment should be limited to distributions over the past 12
months, especially Ubuntu 20.04 (most recent LTS) not having a libbpf
and bpftool releases. But again, 20.04 was too old for BTF 3 months
after it was released and that comes back to the bigger question of
whether bpf is really ready for distributions to support. More below.

Focusing on the future: for Ubuntu (and Debian?) bpftool is in the
linux-tools-common package. perf has already trained distributions to
release a tools package with kernel releases. That means bpftool updates
follow the kernel cadence. bpftool requires libbpf, and I believe the
feature dependencies will force libbpf versions to follow kernel
releases, so I believe your goal is going to be achieved by those
dependencies.

But there is an on-going nagging problem which needs to be acknowledged
and solved. As an *example*, Ubuntu has kernel updates to get new
hardware support (HWE releases). Updating kernels on an LTS is
problematic when the kernel update requires toolchain updates to
maintain features (DEBUG_INFO_BTF) and library updates to get tools for
that kernel version working. That is a huge disruption to their
customers who want stability — the whole reason for LTS distributions.
Alexei Starovoitov Nov. 11, 2020, 6:08 p.m. UTC | #71
On Wed, Nov 11, 2020 at 11:31:47AM +0000, Edward Cree wrote:
> On 11/11/2020 00:53, Alexei Starovoitov wrote:
> > On Tue, Nov 10, 2020 at 12:47:28PM +0000, Edward Cree wrote:
> >> But I think it illustrates why having to
> >>  interoperate with systems outside their control and mix-and-match
> >>  versioning of various components provides external discipline that
> >>  is sorely needed if the BPF ecosystem is to remain healthy.
> > 
> > I think thriving public bpf projects, startups and established companies
> > that are obviously outside of the control of the few people that argue here
> > would disagree with your assessment.
> 
> Correct me if I'm wrong, but aren't those bpf projects and companies
>  _things that are written in BPF_, rather than alternative toolchain
>  components for compiling, loading and otherwise wrangling BPF once
>  it's been written?
> It is the latter that I am saying is needed in order to keep BPF
>  infrastructure development "honest", rather than treating the clang
>  frontend as The API and all layers below it as undocumented internal
>  implementation details.
> In a healthy ecosystem, it should be possible to use a compiler,
>  assembler, linker and loader developed separately by four projects
>  unrelated to each other and to the kernel and runtime.  Thanks to
>  well-specified ABIs and file formats, in the C ecosystem this is
>  actually possible, despite the existence of some projects that
>  bundle together multiple components.
> In the BPF ecosystem, instead, it seems like the only toolchain
>  anyone cares to support is latest clang + latest libbpf, and if you
>  try to replace any component of the toolchain with something else,
>  the spec you have to program against is "Go and read the LLVM
>  source code, figure out what it does, and copy that".
> That is not sustainable in the long term.

Absolutely. I agree 100% with the above.
The BPF ecosystem will eventually get to a point of a fixed file format, a
linker specification and a 1000-page psABI document.
The RISCV ISA was invented recently and came with a full ABI document, just
like x86 long ago. The BPF ISA is different. It grows "organically". We don't
add all possible instructions up front. We don't define all possible
relocation types for ELF. That fundamental difference vs all other ISAs helps
BPF follow its own path. Take BTF, for example. No other ISA has such a
concept. Yet due to BTF the BPF ecosystem can provide features no other ISA
can. A similar story happens with clang. BPF has extended the C language
_already_: BPF C programs have a way to compare types, which is a C language
extension. Did we go to the C standard committee and argue for years that
such an extension is necessary? Obviously not. Today BPF is, as you correctly
pointed out, layers of undocumented internal details. Obviously we're not
content with such a situation.
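
A minimal sketch of one such type-introspection extension,
bpf_core_type_exists(), assuming a recent clang and libbpf's bpf_core_read.h;
the struct name is hypothetical:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    /* Hypothetical type we want to probe for in the running kernel's BTF */
    struct renamed_kernel_struct { int some_field; };

    SEC("classifier")
    int probe(struct __sk_buff *skb)
    {
            /* Evaluates to 1 if the kernel's BTF has a matching type;
             * resolved at load time via a CO-RE relocation. */
            return bpf_core_type_exists(struct renamed_kernel_struct);
    }

    char _license[] SEC("license") = "GPL";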
Toke Høiland-Jørgensen Nov. 12, 2020, 10:36 p.m. UTC | #72
Daniel Borkmann <daniel@iogearbox.net> writes:

>> Besides, for the entire history of BPF support in iproute2 so far, the
>> benefit has come from all the features that libbpf has just started
>> automatically supporting on load (BTF, etc), so users would have
>> benefited from automatic library updates had it *not* been vendored in.
>
> Not really. What you imply here is that we're living in a perfect
> world and that all distros follow suit and i) add libbpf dependency
> to their official iproute2 package, ii) upgrade iproute2 package along
> with new kernel releases and iii) upgrade libbpf along with it so that
> users are able to develop BPF programs against the feature set that
> the kernel offers (as intended). These are a lot of moving parts to
> get right, and as I pointed out earlier in the conversation, it took
> major distros 2 years to get their act together to officially include
> bpftool as a package - I'm not making this up, and this sort of pace
> is simply not sustainable. It's also not clear whether distros will
> get point iii) correct.

I totally get that you've been frustrated with the distro adoption and
packaging of BPF-related tools. And rightfully so. I just don't think
that the answer to this is to try to work around distros, but rather to
work with them to get things right.

I'm quite happy to take a shot at getting a cross-distro effort going in
this space; really, having well-supported BPF tooling ought to be in
everyone's interest!

-Toke
Daniel Borkmann Nov. 12, 2020, 11:20 p.m. UTC | #73
On 11/12/20 11:36 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
> 
>>> Besides, for the entire history of BPF support in iproute2 so far, the
>>> benefit has come from all the features that libbpf has just started
>>> automatically supporting on load (BTF, etc), so users would have
>>> benefited from automatic library updates had it *not* been vendored in.
>>
>> Not really. What you imply here is that we're living in a perfect
>> world and that all distros follow suit and i) add libbpf dependency
>> to their official iproute2 package, ii) upgrade iproute2 package along
>> with new kernel releases and iii) upgrade libbpf along with it so that
>> users are able to develop BPF programs against the feature set that
>> the kernel offers (as intended). These are a lot of moving parts to
>> get right, and as I pointed out earlier in the conversation, it took
>> major distros 2 years to get their act together to officially include
>> bpftool as a package - I'm not making this up, and this sort of pace
>> is simply not sustainable. It's also not clear whether distros will
>> get point iii) correct.
> 
> I totally get that you've been frustrated with the distro adoption and
> packaging of BPF-related tools. And rightfully so. I just don't think
> that the answer to this is to try to work around distros, but rather to
> work with them to get things right.
> 
> I'm quite happy to take a shot at getting a cross-distro effort going in
> this space; really, having well-supported BPF tooling ought to be in
> everyone's interest!

Thanks, yes, that is worth a push either way! There is still a long tail
of distros that are not considered major and until they all catch up with
points i)-iii) it might take a much longer time until this becomes really
ubiquitous with iproute2 for users of the libbpf loader. It's just that this
frustrating user experience could be avoided altogether. iproute2 is
shipped and run also on small / embedded devices, hence it tries to keep
external dependencies reduced to a bare minimum (well, except that libmnl
detour, but it's not a mandatory dependency). If I were a user who had to
rely on the loader for my progs to be installed, I'd probably end up
compiling my own version of iproute2 linked with libbpf to move forward
instead of being blocked on the distro to catch up, but it's an additional
hassle for shipping SW instead of just having it all pre-installed when
built-in, given it otherwise comes with the base distro already. But then
my question is what is planned here as the deprecation process for the
built-in lib/bpf.c code? I presume we'll remove it eventually to move on?
Stephen Hemminger Nov. 13, 2020, 12:04 a.m. UTC | #74
On Fri, 13 Nov 2020 00:20:52 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 11/12/20 11:36 PM, Toke Høiland-Jørgensen wrote:
> > Daniel Borkmann <daniel@iogearbox.net> writes:
> >
> >>> Besides, for the entire history of BPF support in iproute2 so far, the
> >>> benefit has come from all the features that libbpf has just started
> >>> automatically supporting on load (BTF, etc), so users would have
> >>> benefited from automatic library updates had it *not* been vendored in.
> >>
> >> Not really. What you imply here is that we're living in a perfect
> >> world and that all distros follow suit and i) add libbpf dependency
> >> to their official iproute2 package, ii) upgrade iproute2 package along
> >> with new kernel releases and iii) upgrade libbpf along with it so that
> >> users are able to develop BPF programs against the feature set that
> >> the kernel offers (as intended). These are a lot of moving parts to
> >> get right, and as I pointed out earlier in the conversation, it took
> >> major distros 2 years to get their act together to officially include
> >> bpftool as a package - I'm not making this up, and this sort of pace
> >> is simply not sustainable. It's also not clear whether distros will
> >> get point iii) correct.
> >
> > I totally get that you've been frustrated with the distro adoption and
> > packaging of BPF-related tools. And rightfully so. I just don't think
> > that the answer to this is to try to work around distros, but rather to
> > work with them to get things right.
> >
> > I'm quite happy to take a shot at getting a cross-distro effort going in
> > this space; really, having well-supported BPF tooling ought to be in
> > everyone's interest!
> 
> Thanks, yes, that is worth a push either way! There is still a long tail
> of distros that are not considered major and until they all catch up with
> points i)-iii) it might take a much longer time until this becomes really
> ubiquitous with iproute2 for users of the libbpf loader. It's just that this
> frustrating user experience could be avoided altogether. iproute2 is
> shipped and run also on small / embedded devices, hence it tries to keep
> external dependencies reduced to a bare minimum (well, except that libmnl
> detour, but it's not a mandatory dependency). If I were a user who had to
> rely on the loader for my progs to be installed, I'd probably end up
> compiling my own version of iproute2 linked with libbpf to move forward
> instead of being blocked on the distro to catch up, but it's an additional
> hassle for shipping SW instead of just having it all pre-installed when
> built-in, given it otherwise comes with the base distro already. But then
> my question is what is planned here as the deprecation process for the
> built-in lib/bpf.c code? I presume we'll remove it eventually to move on?

Perf has a similar problem and it made it into most distributions because it is
a valuable tool. Maybe there are some lessons learned that could apply here.
Alexei Starovoitov Nov. 13, 2020, 12:40 a.m. UTC | #75
On Thu, Nov 12, 2020 at 4:35 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri, 13 Nov 2020 00:20:52 +0100
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> > On 11/12/20 11:36 PM, Toke Høiland-Jørgensen wrote:
> > > Daniel Borkmann <daniel@iogearbox.net> writes:
> > >
> > >>> Besides, for the entire history of BPF support in iproute2 so far, the
> > >>> benefit has come from all the features that libbpf has just started
> > >>> automatically supporting on load (BTF, etc), so users would have
> > >>> benefited from automatic library updates had it *not* been vendored in.
> > >>
> > >> Not really. What you imply here is that we're living in a perfect
> > >> world and that all distros follow suit and i) add libbpf dependency
> > >> to their official iproute2 package, ii) upgrade iproute2 package along
> > >> with new kernel releases and iii) upgrade libbpf along with it so that
> > >> users are able to develop BPF programs against the feature set that
> > >> the kernel offers (as intended). These are a lot of moving parts to
> > >> get right, and as I pointed out earlier in the conversation, it took
> > >> major distros 2 years to get their act together to officially include
> > >> bpftool as a package - I'm not making this up, and this sort of pace
> > >> is simply not sustainable. It's also not clear whether distros will
> > >> get point iii) correct.
> > >
> > > I totally get that you've been frustrated with the distro adoption and
> > > packaging of BPF-related tools. And rightfully so. I just don't think
> > > that the answer to this is to try to work around distros, but rather to
> > > work with them to get things right.
> > >
> > > I'm quite happy to take a shot at getting a cross-distro effort going in
> > > this space; really, having well-supported BPF tooling ought to be in
> > > everyone's interest!
> >
> > Thanks, yes, that is worth a push either way! There is still a long tail
> > of distros that are not considered major and until they all catch up with
> > points i)-iii) it might take a much longer time until this becomes really
> > ubiquitous with iproute2 for users of the libbpf loader. It's just that this
> > frustrating user experience could be avoided altogether. iproute2 is
> > shipped and run also on small / embedded devices, hence it tries to keep
> > external dependencies reduced to a bare minimum (well, except that libmnl
> > detour, but it's not a mandatory dependency). If I were a user who had to
> > rely on the loader for my progs to be installed, I'd probably end up
> > compiling my own version of iproute2 linked with libbpf to move forward
> > instead of being blocked on the distro to catch up, but it's an additional
> > hassle for shipping SW instead of just having it all pre-installed when
> > built-in, given it otherwise comes with the base distro already. But then
> > my question is what is planned here as the deprecation process for the
> > built-in lib/bpf.c code? I presume we'll remove it eventually to move on?
>
> Perf has a similar problem and it made it into most distributions because it is
> a valuable tool. Maybe there are some lessons learned that could apply here.

Indeed.
Please read tools/perf/Documentation/Build.txt
and realize that the perf binary _statically_ links the libperf library.
David Ahern Nov. 13, 2020, 3:55 a.m. UTC | #76
On 11/12/20 4:20 PM, Daniel Borkmann wrote:
> built-in, given it otherwise comes with the base distro already. But then
> my question is what is planned here as the deprecation process for the
> built-in lib/bpf.c code? I presume we'll remove it eventually to move on?

It will need to follow the established deprecation pattern for N, N-1
releases (N here refers to distro LTS releases, not kernel or iproute2
releases). Meaning, for the next few years it needs to exist as an
option when libbpf is not installed. After that we can add a deprecation
warning that libbpf is preferred, and then at some point in the distant
future it can be removed.