[v7,bpf-next,0/6] xsk: build skb by page (aka generic zerocopy xmit)

Message ID 20210217120003.7938-1-alobakin@pm.me

Message

Alexander Lobakin Feb. 17, 2021, noon UTC
This series introduces XSK generic zerocopy xmit by adding XSK umem
pages as skb frags instead of copying data to linear space.
The only requirement for this for drivers is to be able to xmit skbs
with skb_headlen(skb) == 0, i.e. all data including hard headers
starts from frag 0.
To indicate whether a particular driver supports this, a new netdev
priv flag, IFF_TX_SKB_NO_LINEAR, is added (and declared in virtio_net
as it's already capable of doing it). So consider implementing this
in your drivers to greatly speed-up generic XSK xmit.
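
As a rough user-space model of the requirement above (not the kernel
code), the sketch below mimics how skb_headlen() is derived from len and
data_len, and shows that a frame attached purely as frags has a head
length of 0. The struct, the function names and the MODEL_ flag constant
are hypothetical stand-ins; only the len/data_len arithmetic mirrors the
real skb_headlen().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the new priv flag; the bit value is an assumption. */
#define MODEL_IFF_TX_SKB_NO_LINEAR (UINT64_C(1) << 31)

/* Minimal model of the two length fields that define the linear part. */
struct model_skb {
	unsigned int len;      /* total bytes: linear + frags */
	unsigned int data_len; /* bytes held in frags only */
};

/* Same arithmetic as the kernel's skb_headlen(): bytes in linear space. */
static unsigned int model_headlen(const struct model_skb *skb)
{
	return skb->len - skb->data_len;
}

/*
 * Pick the xmit strategy from the device flags: attach the frame as
 * frags when the driver declared support, otherwise copy it into the
 * linear space.
 */
static struct model_skb model_build_skb(uint64_t priv_flags,
					unsigned int frame_len)
{
	struct model_skb skb = { .len = frame_len, .data_len = 0 };

	if (priv_flags & MODEL_IFF_TX_SKB_NO_LINEAR)
		skb.data_len = frame_len; /* all data, headers included, in frags */

	assert(skb.data_len <= skb.len);
	return skb;
}
```

With the flag set, model_headlen() returns 0 for any frame length; without
it, the whole frame counts as linear data, which is the copy path.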

The first two bits refactor netdev_priv_flags a bit to harden them
in terms of bitfield overflow, as IFF_TX_SKB_NO_LINEAR is the last
one that fits into unsigned int.
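
To illustrate the overflow concern, here is a small sketch assuming a
32-bit unsigned int, so that bit 31 is the last usable flag position.
The helper names are hypothetical and this is not the hardening used in
the patches; it only shows what happens to a flag that lands one bit
further.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Width of unsigned int on this platform (assumed to be 32 here). */
#define MODEL_UINT_BITS ((unsigned int)(sizeof(unsigned int) * CHAR_BIT))

/* Hypothetical helper: can flag number `bit` live in an unsigned int field? */
static int flag_fits_in_uint(unsigned int bit)
{
	return bit < MODEL_UINT_BITS;
}

/* Hypothetical helper: value left after storing flag `bit` in an unsigned int. */
static unsigned int store_flag_in_uint(unsigned int bit)
{
	assert(bit < 64);
	/* Converting to unsigned int silently drops any bits above its width. */
	return (unsigned int)(UINT64_C(1) << bit);
}
```

Under that assumption, flag 31 survives the store, while flag 32 is
silently truncated to 0, i.e. the flag would never read back as set.
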
The fifth patch adds headroom and tailroom reservations for the
allocated skbs on XSK generic xmit path. This ensures there won't
be any unwanted skb reallocations on fast-path due to headroom and/or
tailroom driver/device requirements (own headers/descriptors etc.).
The other three add a new private flag, declare it in virtio_net
driver and introduce generic XSK zerocopy xmit itself.
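
The headroom and tailroom reservation from the fifth patch can be
pictured with the following user-space sketch. It assumes the copy path,
where the frame itself also lives in the allocation, and it assumes a
minimum headroom of 32 bytes purely for illustration. The struct and
function names are hypothetical; needed_headroom and needed_tailroom are
modelled on the net_device fields of the same name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed minimum headroom; a placeholder, not a kernel constant. */
#define MODEL_MIN_HEADROOM 32u

/* Per-device space requirements (own headers, descriptors etc.). */
struct model_dev {
	size_t needed_headroom;
	size_t needed_tailroom;
};

/* Resulting buffer layout: [headroom][frame][tailroom]. */
struct model_layout {
	size_t headroom;
	size_t tailroom;
	size_t alloc_len;
};

/* Reserve enough room up front so the driver never has to reallocate. */
static struct model_layout model_plan_skb(const struct model_dev *dev,
					  size_t frame_len)
{
	struct model_layout l;

	l.headroom = dev->needed_headroom > MODEL_MIN_HEADROOM ?
		     dev->needed_headroom : MODEL_MIN_HEADROOM;
	l.tailroom = dev->needed_tailroom;
	l.alloc_len = l.headroom + frame_len + l.tailroom;

	assert(l.alloc_len >= frame_len);
	return l;
}
```

For example, a device asking for 128 bytes of headroom and 4 bytes of
tailroom gets a 1632-byte allocation for a 1500-byte frame in this model.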

The main body of work is created and done by Xuan Zhuo. His original
cover letter:

v3:
    Optimized code

v2:
    1. add priv_flags IFF_TX_SKB_NO_LINEAR instead of netdev_feature
    2. split the patch to three:
        a. add priv_flags IFF_TX_SKB_NO_LINEAR
        b. virtio net add priv_flags IFF_TX_SKB_NO_LINEAR
        c. When this flag is supported, construct the skb without linear
           space
    3. use ERR_PTR() and PTR_ERR() to handle the err

v1 message log:
---------------

This patch constructs the skb from pages to avoid memory copy
overhead.

This has one problem:

We construct the skb by filling the data page into the skb as a frag. In
this way, the linear space is empty, and the header information is also
in the frag, not in the linear space, which some network cards do not
allow. For example, Mellanox Technologies MT27710 Family
[ConnectX-4 Lx] reports the following error:

    mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8,
    qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
    00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
    WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
    00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
    00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb

I also tried to use build_skb to construct the skb, but skb_shinfo must
sit right behind the linear space, so this method does not work: we
can't put skb_shinfo at desc->addr, because it would be exposed to
users, which is not safe.

Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
network card supports the header information of the packet in the frag
and not in the linear space.

---------------- Performance Testing ------------

The test environment is Aliyun ECS server.
Test cmd:
```
xdpsock -i eth0 -t  -S -s <msg size>
```

Test result data:

size    64      512     1024    1500
copy    1916747 1775988 1600203 1440054
page    1974058 1953655 1945463 1904478
percent 3.0%    10.0%   21.58%  32.3%
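
The percent row can be reproduced from the copy and page rows as a
relative gain, (page - copy) / copy. A minimal sketch follows; the
function name is hypothetical, and the unit of the measured values is
not stated above.

```c
#include <assert.h>

/* Relative improvement of the page-based path over the copy path, in %. */
static double gain_percent(double copy_rate, double page_rate)
{
	assert(copy_rate > 0.0);
	return (page_rate - copy_rate) / copy_rate * 100.0;
}
```

For the 1024-byte column, gain_percent(1600203, 1945463) comes out at
roughly 21.58, in line with the table.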

Comments

John Fastabend Feb. 18, 2021, 6:08 a.m. UTC | #1
Alexander Lobakin wrote:
> This series introduces XSK generic zerocopy xmit by adding XSK umem
> pages as skb frags instead of copying data to linear space.
> The only requirement for this for drivers is to be able to xmit skbs
> with skb_headlen(skb) == 0, i.e. all data including hard headers
> starts from frag 0.
> To indicate whether a particular driver supports this, a new netdev
> priv flag, IFF_TX_SKB_NO_LINEAR, is added (and declared in virtio_net
> as it's already capable of doing it). So consider implementing this
> in your drivers to greatly speed-up generic XSK xmit.

[...]

> ---------------- Performance Testing ------------
>
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
>
> Test result data:
>
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%
>

For the series, but might be good to get Dave or Jakub to check
2/6 to be sure they agree.

Acked-by: John Fastabend <john.fastabend@gmail.com>

Jakub Kicinski Feb. 18, 2021, 7:49 p.m. UTC | #2
On Wed, 17 Feb 2021 22:08:55 -0800 John Fastabend wrote:
> > ---------------- Performance Testing ------------
> >
> > The test environment is Aliyun ECS server.
> > Test cmd:
> > ```
> > xdpsock -i eth0 -t  -S -s <msg size>
> > ```
> >
> > Test result data:
> >
> > size    64      512     1024    1500
> > copy    1916747 1775988 1600203 1440054
> > page    1974058 1953655 1945463 1904478
> > percent 3.0%    10.0%   21.58%  32.3%
> >
>
> For the series, but might be good to get Dave or Jakub to check
> 2/6 to be sure they agree.

Not sure if Dave would consider holding this series just because of
this, but I'm not a huge fan. I think moving towards a bitfield would
be a better direction on all these flags and defines.

This series is not the place for such effort, so perhaps drop patch 2,
leave it be and follow up with a conversion to a bitfield?