[v11,bpf-next,00/18] mvneta: introduce XDP multi-buffer support

Message ID cover.1628854454.git.lorenzo@kernel.org

Message

Lorenzo Bianconi Aug. 13, 2021, 11:47 a.m. UTC
This series introduces XDP multi-buffer support. The mvneta driver is
the first to support these new "non-linear" xdp_{buff,frame}. Reviewers,
please focus on how these new types of xdp_{buff,frame} packets
traverse the different layers and on the layout design. The BPF helpers
are deliberately kept simple, as we don't want to expose the internal
layout, so that it can still be changed later.

The main idea behind the new multi-buffer layout is to reuse the same
structure used for non-linear SKBs. This relies on the skb_shared_info
struct at the end of the first buffer to link together subsequent
buffers. Keeping the layout compatible with SKBs also eases and speeds
up creating an SKB from an xdp_{buff,frame}.
Converting an xdp_frame to an SKB and delivering it to the network stack
is shown in patch 05/18 (e.g. for cpumaps).

A multi-buffer bit (mb) has been introduced in the flags field of the
xdp_{buff,frame} structure to notify the bpf/network layer whether this is
an xdp multi-buffer frame (mb = 1) or not (mb = 0).
The mb bit will be set by an xdp multi-buffer capable driver only for
non-linear frames, maintaining the capability to receive linear frames
without any extra cost, since the skb_shared_info structure at the end
of the first buffer is initialized only if mb is set.
Moreover, the flags field in xdp_{buff,frame} will be reused for
xdp rx csum offloading in a future series.

Typical use cases for this series are:
- Jumbo-frames
- Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
- TSO/GRO

The following two eBPF helpers (and related selftests) have been introduced:
- bpf_xdp_adjust_data:
  Move the xdp_md->data and xdp_md->data_end pointers into subsequent
  fragments according to the offset provided by the eBPF program. This
  helper can be used to read/write values in the frame payload.
- bpf_xdp_get_buff_len:
  Return the total frame size (linear + paged parts)

The bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to
account for xdp multi-buffer frames.

More info about the main idea behind this approach can be found here [1][2].

Changes since v10:
- move xdp->data to the requested payload offset instead of to the beginning of
  the fragment in bpf_xdp_adjust_data()

Changes since v9:
- introduce bpf_xdp_adjust_data helper and related selftest
- add xdp_frags_size and xdp_frags_tsize fields in skb_shared_info
- introduce xdp_update_skb_shared_info utility routine in order to not reset
  the frags array in skb_shared_info when converting from an xdp_buff/xdp_frame
  to a skb
- simplify bpf_xdp_copy routine

Changes since v8:
- add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
- switch back to skb_shared_info implementation from previous xdp_shared_info
  one
- avoid using a bitfield in xdp_buff/xdp_frame since it introduces performance
  regressions. Tested now on a 10G NIC (ixgbe) to verify there are no
  performance penalties for the regular codebase
- add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
- add data_len field in skb_shared_info struct
- introduce XDP_FLAGS_FRAGS_PF_MEMALLOC flag

Changes since v7:
- rebase on top of bpf-next
- fix sparse warnings
- improve comments for frame_length in include/net/xdp.h

Changes since v6:
- the main difference with respect to previous versions is the new approach
  proposed by Eelco to pass the full length of the packet to the eBPF layer in
  the XDP context
- reintroduce multi-buff support to eBPF self-tests
- reintroduce multi-buff support to bpf_xdp_adjust_tail helper
- introduce multi-buffer support to bpf_xdp_copy helper
- rebase on top of bpf-next

Changes since v5:
- rebase on top of bpf-next
- initialize mb bit in xdp_init_buff() and drop per-driver initialization
- drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
- postpone introduction of frame_length field in XDP ctx to another series
- minor changes

Changes since v4:
- rebase on top of bpf-next
- introduce xdp_shared_info to build xdp multi-buff instead of using the
  skb_shared_info struct
- introduce frame_length in xdp ctx
- drop previous bpf helpers
- fix bpf_xdp_adjust_tail for xdp multi-buff
- introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
- fix xdp_return_frame_bulk for xdp multi-buff

Changes since v3:
- rebase on top of bpf-next
- add patch 10/13 to copy back paged data from a xdp multi-buff frame to
  userspace buffer for xdp multi-buff selftests

Changes since v2:
- add throughput measurements
- drop bpf_xdp_adjust_mb_header bpf helper
- introduce selftest for xdp multibuffer
- addressed comments on bpf_xdp_get_frags_count
- introduce xdp multi-buff support to cpumaps

Changes since v1:
- Fix use-after-free in xdp_return_{buff/frame}
- Introduce bpf helpers
- Introduce xdp_mb sample program
- access skb_shared_info->nr_frags only on the last fragment

Changes since RFC:
- squash multi-buffer bit initialization in a single patch
- add mvneta non-linear XDP buff support for tx side

[0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDP multi-buffers section)

Eelco Chaudron (3):
  bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  bpf: add multi-buffer support to xdp copy helpers
  bpf: update xdp_adjust_tail selftest to include multi-buffer

Lorenzo Bianconi (15):
  net: skbuff: add size metadata to skb_shared_info for xdp
  xdp: introduce flags field in xdp_buff/xdp_frame
  net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  net: mvneta: simplify mvneta_swbm_add_rx_fragment management
  net: xdp: add xdp_update_skb_shared_info utility routine
  net: marvell: rely on xdp_update_skb_shared_info utility routine
  xdp: add multi-buff support to xdp_return_{buff/frame}
  net: mvneta: add multi buffer support to XDP_TX
  net: mvneta: enable jumbo frames for XDP
  bpf: introduce bpf_xdp_get_buff_len helper
  bpf: move user_size out of bpf_test_init
  bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
    signature
  net: xdp: introduce bpf_xdp_adjust_data helper
  bpf: add bpf_xdp_adjust_data selftest

 drivers/net/ethernet/marvell/mvneta.c         | 204 ++++++++++-------
 include/linux/skbuff.h                        |   6 +-
 include/net/xdp.h                             |  95 +++++++-
 include/uapi/linux/bpf.h                      |  39 ++++
 kernel/trace/bpf_trace.c                      |   3 +
 net/bpf/test_run.c                            | 117 ++++++++--
 net/core/filter.c                             | 212 +++++++++++++++++-
 net/core/xdp.c                                |  76 ++++++-
 tools/include/uapi/linux/bpf.h                |  39 ++++
 .../bpf/prog_tests/xdp_adjust_data.c          |  55 +++++
 .../bpf/prog_tests/xdp_adjust_tail.c          | 118 ++++++++++
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 151 +++++++++----
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 .../bpf/progs/test_xdp_update_frags.c         |  41 ++++
 16 files changed, 1035 insertions(+), 165 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_adjust_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_update_frags.c

Comments

Toke Høiland-Jørgensen Aug. 18, 2021, 12:31 p.m. UTC | #1
Lorenzo Bianconi <lorenzo@kernel.org> writes:

> For XDP frames split over multiple buffers, the xdp_md->data and
> xdp_md->data_end pointers will point to the start and end of the first
> fragment only. bpf_xdp_adjust_data can be used to access subsequent
> fragments by moving the data pointers. To use, an XDP program can call
> this helper with the byte offset of the packet payload that
> it wants to access; the helper will move xdp_md->data and xdp_md->data_end
> so they point to the requested payload offset and to the end of the
> fragment containing this byte offset, and return the byte offset of the
> start of the fragment.
>
> To move back to the beginning of the packet, simply call the
> helper with an offset of '0'.
>
> Note also that the helpers that modify the packet boundaries
> (bpf_xdp_adjust_head(), bpf_xdp_adjust_tail() and
> bpf_xdp_adjust_meta()) will fail if the pointers have been
> moved; it is the responsibility of the BPF program to move them
> back before using these helpers.
>
> Suggested-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h              |  8 +++++
>  include/uapi/linux/bpf.h       | 32 ++++++++++++++++++
>  net/bpf/test_run.c             |  8 +++++
>  net/core/filter.c              | 61 +++++++++++++++++++++++++++++++++-
>  tools/include/uapi/linux/bpf.h | 32 ++++++++++++++++++
>  5 files changed, 140 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index cdaecf8d4d61..ce4764c7cd40 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -82,6 +82,11 @@ struct xdp_buff {
>  	struct xdp_txq_info *txq;
>  	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
>  	u16 flags; /* supported values defined in xdp_flags */
> +	/* xdp multi-buff metadata used for frags iteration */
> +	struct {
> +		u16 headroom;	/* frame headroom: data - data_hard_start */
> +		u16 headlen;	/* first buffer length: data_end - data */
> +	} mb;
>  };
>  
>  static __always_inline bool xdp_buff_is_mb(struct xdp_buff *xdp)
> @@ -127,6 +132,9 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
>  	xdp->data = data;
>  	xdp->data_end = data + data_len;
>  	xdp->data_meta = meta_valid ? data : data + 1;
> +	/* mb metadata for frags iteration */
> +	xdp->mb.headroom = headroom;
> +	xdp->mb.headlen = data_len;
>  }
>  
>  /* Reserve memory area at end-of data area.
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index ddbf9ccc2f74..c20a8b7c5c7c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -4853,6 +4853,37 @@ union bpf_attr {
>   *		Get the total size of a given xdp buff (linear and paged area)
>   *	Return
>   *		The total size of a given xdp buffer.
> + *
> + * long bpf_xdp_adjust_data(struct xdp_buff *xdp_md, u32 offset)
> + *	Description
> + *		For XDP frames split over multiple buffers, the
> + *		*xdp_md*\ **->data** and *xdp_md*\ **->data_end** pointers
> + *		will point to the start and end of the first fragment only.
> + *		This helper can be used to access subsequent fragments by
> + *		moving the data pointers. To use, an XDP program can call
> + *		this helper with the byte offset of the packet payload that
> + *		it wants to access; the helper will move *xdp_md*\ **->data**
> + *		and *xdp_md*\ **->data_end** so they point to the requested
> + *		payload offset and to the end of the fragment containing this
> + *		byte offset, and return the byte offset of the start of the
> + *		fragment.
This comment is wrong now :)

-Toke
Lorenzo Bianconi Aug. 18, 2021, 12:47 p.m. UTC | #2
> Lorenzo Bianconi <lorenzo@kernel.org> writes:
>
[...]
> > + *	Description
> > + *		For XDP frames split over multiple buffers, the
> > + *		*xdp_md*\ **->data** and *xdp_md*\ **->data_end** pointers
> > + *		will point to the start and end of the first fragment only.
> > + *		This helper can be used to access subsequent fragments by
> > + *		moving the data pointers. To use, an XDP program can call
> > + *		this helper with the byte offset of the packet payload that
> > + *		it wants to access; the helper will move *xdp_md*\ **->data**
> > + *		and *xdp_md*\ **->data_end** so they point to the requested
> > + *		payload offset and to the end of the fragment containing this
> > + *		byte offset, and return the byte offset of the start of the
> > + *		fragment.
>
> This comment is wrong now :)

actually we are still returning the byte offset of the start of the fragment
(base_offset).

Lorenzo

> -Toke
Toke Høiland-Jørgensen Aug. 18, 2021, 12:58 p.m. UTC | #3
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:

>> Lorenzo Bianconi <lorenzo@kernel.org> writes:
>>
> [...]
>> > + *	Description
>> > + *		For XDP frames split over multiple buffers, the
>> > + *		*xdp_md*\ **->data** and *xdp_md*\ **->data_end** pointers
>> > + *		will point to the start and end of the first fragment only.
>> > + *		This helper can be used to access subsequent fragments by
>> > + *		moving the data pointers. To use, an XDP program can call
>> > + *		this helper with the byte offset of the packet payload that
>> > + *		it wants to access; the helper will move *xdp_md*\ **->data**
>> > + *		and *xdp_md*\ **->data_end** so they point to the requested
>> > + *		payload offset and to the end of the fragment containing this
>> > + *		byte offset, and return the byte offset of the start of the
>> > + *		fragment.
>>
>> This comment is wrong now :)
>
> actually we are still returning the byte offset of the start of the fragment
> (base_offset).

Hmm, right, I was looking at the 'return 0':

> +BPF_CALL_2(bpf_xdp_adjust_data, struct xdp_buff *, xdp, u32, offset)
> +{
> +	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> +	u32 base_offset = xdp->mb.headlen;
> +	int i;
> +
> +	if (!xdp_buff_is_mb(xdp) || offset > sinfo->xdp_frags_size)
> +		return -EINVAL;
> +
> +	if (offset < xdp->mb.headlen) {
> +		/* linear area */
> +		xdp->data = xdp->data_hard_start + xdp->mb.headroom;
> +		xdp->data_end = xdp->data + xdp->mb.headlen;
> +		return 0;
> +	}

But I guess that's an offset; but that means the helper is not doing
what it says it's doing if it's within the first fragment. That should
probably be made consistent... :)

-Toke
Lorenzo Bianconi Aug. 18, 2021, 1:37 p.m. UTC | #4
> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
>
> >> Lorenzo Bianconi <lorenzo@kernel.org> writes:
> >>
> > [...]
> >> > + *	Description
> >> > + *		For XDP frames split over multiple buffers, the
> >> > + *		*xdp_md*\ **->data** and *xdp_md*\ **->data_end** pointers
> >> > + *		will point to the start and end of the first fragment only.
> >> > + *		This helper can be used to access subsequent fragments by
> >> > + *		moving the data pointers. To use, an XDP program can call
> >> > + *		this helper with the byte offset of the packet payload that
> >> > + *		it wants to access; the helper will move *xdp_md*\ **->data**
> >> > + *		and *xdp_md*\ **->data_end** so they point to the requested
> >> > + *		payload offset and to the end of the fragment containing this
> >> > + *		byte offset, and return the byte offset of the start of the
> >> > + *		fragment.
> >>
> >> This comment is wrong now :)
> >
> > actually we are still returning the byte offset of the start of the fragment
> > (base_offset).
>
> Hmm, right, I was looking at the 'return 0':
>
> > +BPF_CALL_2(bpf_xdp_adjust_data, struct xdp_buff *, xdp, u32, offset)
> > +{
> > +	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> > +	u32 base_offset = xdp->mb.headlen;
> > +	int i;
> > +
> > +	if (!xdp_buff_is_mb(xdp) || offset > sinfo->xdp_frags_size)
> > +		return -EINVAL;
> > +
> > +	if (offset < xdp->mb.headlen) {
> > +		/* linear area */
> > +		xdp->data = xdp->data_hard_start + xdp->mb.headroom;
> > +		xdp->data_end = xdp->data + xdp->mb.headlen;
> > +		return 0;
> > +	}
>
> But I guess that's an offset; but that means the helper is not doing
> what it says it's doing if it's within the first fragment. That should
> probably be made consistent... :)

ack, right. I will fix it in v12, thanks.

Regards,
Lorenzo

> -Toke