[bpf-next,V3,4/6] bpf: make it possible to identify BPF redirected SKBs

Message ID 160216615767.882446.7384364280837100311.stgit@firesoul
State Superseded
Headers show
Series
  • bpf: New approach for BPF MTU handling
Related show

Commit Message

Jesper Dangaard Brouer Oct. 8, 2020, 2:09 p.m.
This change makes it possible to identify SKBs that have been redirected
by TC-BPF (cls_act). This is needed for a number of cases.

(1) For collaborating with driver ifb net_devices.
(2) For avoiding starting generic-XDP prog on TC ingress redirect.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c    |    2 ++
 net/sched/Kconfig |    1 +
 2 files changed, 3 insertions(+)

Comments

Daniel Borkmann Oct. 9, 2020, 4:47 p.m. | #1
On 10/8/20 4:09 PM, Jesper Dangaard Brouer wrote:
> This change makes it possible to identify SKBs that have been redirected
> by TC-BPF (cls_act). This is needed for a number of cases.
> 
> (1) For collaborating with driver ifb net_devices.
> (2) For avoiding starting generic-XDP prog on TC ingress redirect.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Not sure if anyone actually cares about ifb devices, but my worry is that the
generic XDP vs tc interaction has been as-is for quite some time so this change
in behavior could break in the wild.
Maciej Żenczykowski Oct. 9, 2020, 6:33 p.m. | #2
> > This change makes it possible to identify SKBs that have been redirected
> > by TC-BPF (cls_act). This is needed for a number of cases.
> >
> > (1) For collaborating with driver ifb net_devices.
> > (2) For avoiding starting generic-XDP prog on TC ingress redirect.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> Not sure if anyone actually cares about ifb devices, but my worry is that the
> generic XDP vs tc interaction has been as-is for quite some time so this change
> in behavior could break in the wild.

I'm not at all sure of the interactions/implications here.
But I do have a request to enable ifb on Android for ingress rate
limiting and separately we're trying to make XDP work...
So we might at some point end up with cellular interfaces with xdp
ebpf (redirect for forwarding/nat/tethering) + ifb + tc ebpf (for
device local stuff).
But this is still all very vague and 'ideas only' level.
(and in general I think I'd like to get rid of the redirect in tc
ebpf, and leave only xlat64 translation for to-the-device traffic in
there, so maybe there's no problem anyway??)
Jesper Dangaard Brouer Oct. 10, 2020, 11:09 a.m. | #3
On Fri, 9 Oct 2020 11:33:33 -0700
Maciej Żenczykowski <maze@google.com> wrote:

> > > This change makes it possible to identify SKBs that have been redirected
> > > by TC-BPF (cls_act). This is needed for a number of cases.
> > >
> > > (1) For collaborating with driver ifb net_devices.
> > > (2) For avoiding starting generic-XDP prog on TC ingress redirect.
> > >
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>  
> >
> > Not sure if anyone actually cares about ifb devices, but my worry is that the
> > generic XDP vs tc interaction has been as-is for quite some time so this change
> > in behavior could break in the wild.  

No, I believe this happened as recent at kernel v5.2, when Stephen
Hemminger changed this in commit 458bf2f224f0 ("net: core: support XDP
generic on stacked devices.").  And for the record I think that
patch/change was a mistake, as people should not use generic-XDP for
these kind of stacked devices (they should really use TC-BPF as that is
the right tool for the job).


> I'm not at all sure of the interactions/implications here.
> But I do have a request to enable ifb on Android for ingress rate
> limiting and separately we're trying to make XDP work...
> So we might at some point end up with cellular interfaces with xdp
> ebpf (redirect for forwarding/nat/tethering) + ifb + tc ebpf (for
> device local stuff).

To me I was very surprised when I discovered tc-redirect didn't work
with ifb driver.  And it sounds like you have an actual use-case for
this on Android.

> But this is still all very vague and 'ideas only' level.
> (and in general I think I'd like to get rid of the redirect in tc
> ebpf, and leave only xlat64 translation for to-the-device traffic in
> there, so maybe there's no problem anyway??)

I know it sounds strange coming from me "Mr.XDP", but I actaully think
that in many cases you will be better off with using TC-BPF.
Especially on Android, as it will be very hard to get native-XDP
implemented in all these different drivers. (And you don't want to use
generic-XDP, because there is a high chance it causes a reallocation of
the SKB, which is a huge performance hit).
Maciej Żenczykowski Oct. 12, 2020, 9:04 p.m. | #4
On Sat, Oct 10, 2020 at 4:09 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Fri, 9 Oct 2020 11:33:33 -0700
> Maciej Żenczykowski <maze@google.com> wrote:
>
> > > > This change makes it possible to identify SKBs that have been redirected
> > > > by TC-BPF (cls_act). This is needed for a number of cases.
> > > >
> > > > (1) For collaborating with driver ifb net_devices.
> > > > (2) For avoiding starting generic-XDP prog on TC ingress redirect.
> > > >
> > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > >
> > > Not sure if anyone actually cares about ifb devices, but my worry is that the
> > > generic XDP vs tc interaction has been as-is for quite some time so this change
> > > in behavior could break in the wild.
>
> No, I believe this happened as recent at kernel v5.2, when Stephen
> Hemminger changed this in commit 458bf2f224f0 ("net: core: support XDP
> generic on stacked devices.").  And for the record I think that
> patch/change was a mistake, as people should not use generic-XDP for
> these kind of stacked devices (they should really use TC-BPF as that is
> the right tool for the job).
>
>
> > I'm not at all sure of the interactions/implications here.
> > But I do have a request to enable ifb on Android for ingress rate
> > limiting and separately we're trying to make XDP work...
> > So we might at some point end up with cellular interfaces with xdp
> > ebpf (redirect for forwarding/nat/tethering) + ifb + tc ebpf (for
> > device local stuff).
>
> To me I was very surprised when I discovered tc-redirect didn't work
> with ifb driver.  And it sounds like you have an actual use-case for
> this on Android.
>
> > But this is still all very vague and 'ideas only' level.
> > (and in general I think I'd like to get rid of the redirect in tc
> > ebpf, and leave only xlat64 translation for to-the-device traffic in
> > there, so maybe there's no problem anyway??)
>
> I know it sounds strange coming from me "Mr.XDP", but I actaully think
> that in many cases you will be better off with using TC-BPF.
> Especially on Android, as it will be very hard to get native-XDP
> implemented in all these different drivers. (And you don't want to use
> generic-XDP, because there is a high chance it causes a reallocation of
> the SKB, which is a huge performance hit).

We want the benefits of not allocating/zeroing skb metadata.
We probably can't (always) do zerocopy...

But let's list what we have on at least 1 sample device:
(a) cellular interface receives, no LRO, into skb, no build_skb
so each packet is <= mtu and requires meta alloc, meta zero, payload alloc
on some devices, payload is copied because nic does not receive into
all of system RAM, just SWMMIO style into a small ~60MB buffer.
(b) GRO happens
(c) TC BPF with redirect or routing/forwarding/iptables
(d) GSO happens, cause no TSO at NCM usb driver
(e) NCM driver copies payload, discards skb.
[and it allocates around 1 more skb per 16KB]

so we basically have at least 2 allocs and 2 payload copies per <=1500 packet
(and cellular mtus are likely closer to 1280 then 1500)

Lots of room for improvement - GRO/GSO are probably a net loss
(unclear) and all that allocation/copies.
If I can get XDP to eliminate the skb meta allocation and the fast
path payload copy in the cellular driver.
(so we only have copy from xdp frame into skb), then it's already a
huge win - we're down to 1 copy in NCM driver.
NCM could technically not require a copy with USB controller SG, but
current demo patches for that are not a win.
(Most likely usb controller is crappy, but lots of work left)
If forwarding/tethering is through XDP Redirect, then I also win due
to no GRO/GSO on that path.
(at least I think so)

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 9d55bf5d1a65..b433098896b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3885,6 +3885,7 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
+		skb_set_redirected(skb, false);
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
@@ -4974,6 +4975,7 @@  sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
+		skb_set_redirected(skb, true);
 		skb_do_redirect(skb);
 		return NULL;
 	case TC_ACT_CONSUMED:
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d88800e..a1bbaa8fd054 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -384,6 +384,7 @@  config NET_SCH_INGRESS
 	depends on NET_CLS_ACT
 	select NET_INGRESS
 	select NET_EGRESS
+	select NET_REDIRECT
 	help
 	  Say Y here if you want to use classifiers for incoming and/or outgoing
 	  packets. This qdisc doesn't do anything else besides running classifiers,