diff mbox series

[net] net: Fix gro aggregation for udp encaps with zero csum

Message ID 20210226212248.8300-1-daniel@iogearbox.net
State New
Headers show
Series [net] net: Fix gro aggregation for udp encaps with zero csum | expand

Commit Message

Daniel Borkmann Feb. 26, 2021, 9:22 p.m. UTC
We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
csum for the UDP header itself is 0. In that case, GRO aggregation does
not take place on the phys dev, but instead is deferred to the vxlan/geneve
driver (see trace below).

The reason is essentially that GRO aggregation bails out in udp_gro_receive()
for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing
checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
and napi context has csum_valid set. This is however not the case for zero
UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches
on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
so it certainly is expected to handle zero csums in udp_gro_receive(). The
purpose of the check added via 662880f44203 ("net: Allow GRO to use and set
levels of checksum unnecessary") seems to catch bad csum and stop aggregation
right away.

One way to fix aggregation in the zero case is to only perform the !csum_valid
check in udp_gro_receive() if uh->check is infact non-zero.

Before:

  [...]
  swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
  swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
  swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    13129.24

After:

  [...]
  swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
  swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    24576.53

Fixes: 57c67ff4bd92 ("udp: additional GRO support")
Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tom Herbert <tom@herbertland.com>
---
 net/ipv4/udp_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Willem de Bruijn Feb. 27, 2021, 4 p.m. UTC | #1
On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>

> We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the

> csum for the UDP header itself is 0. In that case, GRO aggregation does

> not take place on the phys dev, but instead is deferred to the vxlan/geneve

> driver (see trace below).

>

> The reason is essentially that GRO aggregation bails out in udp_gro_receive()

> for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,

> others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing

> checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE

> and napi context has csum_valid set. This is however not the case for zero

> UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

>

> At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches

> on !uh->check ^ !uh2->check as part to determine candidates for aggregation,

> so it certainly is expected to handle zero csums in udp_gro_receive(). The

> purpose of the check added via 662880f44203 ("net: Allow GRO to use and set

> levels of checksum unnecessary") seems to catch bad csum and stop aggregation

> right away.

>

> One way to fix aggregation in the zero case is to only perform the !csum_valid

> check in udp_gro_receive() if uh->check is infact non-zero.

>

> Before:

>

>   [...]

>   swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)

>   swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500

>   swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500

>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500

>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500

>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500

>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500

>   swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500

>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500

>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500

>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)

>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500

>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500

>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500

>   swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500

>   swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500

>   swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774

>   swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)

>   swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)

>   [...]

>

>   # netperf -H 10.55.10.4 -t TCP_STREAM -l 20

>   MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo

>   Recv   Send    Send

>   Socket Socket  Message  Elapsed

>   Size   Size    Size     Time     Throughput

>   bytes  bytes   bytes    secs.    10^6bits/sec

>

>    87380  16384  16384    20.01    13129.24

>

> After:

>

>   [...]

>   swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)

>   swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)

>   swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)

>   swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)

>   swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)

>   swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)

>   [...]

>

>   # netperf -H 10.55.10.4 -t TCP_STREAM -l 20

>   MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo

>   Recv   Send    Send

>   Socket Socket  Message  Elapsed

>   Size   Size    Size     Time     Throughput

>   bytes  bytes   bytes    secs.    10^6bits/sec

>

>    87380  16384  16384    20.01    24576.53

>

> Fixes: 57c67ff4bd92 ("udp: additional GRO support")

> Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary")

> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

> Cc: Eric Dumazet <edumazet@google.com>

> Cc: Willem de Bruijn <willemb@google.com>

> Cc: John Fastabend <john.fastabend@gmail.com>

> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>

> Cc: Tom Herbert <tom@herbertland.com>


Makes sense to me.

We cannot do checksum conversion with zero field, but that does not
have to limit coalescing.

CHECKSUM_COMPLETE with a checksum validated by
skb_gro_checksum_validate_zero_check implies csum_valid.

So the test

>             (skb->ip_summed != CHECKSUM_PARTIAL &&

>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&

>              !NAPI_GRO_CB(skb)->csum_valid) ||


Basically matches

- CHECKSUM_NONE
- CHECKSUM_UNNECESSARY which has already used up its valid state on a
prior header
- CHECKSUM_COMPLETE with bad checksum.

This change just refines to not drop for in the first two cases on a
zero checksum field.

Making this explicit in case anyone sees holes in the logic. Else,

Acked-by: Willem de Bruijn <willemb@google.com>
John Fastabend Feb. 27, 2021, 5:16 p.m. UTC | #2
Willem de Bruijn wrote:
> On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:

> >

> > We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the

> > csum for the UDP header itself is 0. In that case, GRO aggregation does

> > not take place on the phys dev, but instead is deferred to the vxlan/geneve

> > driver (see trace below).

> >

> > The reason is essentially that GRO aggregation bails out in udp_gro_receive()

> > for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,

> > others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing

> > checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE

> > and napi context has csum_valid set. This is however not the case for zero

> > UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

> >

> > At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches

> > on !uh->check ^ !uh2->check as part to determine candidates for aggregation,

> > so it certainly is expected to handle zero csums in udp_gro_receive(). The

> > purpose of the check added via 662880f44203 ("net: Allow GRO to use and set

> > levels of checksum unnecessary") seems to catch bad csum and stop aggregation

> > right away.

> >

> > One way to fix aggregation in the zero case is to only perform the !csum_valid

> > check in udp_gro_receive() if uh->check is infact non-zero.

> >


[...]

> We cannot do checksum conversion with zero field, but that does not

> have to limit coalescing.

> 

> CHECKSUM_COMPLETE with a checksum validated by

> skb_gro_checksum_validate_zero_check implies csum_valid.

> 

> So the test

> 

> >             (skb->ip_summed != CHECKSUM_PARTIAL &&

> >              NAPI_GRO_CB(skb)->csum_cnt == 0 &&

> >              !NAPI_GRO_CB(skb)->csum_valid) ||

> 

> Basically matches

> 

> - CHECKSUM_NONE

> - CHECKSUM_UNNECESSARY which has already used up its valid state on a

> prior header

> - CHECKSUM_COMPLETE with bad checksum.

> 

> This change just refines to not drop for in the first two cases on a

> zero checksum field.


+1

> 

> Making this explicit in case anyone sees holes in the logic. Else,

> 

> Acked-by: Willem de Bruijn <willemb@google.com>


LGTM,

Acked-by: John Fastabend <john.fastabend@gmail.com>
Jakub Kicinski Feb. 28, 2021, 8:02 p.m. UTC | #3
On Sat, 27 Feb 2021 09:16:38 -0800 John Fastabend wrote:
> Willem de Bruijn wrote:

> > On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:  

> > >

> > > We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the

> > > csum for the UDP header itself is 0. In that case, GRO aggregation does

> > > not take place on the phys dev, but instead is deferred to the vxlan/geneve

> > > driver (see trace below).

> > >

> > > The reason is essentially that GRO aggregation bails out in udp_gro_receive()

> > > for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,

> > > others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing

> > > checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE

> > > and napi context has csum_valid set. This is however not the case for zero

> > > UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

> > >

> > > At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches

> > > on !uh->check ^ !uh2->check as part to determine candidates for aggregation,

> > > so it certainly is expected to handle zero csums in udp_gro_receive(). The

> > > purpose of the check added via 662880f44203 ("net: Allow GRO to use and set

> > > levels of checksum unnecessary") seems to catch bad csum and stop aggregation

> > > right away.

> > >

> > > One way to fix aggregation in the zero case is to only perform the !csum_valid

> > > check in udp_gro_receive() if uh->check is infact non-zero.

> >

> > Acked-by: Willem de Bruijn <willemb@google.com>  

> 

> Acked-by: John Fastabend <john.fastabend@gmail.com>


Applied, thanks!
diff mbox series

Patch

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index ff39e94781bf..a7f714c4b068 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -466,7 +466,7 @@  struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	}
 
 	if (!sk || NAPI_GRO_CB(skb)->encap_mark ||
-	    (skb->ip_summed != CHECKSUM_PARTIAL &&
+	    (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
 	     !NAPI_GRO_CB(skb)->csum_valid) ||
 	    !udp_sk(sk)->gro_receive)