mbox series

[net-next,v7,0/3] net: gro: move p->{flush/flush_id} calculations to L4

Message ID 20240412155533.115507-1-richardbgobert@gmail.com
Headers show
Series net: gro: move p->{flush/flush_id} calculations to L4 | expand

Message

Richard Gobert April 12, 2024, 3:55 p.m. UTC
This patch series depends on commits in the series submitted to net.
(https://lore.kernel.org/netdev/20240412152120.115067-1-richardbgobert@gmail.com/)

The fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.

3rd patch adds tests for different flush_id flows in GRO.

v6 -> v7:
 - Moved bug fixes to a separate submission in net
 - Added UDP fwd benchmark
 - v6:
   https://lore.kernel.org/all/20240410153423.107381-1-richardbgobert@gmail.com/

v5 -> v6:
 - Write inner_network_offset in vxlan and geneve
 - Ignore is_atomic when DF=0
 - v5:
   https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/

v4 -> v5:
 - Add 1st commit - flush id checks in udp_gro_receive segment which can be
   backported by itself
 - Add TCP measurements for the 5th commit
 - Add flush id tests to ensure flush id logic is preserved in GRO
 - Simplify gro_inet_flush by removing a branch
 - v4:
   https://lore.kernel.org/all/202420325182543.87683-1-richardbgobert@gmail.com/

v3 -> v4:
 - Fix code comment and commit message typos
 - v3:
   https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/

v2 -> v3:
 - Use napi_gro_cb instead of skb->{offset}
 - v2:
   https://lore.kernel.org/all/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.com/

v1 -> v2:
 - Pass p_off in *_gro_complete to fix UDP bug
 - Remove more conditionals and memory fetches from inet_gro_flush
 - v1:
   https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.com/

Richard Gobert (3):
  net: gro: add {inner_}network_offset to napi_gro_cb
  net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
  selftests/net: add flush id selftests

 drivers/net/geneve.c              |   1 +
 drivers/net/vxlan/vxlan_core.c    |   1 +
 include/net/gro.h                 |  82 +++++++++++++++--
 net/8021q/vlan_core.c             |   2 +
 net/core/gro.c                    |   5 +-
 net/ethernet/eth.c                |   1 +
 net/ipv4/af_inet.c                |  46 +---------
 net/ipv4/gre_offload.c            |   1 +
 net/ipv4/tcp_offload.c            |  15 +---
 net/ipv4/udp_offload.c            |  16 +---
 net/ipv6/ip6_offload.c            |  19 +---
 tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++++++++++
 12 files changed, 238 insertions(+), 95 deletions(-)

Comments

Richard Gobert April 17, 2024, 1:57 p.m. UTC | #1
Willem de Bruijn wrote:
> Richard Gobert wrote:
>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
>> makes sure both are set correctly. In the common path there's only one
>> write (skb_gro_reset_offset).
>>
>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
>> ---
>>  drivers/net/geneve.c           |  1 +
>>  drivers/net/vxlan/vxlan_core.c |  1 +
>>  include/net/gro.h              | 18 ++++++++++++++++--
>>  net/8021q/vlan_core.c          |  2 ++
>>  net/core/gro.c                 |  1 +
>>  net/ethernet/eth.c             |  1 +
>>  net/ipv4/af_inet.c             |  5 +----
>>  net/ipv4/gre_offload.c         |  1 +
>>  net/ipv6/ip6_offload.c         |  8 ++++----
>>  9 files changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
>> index d4520c3f7c09..ae596285d78c 100644
>> --- a/net/ipv4/gre_offload.c
>> +++ b/net/ipv4/gre_offload.c
>> @@ -224,6 +224,7 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
>>  	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
>>  	skb_gro_postpull_rcsum(skb, greh, grehlen);
>>  
>> +	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
>>  	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
>>  	flush = 0;
> 
> Nice that this even works for ETH_P_TEB, as eth_gro_receive will
> overwrite the offset written here.
>   
>   
>>  	list_for_each_entry(p, head, list) {
>>  		const struct ipv6hdr *iph2;
>> @@ -327,6 +325,7 @@ static struct sk_buff *sit_ip6ip6_gro_receive(struct list_head *head,
>>  	}
>>  
>>  	NAPI_GRO_CB(skb)->encap_mark = 1;
>> +	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
>>  
>>  	return ipv6_gro_receive(head, skb);
>>  }
>> @@ -342,6 +341,7 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
>>  	}
>>  
>>  	NAPI_GRO_CB(skb)->encap_mark = 1;
>> +	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
> 
> Do we still need encap_mark, or is it always set at the same time that
> inner_network_offset becomes non-zero?
> 

This would require setting inner_network_header to 0 before dev_gro_receive
which would not be favorable to the common case. (As opposed to encap_mark
which is already set to 0 as being part of NAPI_GRO_CB->zeroed). In my
opinion, it might also be less readable.
Richard Gobert April 17, 2024, 2:01 p.m. UTC | #2
Willem de Bruijn wrote:
> Richard Gobert wrote:
>> Added flush id selftests to test different cases where DF flag is set or
>> unset and id value changes in the following packets. All cases where the
>> packets should coalesce or should not coalesce are tested.
>>
>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> 
> Thanks for adding tests. Minor point below only. The tests pass both
> before and after your series, right? Then immediately a nice
> validation that the optimization has no unintended side-effects.
> 

Yes, the logic is preserved - tests pass both in net-next and after
applying the patch :)

>> ---
>>  tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++++++++++
>>  1 file changed, 144 insertions(+)
>>
>> diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
>> index 353e1e867fbb..74ab06953c38 100644
>> --- a/tools/testing/selftests/net/gro.c
>> +++ b/tools/testing/selftests/net/gro.c
>> @@ -617,6 +617,120 @@ static void add_ipv6_exthdr(void *buf, void *optpkt, __u8 exthdr_type, char *ext
>>  	iph->payload_len = htons(ntohs(iph->payload_len) + MIN_EXTHDR_SIZE);
>>  }
>>  
>> +static void fix_ip4_checksum(struct iphdr *iph)
>> +{
>> +	iph->check = 0;
>> +	iph->check = checksum_fold(iph, sizeof(struct iphdr), 0);
>> +}
>> +
>> +static void send_flush_id_case(int fd, struct sockaddr_ll *daddr, int tcase)
>> +{
>> +	bool send_three = false;
>> +	static char buf1[MAX_HDR_LEN + PAYLOAD_LEN];
>> +	static char buf2[MAX_HDR_LEN + PAYLOAD_LEN];
>> +	static char buf3[MAX_HDR_LEN + PAYLOAD_LEN];
>> +
>> +	create_packet(buf1, 0, 0, PAYLOAD_LEN, 0);
>> +	create_packet(buf2, PAYLOAD_LEN, 0, PAYLOAD_LEN, 0);
>> +	create_packet(buf3, PAYLOAD_LEN * 2, 0, PAYLOAD_LEN, 0);
>> +
>> +	struct iphdr *iph1 = (struct iphdr *)(buf1 + ETH_HLEN);
>> +	struct iphdr *iph2 = (struct iphdr *)(buf2 + ETH_HLEN);
>> +	struct iphdr *iph3 = (struct iphdr *)(buf3 + ETH_HLEN);
>> +
> 
> minor: variable defintions before code, and reverse chrismas tree.

Good catch, I'll apply these changes and push v8 when the relevant series
for net will be merged. Thanks!
Richard Gobert April 18, 2024, 3:09 p.m. UTC | #3
Paolo Abeni wrote:
> On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
>> makes sure both are set correctly. In the common path there's only one
>> write (skb_gro_reset_offset).
>>
>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> 
> Does not apply cleanly to net-next. You have to wait until the net
> dependency is merged into net-next before posting.
> 
>> ---
>>  drivers/net/geneve.c           |  1 +
>>  drivers/net/vxlan/vxlan_core.c |  1 +
>>  include/net/gro.h              | 18 ++++++++++++++++--
>>  net/8021q/vlan_core.c          |  2 ++
>>  net/core/gro.c                 |  1 +
>>  net/ethernet/eth.c             |  1 +
>>  net/ipv4/af_inet.c             |  5 +----
>>  net/ipv4/gre_offload.c         |  1 +
>>  net/ipv6/ip6_offload.c         |  8 ++++----
>>  9 files changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>> index 9c18a39b0d0c..a6256ea1f5bc 100644
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> @@ -545,6 +545,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
>>  	if (!ptype)
>>  		goto out;
>>  
>> +	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
>>  	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
>>  	flush = 0;
>>  
>> diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
>> index 6fb182d9d6e7..9fb93c3953c1 100644
>> --- a/drivers/net/vxlan/vxlan_core.c
>> +++ b/drivers/net/vxlan/vxlan_core.c
>> @@ -754,6 +754,7 @@ static struct sk_buff *vxlan_gpe_gro_receive(struct sock *sk,
>>  
>>  	vh = vxlan_gro_prepare_receive(sk, head, skb, &grc);
>>  	if (vh) {
>> +		NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
>>  		if (!vxlan_parse_gpe_proto(vh, &protocol))
>>  			goto out;
>>  		ptype = gro_find_receive_by_type(protocol);
> 
> What about vxlan_gro_receive? and fou/gue?
> 

No need to write in fou/gue functions, as both functions call
{inet,inet6}_offloads, which means if there's an IP/IPv6 header after
fou/gue - ipip_gro_receive will be called (or ip6ip6_gro_receive, or
sit_ip6ip6_gro_receive, etc), in which inner_network_offset is written.

vxlan_gro_receive calls eth_gro_receive, in which inner_network_offset
is written as well.

> Side note: the latter apparently exist mainly to make UDP-related
> changes more difficult, can we deprecated them once for all?
> 
> Thank,
> 
> Paolo
>