From patchwork Wed Nov 18 15:29:35 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 327856
Subject: [PATCH bpf-next V6 2/7] bpf: fix bpf_fib_lookup helper MTU check for
 SKB ctx
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
 Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
 Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
 eyal.birger@gmail.com, colrack@gmail.com
Date: Wed, 18 Nov 2020 16:29:35 +0100
Message-ID: <160571337537.2801246.15228178384451037535.stgit@firesoul>
In-Reply-To: <160571331409.2801246.11527010115263068327.stgit@firesoul>
References: <160571331409.2801246.11527010115263068327.stgit@firesoul>
User-Agent: StGit/0.19
X-Mailing-List: netdev@vger.kernel.org

A BPF end-user on the Cilium Slack channel (Carlo Carraro) wants to use
bpf_fib_lookup for an MTU check, but *prior* to extending the packet
size, by setting fib_params 'tot_len' to the packet length plus the
expected encap size (just as the bpf_check_mtu helper supports).
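A minimal userspace sketch of the use-case described above; it is not
kernel code. The caller computes the L3 length the packet would have
after encapsulation, and that planned length is compared against the
device MTU before the packet is actually grown. The function names
planned_tot_len(), tot_len_exceeds_mtu() and fib_tot_len_example() are
hypothetical and exist only for this illustration; in a real BPF
program the planned length would be stored in fib_params 'tot_len'
before calling bpf_fib_lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: L3 length the packet will have after encap. */
uint32_t planned_tot_len(uint32_t l3_len, uint32_t encap_len)
{
	return l3_len + encap_len;
}

/* Hypothetical model of the comparison: a tot_len of zero means "not
 * provided", so the comparison never fires and the caller is expected
 * to fall back to its existing length check.
 */
bool tot_len_exceeds_mtu(uint32_t tot_len, uint32_t mtu)
{
	return tot_len > mtu;
}

/* Usage example: a 1480-byte L3 packet plus 50 bytes of planned encap
 * overhead no longer fits a 1500-byte MTU, while zero never trips.
 */
void fib_tot_len_example(void)
{
	assert(tot_len_exceeds_mtu(planned_tot_len(1480, 50), 1500));
	assert(!tot_len_exceeds_mtu(0, 1500));
}
```

The point of the sketch is only the ordering: the check runs on the
planned length, so an oversized result is known before any helper has
changed the packet.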
He discovered that for SKB ctx the param->tot_len was not used; instead
skb->len was used (via the MTU check in is_skb_forwardable()).

Fix this by using fib_params 'tot_len' for the MTU check. If it is not
provided (e.g. zero), then keep the existing behaviour intact.

Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status")
Reported-by: Carlo Carraro
Signed-off-by: Jesper Dangaard Brouer
---
 net/core/filter.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 1ee97fdeea64..ae1fe8e6069a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5567,10 +5567,20 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 
 	if (!rc) {
 		struct net_device *dev;
+		u32 mtu;
 
 		dev = dev_get_by_index_rcu(net, params->ifindex);
-		if (!is_skb_forwardable(dev, skb))
+		mtu = dev->mtu;
+
+		/* Using tot_len for L3 MTU check if provided by user. Notice at
+		 * this TC cls_bpf level skb->len contains L2 size, but
+		 * is_skb_forwardable takes that into account.
+		 */
+		if (params->tot_len > mtu) {
 			rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
+		} else if (!is_skb_forwardable(dev, skb)) {
+			rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
+		}
 	}
 
 	return rc;

From patchwork Wed Nov 18 15:29:45 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 327855
Subject: [PATCH bpf-next V6 4/7] bpf: add BPF-helper for MTU checking
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
 Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
 Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
 eyal.birger@gmail.com, colrack@gmail.com
Date: Wed, 18 Nov 2020 16:29:45 +0100
Message-ID: <160571338553.2801246.16056207176480511227.stgit@firesoul>
In-Reply-To: <160571331409.2801246.11527010115263068327.stgit@firesoul>
References: <160571331409.2801246.11527010115263068327.stgit@firesoul>
User-Agent: StGit/0.19
X-Mailing-List: netdev@vger.kernel.org

This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs.
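A rough userspace model of the basic (non-GSO) length check this helper
performs, as detailed in the rest of this description: the L2 frame
length has the Ethernet header removed, the planned size change
len_diff is applied, and the result is compared against the L3 MTU.
This is a sketch, not the kernel implementation; check_mtu_model() and
the MTU_CHK_RET_* names are hypothetical stand-ins chosen for the
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14	/* Ethernet header length in bytes */

enum mtu_chk_ret {	/* hypothetical stand-ins for the return codes */
	MTU_CHK_RET_SUCCESS,
	MTU_CHK_RET_FRAG_NEEDED,
};

/* Hypothetical model: l2_len is the frame length including the Ethernet
 * header, len_diff the planned (possibly negative) size change, and mtu
 * the device MTU at L3.
 */
int check_mtu_model(int l2_len, int32_t len_diff, int mtu, uint32_t *mtu_len)
{
	int len = l2_len - ETH_HLEN + len_diff;

	assert(mtu_len != NULL);
	*mtu_len = (uint32_t)mtu;	/* MTU is reported in both outcomes */

	/* A negative result simply passes the check */
	return len <= mtu ? MTU_CHK_RET_SUCCESS : MTU_CHK_RET_FRAG_NEEDED;
}
```

For instance, a 1514-byte frame on a 1500-byte MTU device passes with
len_diff 0, but a planned growth of 20 bytes is reported as needing
fragmentation before any size adjustment has taken place.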
The SKB object is complex, and the skb->len value (accessible from the
BPF-prog) also includes the length of any extra GRO/GSO segments, but
without taking into account that these GRO/GSO segments get transport
(L4) and network (L3) headers added before being transmitted. Thus, this
BPF-helper is created such that the BPF-programmer doesn't need to
handle these details in the BPF-prog.

The API is designed to help the BPF-programmer who wants to do packet
context size changes, which involve other helpers. These other helpers
usually do a delta size adjustment. This helper also supports a delta
size (len_diff), which allows the BPF-programmer to reuse arguments
needed by these other helpers, and perform the MTU check prior to doing
any actual size adjustment of the packet context.

It is on purpose that we allow the len adjustment to become a negative
result, which will pass the MTU check. This might seem weird, but it's
not this helper's responsibility to "catch" wrong len_diff adjustments.
Other helpers will take care of these checks if the BPF-programmer
chooses to do the actual size adjustment.

V6:
- Took John's advice and dropped BPF_MTU_CHK_RELAX
- Returned MTU is kept at L3-level (like fib_lookup)

V4: Lot of changes
 - ifindex 0 now uses the current netdev for MTU lookup
 - rename helper from bpf_mtu_check to bpf_check_mtu
 - fix bug for GSO pkt length (as skb->len is total len)
 - remove __bpf_len_adj_positive, simply allow negative len adj

Signed-off-by: Jesper Dangaard Brouer
---
 include/uapi/linux/bpf.h       |   67 ++++++++++++++++++++++
 net/core/filter.c              |  122 ++++++++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |   67 ++++++++++++++++++++++
 3 files changed, 256 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index beacd312ea17..2619ea8c5a08 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3790,6 +3790,61 @@ union bpf_attr {
  *		*ARG_PTR_TO_BTF_ID* of type *task_struct*.
  *	Return
  *		Pointer to the current task.
+ *
+ * int bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
+ *	Description
+ *		Check ctx packet size against MTU of net device (based on
+ *		*ifindex*). This helper will likely be used in combination with
+ *		helpers that adjust/change the packet size. The argument
+ *		*len_diff* can be used for querying with a planned size
+ *		change. This allows checking MTU prior to changing packet ctx.
+ *
+ *		Specifying *ifindex* zero means the MTU check is performed
+ *		against the current net device. This is practical if this isn't
+ *		used prior to redirect.
+ *
+ *		The Linux kernel route table can configure MTUs on a more
+ *		specific per route level, which is not provided by this helper.
+ *		For route level MTU checks use the **bpf_fib_lookup**\ ()
+ *		helper.
+ *
+ *		*ctx* is either **struct xdp_md** for XDP programs or
+ *		**struct sk_buff** for tc cls_act programs.
+ *
+ *		The *flags* argument can be a combination of one or more of the
+ *		following values:
+ *
+ *		**BPF_MTU_CHK_SEGS**
+ *			This flag only works for *ctx* **struct sk_buff**.
+ *			If packet context contains extra packet segment buffers
+ *			(often known as GSO skb), then MTU check is harder to
+ *			check at this point, because in transmit path it is
+ *			possible for the skb packet to get re-segmented
+ *			(depending on net device features). This could still be
+ *			a MTU violation, so this flag enables performing MTU
+ *			check against segments, with a different violation
+ *			return code to tell it apart. Check cannot use len_diff.
+ *
+ *		On return *mtu_len* pointer contains the MTU value of the net
+ *		device. Remember the net device configured MTU is the L3 size,
+ *		which is returned here and XDP and TC length operate at L2.
+ *		Helper takes this into account for you, but remember when using
+ *		MTU value in your BPF-code. On input *mtu_len* must be a valid
+ *		pointer and be initialized (to zero), else verifier will reject
+ *		BPF program.
+ *
+ *	Return
+ *		* 0 on success, and populate MTU value in *mtu_len* pointer.
+ *
+ *		* < 0 if any input argument is invalid (*mtu_len* not updated)
+ *
+ *		MTU violations return positive values, but also populate MTU
+ *		value in *mtu_len* pointer, as this can be needed for
+ *		implementing PMTU handling:
+ *
+ *		* **BPF_MTU_CHK_RET_FRAG_NEEDED**
+ *		* **BPF_MTU_CHK_RET_SEGS_TOOBIG**
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3951,6 +4006,7 @@ union bpf_attr {
 	FN(task_storage_get),		\
 	FN(task_storage_delete),	\
 	FN(get_current_task_btf),	\
+	FN(check_mtu),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4978,6 +5034,17 @@ struct bpf_redir_neigh {
 	};
 };
 
+/* bpf_check_mtu flags */
+enum bpf_check_mtu_flags {
+	BPF_MTU_CHK_SEGS  = (1U << 0),
+};
+
+enum bpf_check_mtu_ret {
+	BPF_MTU_CHK_RET_SUCCESS,      /* check and lookup successful */
+	BPF_MTU_CHK_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+	BPF_MTU_CHK_RET_SEGS_TOOBIG,  /* GSO re-segmentation needed to fwd */
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
diff --git a/net/core/filter.c b/net/core/filter.c
index 0712b7e5d9fb..7e8d2475a205 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5604,6 +5604,124 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+static struct net_device *__dev_via_ifindex(struct net_device *dev_curr,
+					    u32 ifindex)
+{
+	struct net *netns = dev_net(dev_curr);
+
+	/* Non-redirect use-cases can use ifindex=0 and save ifindex lookup */
+	if (ifindex == 0)
+		return dev_curr;
+
+	return dev_get_by_index_rcu(netns, ifindex);
+}
+
+BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
+	   u32, ifindex, u32 *, mtu_len, s32, len_diff, u64, flags)
+{
+	int ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
+	struct net_device *dev = skb->dev;
+	int len;
+	int mtu;
+
+	if (flags & ~(BPF_MTU_CHK_SEGS))
+		return -EINVAL;
+
+	dev = __dev_via_ifindex(dev, ifindex);
+	if (!dev)
+		return -ENODEV;
+
+	mtu = READ_ONCE(dev->mtu);
+
+	/* TC len is L2, remove L2-header as dev MTU is L3 size */
+	len = skb->len - ETH_HLEN;
+
+	len += len_diff; /* len_diff can be negative, minus result pass check */
+	if (len <= mtu) {
+		ret = BPF_MTU_CHK_RET_SUCCESS;
+		goto out;
+	}
+	/* At this point, skb->len exceeds MTU, but as it includes the length
+	 * of all segments, each segment can still be below MTU. The SKB can
+	 * possibly get re-segmented in transmit path (see validate_xmit_skb).
+	 * Thus, user must choose if segs are to be MTU checked. Last SKB
+	 * "headlen" is checked against MTU.
+	 */
+	if (skb_is_gso(skb)) {
+		ret = BPF_MTU_CHK_RET_SUCCESS;
+
+		if (flags & BPF_MTU_CHK_SEGS &&
+		    !skb_gso_validate_network_len(skb, mtu)) {
+			ret = BPF_MTU_CHK_RET_SEGS_TOOBIG;
+			goto out;
+		}
+
+		len = skb_headlen(skb) - ETH_HLEN + len_diff;
+		if (len > mtu) {
+			ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
+			goto out;
+		}
+	}
+out:
+	/* BPF verifier guarantees valid pointer */
+	*mtu_len = mtu;
+
+	return ret;
+}
+
+BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp,
+	   u32, ifindex, u32 *, mtu_len, s32, len_diff, u64, flags)
+{
+	struct net_device *dev = xdp->rxq->dev;
+	int len = xdp->data_end - xdp->data;
+	int ret = BPF_MTU_CHK_RET_SUCCESS;
+	int mtu;
+
+	/* XDP variant doesn't support multi-buffer segment check (yet) */
+	if (flags)
+		return -EINVAL;
+
+	dev = __dev_via_ifindex(dev, ifindex);
+	if (!dev)
+		return -ENODEV;
+
+	mtu = READ_ONCE(dev->mtu);
+
+	/* XDP len is L2, remove L2-header as dev MTU is L3 size */
+	len -= ETH_HLEN;
+
+	len += len_diff; /* len_diff can be negative, minus result pass check */
+	if (len > mtu)
+		ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
+
+	/* BPF verifier guarantees valid pointer */
+	*mtu_len = mtu;
+
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_skb_check_mtu_proto = {
+	.func		= bpf_skb_check_mtu,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_INT,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto bpf_xdp_check_mtu_proto = {
+	.func		= bpf_xdp_check_mtu,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_INT,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
 #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
 static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
 {
@@ -7169,6 +7287,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_uid_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_skb_fib_lookup_proto;
+	case BPF_FUNC_check_mtu:
+		return &bpf_skb_check_mtu_proto;
 	case BPF_FUNC_sk_fullsock:
 		return &bpf_sk_fullsock_proto;
 	case BPF_FUNC_sk_storage_get:
@@ -7238,6 +7358,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
+	case BPF_FUNC_check_mtu:
+		return &bpf_xdp_check_mtu_proto;
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_udp:
 		return &bpf_xdp_sk_lookup_udp_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index beacd312ea17..2619ea8c5a08 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3790,6 +3790,61 @@ union bpf_attr {
  *		*ARG_PTR_TO_BTF_ID* of type *task_struct*.
  *	Return
  *		Pointer to the current task.
+ *
+ * int bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
+ *	Description
+ *		Check ctx packet size against MTU of net device (based on
+ *		*ifindex*). This helper will likely be used in combination with
+ *		helpers that adjust/change the packet size. The argument
+ *		*len_diff* can be used for querying with a planned size
+ *		change. This allows checking MTU prior to changing packet ctx.
+ *
+ *		Specifying *ifindex* zero means the MTU check is performed
+ *		against the current net device. This is practical if this isn't
+ *		used prior to redirect.
+ *
+ *		The Linux kernel route table can configure MTUs on a more
+ *		specific per route level, which is not provided by this helper.
+ *		For route level MTU checks use the **bpf_fib_lookup**\ ()
+ *		helper.
+ *
+ *		*ctx* is either **struct xdp_md** for XDP programs or
+ *		**struct sk_buff** for tc cls_act programs.
+ *
+ *		The *flags* argument can be a combination of one or more of the
+ *		following values:
+ *
+ *		**BPF_MTU_CHK_SEGS**
+ *			This flag only works for *ctx* **struct sk_buff**.
+ *			If packet context contains extra packet segment buffers
+ *			(often known as GSO skb), then MTU check is harder to
+ *			check at this point, because in transmit path it is
+ *			possible for the skb packet to get re-segmented
+ *			(depending on net device features). This could still be
+ *			a MTU violation, so this flag enables performing MTU
+ *			check against segments, with a different violation
+ *			return code to tell it apart. Check cannot use len_diff.
+ *
+ *		On return *mtu_len* pointer contains the MTU value of the net
+ *		device. Remember the net device configured MTU is the L3 size,
+ *		which is returned here and XDP and TC length operate at L2.
+ *		Helper takes this into account for you, but remember when using
+ *		MTU value in your BPF-code. On input *mtu_len* must be a valid
+ *		pointer and be initialized (to zero), else verifier will reject
+ *		BPF program.
+ *
+ *	Return
+ *		* 0 on success, and populate MTU value in *mtu_len* pointer.
+ *
+ *		* < 0 if any input argument is invalid (*mtu_len* not updated)
+ *
+ *		MTU violations return positive values, but also populate MTU
+ *		value in *mtu_len* pointer, as this can be needed for
+ *		implementing PMTU handling:
+ *
+ *		* **BPF_MTU_CHK_RET_FRAG_NEEDED**
+ *		* **BPF_MTU_CHK_RET_SEGS_TOOBIG**
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3951,6 +4006,7 @@ union bpf_attr {
 	FN(task_storage_get),		\
 	FN(task_storage_delete),	\
 	FN(get_current_task_btf),	\
+	FN(check_mtu),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4978,6 +5034,17 @@ struct bpf_redir_neigh {
 	};
 };
 
+/* bpf_check_mtu flags */
+enum bpf_check_mtu_flags {
+	BPF_MTU_CHK_SEGS  = (1U << 0),
+};
+
+enum bpf_check_mtu_ret {
+	BPF_MTU_CHK_RET_SUCCESS,      /* check and lookup successful */
+	BPF_MTU_CHK_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+	BPF_MTU_CHK_RET_SEGS_TOOBIG,  /* GSO re-segmentation needed to fwd */
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */

From patchwork Wed Nov 18 15:29:55 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jesper Dangaard Brouer
X-Patchwork-Id: 327854
Subject: [PATCH bpf-next V6 6/7] bpf: make it possible to identify BPF
 redirected SKBs
From: Jesper Dangaard Brouer
To: bpf@vger.kernel.org
Cc: Jesper Dangaard Brouer, netdev@vger.kernel.org, Daniel Borkmann,
 Alexei Starovoitov, maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
 Lorenzo Bianconi, marek@cloudflare.com, John Fastabend, Jakub Kicinski,
 eyal.birger@gmail.com, colrack@gmail.com
Date: Wed, 18 Nov 2020 16:29:55 +0100
Message-ID: <160571339569.2801246.446458790928377797.stgit@firesoul>
In-Reply-To: <160571331409.2801246.11527010115263068327.stgit@firesoul>
References: <160571331409.2801246.11527010115263068327.stgit@firesoul>
User-Agent: StGit/0.19
X-Mailing-List: netdev@vger.kernel.org

This change makes it possible to identify SKBs that have been redirected
by TC-BPF (cls_act). This is needed for a number of cases.

(1) For collaborating with driver ifb net_devices.
(2) For avoiding starting generic-XDP prog on TC ingress redirect.

It is most important to fix XDP case (2), because this can break
userspace when a driver gets support for native-XDP. Imagine userspace
loads an XDP prog on eth0, which falls back to generic-XDP, and it
processes TC-redirected packets. When the kernel is updated with
native-XDP support for eth0, the program no longer sees the
TC-redirected packets. Therefore it is important to keep the order
intact: XDP runs before TC-BPF.

Signed-off-by: Jesper Dangaard Brouer
---
 net/core/dev.c    |    2 ++
 net/sched/Kconfig |    1 +
 2 files changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6ceb6412ee97..26b40f8005ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3872,6 +3872,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
+		skb_set_redirected(skb, false);
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
@@ -4963,6 +4964,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
+		skb_set_redirected(skb, true);
 		if (skb_do_redirect(skb) == -EAGAIN) {
 			__skb_pull(skb, skb->mac_len);
 			*another = true;
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a3b37d88800e..a1bbaa8fd054 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -384,6 +384,7 @@ config NET_SCH_INGRESS
 	depends on NET_CLS_ACT
 	select NET_INGRESS
 	select NET_EGRESS
+	select NET_REDIRECT
 	help
 	  Say Y here if you want to use classifiers for incoming and/or outgoing
 	  packets. This qdisc doesn't do anything else besides running classifiers,