From patchwork Sat Nov 7 19:38:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 322146 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.6 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EA18C388F7 for ; Sat, 7 Nov 2020 19:38:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6BD921707 for ; Sat, 7 Nov 2020 19:38:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="g5J8Dz/H" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728651AbgKGTi3 (ORCPT ); Sat, 7 Nov 2020 14:38:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725836AbgKGTi2 (ORCPT ); Sat, 7 Nov 2020 14:38:28 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C62B1C0613CF; Sat, 7 Nov 2020 11:38:28 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id t16so5389391oie.11; Sat, 07 Nov 2020 11:38:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=lu8xFLzKq5EcwF/rmpzYgaF3GRt/mG2wCMUgnyFW2F4=; b=g5J8Dz/HsGZtjd0AFilkNvenRhKE/uv0CLpDugjcbQKLK8o3WKw5ow7SpIhN6lvu3C ZuUcxmqC0D7k7vPpEGQsSn3axH2mnycVFxW+29tIAkeH1UWLTsxTUwseVLXunKqUM5AK bdUng6sV/8hwrFTGB4eV6knoOmU6YwLDQdtYgq0SNOInPO5hTC4FfK7I8sKKvHvdvTLT 9ncFd1C6rM718vYKb5SfvZnDtUc3pQAJBrAn3wjtgr7Q7mX26VxW+49ie+gGxQr6kos+ UKzbrASiDVCDdDWXnmJDOE7LwL0XvnIKd4cQ0e5rzGhHQtS+UI7jdt6NMsPioNIOWOqE K64A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=lu8xFLzKq5EcwF/rmpzYgaF3GRt/mG2wCMUgnyFW2F4=; b=D72o1ApxNLrpkxV6S7w2NKGyAoIBKpMqvA0/Xbl0/RjFcgmtzSMCED6PKW7PoOIZLv Oj5y7XMh7xUrhvesti3sPjkrEuIK8ZjAZfvlj/WiOWCqjY12RsxcUndGMzjJSR/uYbdb AhDC8p5JbYv7k+znvYmx11kSGgA2q7aeCnvjr2VJlwi03DWMfCIcWIFOqHUU4bZ5j5Xr joNmjiIjhoSSoBJ+i0n26EMCuGyqnA3JSYHgX4070x501w89PG/qgvStZVHcKQaMS3mx R0b5Rswap12qnG4FWo4ZAPvwAsNye05+HVjnAleh5Hmi+RGN1tsZ12CP1uS83uhhlI6N DS9Q== X-Gm-Message-State: AOAM53237m42zgRiffH3zwzh1NxQGn5Ei2FNCr+uCSViKicdWiHxUzrj EwcJGkN0MXx1Vao5BGFLoj4= X-Google-Smtp-Source: ABdhPJy7qPlnp5/EvqHFs1UtAVy5avBgFP1x8B47wxkipA/CHg7aiTtRnKD53ItrFJAaUZOT/0FQ6A== X-Received: by 2002:aca:c70b:: with SMTP id x11mr4979894oif.58.1604777908240; Sat, 07 Nov 2020 11:38:28 -0800 (PST) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id w70sm1235588oiw.29.2020.11.07.11.38.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 07 Nov 2020 11:38:27 -0800 (PST) Subject: [bpf PATCH 2/5] bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect From: John Fastabend To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org Date: Sat, 07 Nov 2020 11:38:15 -0800 Message-ID: <160477789531.608263.11406385279476664711.stgit@john-XPS-13-9370> In-Reply-To: <160477770483.608263.6057216691957042088.stgit@john-XPS-13-9370> References: <160477770483.608263.6057216691957042088.stgit@john-XPS-13-9370> User-Agent: StGit/0.23-36-gc01b MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This allows users to tune SO_RCVBUF and sockmap will honor them. We can refactor the if(charge) case out in later patches. But, keep this fix to the point. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Suggested-by: Jakub Sitnicki Signed-off-by: John Fastabend --- net/core/skmsg.c | 20 ++++++++++++++++---- net/ipv4/tcp_bpf.c | 3 ++- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 654182ecf87b..fe44280c033e 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -170,10 +170,12 @@ static int sk_msg_free_elem(struct sock *sk, struct sk_msg *msg, u32 i, struct scatterlist *sge = sk_msg_elem(msg, i); u32 len = sge->length; - if (charge) - sk_mem_uncharge(sk, len); - if (!msg->skb) + /* When the skb owns the memory we free it from consume_skb path. */ + if (!msg->skb) { + if (charge) + sk_mem_uncharge(sk, len); put_page(sg_page(sge)); + } memset(sge, 0, sizeof(*sge)); return len; } @@ -403,6 +405,9 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) int copied = 0, num_sge; struct sk_msg *msg; + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) + return -EAGAIN; + msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); if (unlikely(!msg)) return -EAGAIN; @@ -418,7 +423,14 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) return num_sge; } - sk_mem_charge(sk, skb->len); + /* This will transition ownership of the data from the socket where + * the BPF program was run initiating the redirect to the socket + * we will eventually receive this data on. The data will be released + * from skb_consume found in __tcp_bpf_recvmsg() after its been copied + * into user buffers. + */ + skb_set_owner_r(skb, sk); + copied = skb->len; msg->sg.start = 0; msg->sg.size = copied; diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 3709d679436e..30802fa99aa1 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -46,7 +46,8 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, if (likely(!peek)) { sge->offset += copy; sge->length -= copy; - sk_mem_uncharge(sk, copy); + if (!msg_rx->skb) + sk_mem_uncharge(sk, copy); msg_rx->sg.size -= copy; if (!sge->length) { From patchwork Sat Nov 7 19:38:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 322145 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.6 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24424C388F9 for ; Sat, 7 Nov 2020 19:39:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DE04D21707 for ; Sat, 7 Nov 2020 19:39:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bn7rWu//" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728669AbgKGTjH (ORCPT ); Sat, 7 Nov 2020 14:39:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32992 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725836AbgKGTjH (ORCPT ); Sat, 7 Nov 2020 14:39:07 -0500 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA4F3C0613CF; Sat, 7 Nov 2020 11:39:06 -0800 (PST) Received: by mail-ot1-x341.google.com with SMTP id k3so4646611otp.12; Sat, 07 Nov 2020 11:39:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=7SSO8P7QNjSDnyLJqh8IxVjW+CSfAgn4F8/JaQHnAL4=; b=bn7rWu//83wK0urzZf+yCJenGPDafaWYhy1Y8Mvdb+CQWTzGTyKrvDJ0ItkalDxfDO 9LLF4MAveDC2iL8iYvrqCyjPJlYDALMh+QzvOr+HjFKjjAqFmpCxqo7fQdx+zBqRDXPr N7XV4NucpbCKfXtkkR3Lz6M3ugSDTHMfsr16Cys2ka8jJfa/72CTLPh+4BN2UuioZe5o g0hdXHGH8FCQ3ojjHo8DLYJ+II3A/leT/VXe54Vg4A7aNGiCVbv3ruYdVtvvP1Tm6T0W zU7OFJTAa9RIB1RPKdDBBXze/CHuPvy84RA3IdRreViePbELGFmhIBcKDgfT/PSbTzmu thNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=7SSO8P7QNjSDnyLJqh8IxVjW+CSfAgn4F8/JaQHnAL4=; b=NqhV/ztanwC1POrll4MrzUYyBDj8MkMIRXPT9Wgq/KCVzXdUium3nEWYmSofXu74gs +F8OUC1w/Srm98jhHS0NfjPB3uV+iMn1gKW6V/uQEM8vkaHMU+FYGLtgcqal6HrA4jno IUUHydD2NaMU+vmqWIx8p82iXiUg54ue4gwbxtXpfAkT5uh1F9PBPgZe0fAvz8+PLfRX UDcaA/QgKRvF2FiBmoZ4FwvnTuTMi4F7yKEMqTPvJ2IJtELI5/wECBvtiItL0DQwK661 pdSoGI9po4QNw6n0wWX/fa7cgduZrFQXCyz/WUGyOO0aBoxLodnRvVU8kLOZCCEEEGBz C3bw== X-Gm-Message-State: AOAM533uzKPY7qSiMLY0mDXTtLzZ4/WA3xa2VORf/dlfcdHPs02NyEzZ W3PKjUuT+Ox1yIGERN8NPG4= X-Google-Smtp-Source: ABdhPJyK7FIQDOg6PuDArIPCEmh0BvsvMgx6X48ITd/6wo8Xeap9PlK3A/JIpyO5BviKjOghbBATuw== X-Received: by 2002:a9d:4c18:: with SMTP id l24mr5034347otf.291.1604777946315; Sat, 07 Nov 2020 11:39:06 -0800 (PST) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id k15sm1206555oor.11.2020.11.07.11.38.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 07 Nov 2020 11:39:05 -0800 (PST) Subject: [bpf PATCH 4/5] bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self From: John Fastabend To: ast@kernel.org, daniel@iogearbox.net, jakub@cloudflare.com Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org Date: Sat, 07 Nov 2020 11:38:54 -0800 Message-ID: <160477793403.608263.17626285322866020367.stgit@john-XPS-13-9370> In-Reply-To: <160477770483.608263.6057216691957042088.stgit@john-XPS-13-9370> References: <160477770483.608263.6057216691957042088.stgit@john-XPS-13-9370> User-Agent: StGit/0.23-36-gc01b MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If the skb_verdict_prog redirects an skb knowingly to itself, fix your BPF program this is not optimal and an abuse of the API please use SK_PASS. That said there may be cases, such as socket load balancing, where picking the socket is hashed based or otherwise picks the same socket it was received on in some rare cases. If this happens we don't want to confuse userspace giving them an EAGAIN error if we can avoid it. To avoid double accounting in these cases. At the moment even if the skb has already been charged against the sockets rcvbuf and forward alloc we check it again and do set_owner_r() causing it to be orphaned and recharged. For one this is useless work, but more importantly we can have a case where the skb could be put on the ingress queue, but because we are under memory pressure we return EAGAIN. The trouble here is the skb has already been accounted for so any rcvbuf checks include the memory associated with the packet already. This rolls up and can result in unecessary EAGAIN errors in userspace read() calls. Fix by doing an unlikely check and skipping checks if skb->sk == sk. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend --- net/core/skmsg.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 580252e532da..59c36a672256 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -404,11 +404,13 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, { struct sk_msg *msg; - if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) - return NULL; + if (likely(skb->sk != sk)) { + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) + return NULL; - if (!sk_rmem_schedule(sk, skb, skb->len)) - return NULL; + if (!sk_rmem_schedule(sk, skb, skb->len)) + return NULL; + } msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); if (unlikely(!msg)) @@ -455,9 +457,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) * the BPF program was run initiating the redirect to the socket * we will eventually receive this data on. The data will be released * from skb_consume found in __tcp_bpf_recvmsg() after its been copied - * into user buffers. + * into user buffers. If we are receiving on the same sock skb->sk is + * already assigned, skip memory accounting and owner transition seeing + * it already set correctly. */ - skb_set_owner_r(skb, sk); + if (likely(skb->sk != sk)) + skb_set_owner_r(skb, sk); return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); }