From patchwork Wed May 13 19:25:29 2020
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 219312
From: Andrii Nakryiko
Cc: "Paul E. McKenney", Jonathan Lemon
McKenney" , Jonathan Lemon Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 3/6] bpf: track reference type in verifier Date: Wed, 13 May 2020 12:25:29 -0700 Message-ID: <20200513192532.4058934-4-andriin@fb.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200513192532.4058934-1-andriin@fb.com> References: <20200513192532.4058934-1-andriin@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.216, 18.0.676 definitions=2020-05-13_09:2020-05-13,2020-05-13 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 cotscore=-2147483648 mlxscore=0 priorityscore=1501 adultscore=0 impostorscore=0 malwarescore=0 spamscore=0 clxscore=1015 suspectscore=25 mlxlogscore=999 phishscore=0 bulkscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2004280000 definitions=main-2005130166 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch extends verifier's reference tracking logic with tracking a reference type and ensuring acquire()/release() functions accept only the right types of references. Currently, ambiguity between socket and ringbuf record references is resolved through use of different types of input arguments to bpf_sk_release() and bpf_ringbuf_commit(): ARG_PTR_TO_SOCK_COMMON and ARG_PTR_TO_ALLOC_MEM, respectively. It is thus impossible to pass ringbuf record pointer to bpf_sk_release() (and vice versa for socket). On the other hand, patch #1 added ARG_PTR_TO_ALLOC_MEM arg type, which, from the point of view of verifier, is a pointer to a fixed-sized allocated memory region. This is generic enough concept that could be used for other BPF helpers (e.g., malloc/free pair, if added). once we have that, there will be nothing to prevent passing ringbuf record to bpf_mem_free() (or whatever) helper. To that end, this patch adds a capability to specify and track reference types, that would be validated by verifier to ensure correct match between acquire() and release() helpers. This patch can be postponed for later, so is posted separate from other verifier changes. Signed-off-by: Andrii Nakryiko --- include/linux/bpf_verifier.h | 8 +++ kernel/bpf/verifier.c | 96 +++++++++++++++++++++++++++++------- 2 files changed, 86 insertions(+), 18 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index c94a736e53cd..2a6d961570cb 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -164,6 +164,12 @@ struct bpf_stack_state { u8 slot_type[BPF_REG_SIZE]; }; +enum bpf_ref_type { + BPF_REF_INVALID, + BPF_REF_SOCKET, + BPF_REF_RINGBUF, +}; + struct bpf_reference_state { /* Track each reference created with a unique id, even if the same * instruction creates the reference multiple times (eg, via CALL). @@ -173,6 +179,8 @@ struct bpf_reference_state { * is used purely to inform the user of a reference leak. 
Signed-off-by: Andrii Nakryiko
---
 include/linux/bpf_verifier.h |  8 +++
 kernel/bpf/verifier.c        | 96 +++++++++++++++++++++++++++++-------
 2 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c94a736e53cd..2a6d961570cb 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -164,6 +164,12 @@ struct bpf_stack_state {
 	u8 slot_type[BPF_REG_SIZE];
 };
 
+enum bpf_ref_type {
+	BPF_REF_INVALID,
+	BPF_REF_SOCKET,
+	BPF_REF_RINGBUF,
+};
+
 struct bpf_reference_state {
 	/* Track each reference created with a unique id, even if the same
 	 * instruction creates the reference multiple times (eg, via CALL).
@@ -173,6 +179,8 @@ struct bpf_reference_state {
 	 * is used purely to inform the user of a reference leak.
 	 */
 	int insn_idx;
+	/* Type of reference being tracked */
+	enum bpf_ref_type ref_type;
 };
 
 /* state of the program:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b8f0158d2327..dc741a631089 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -436,6 +436,19 @@ static bool is_release_function(enum bpf_func_id func_id)
 	       func_id == BPF_FUNC_ringbuf_discard;
 }
 
+static enum bpf_ref_type get_release_ref_type(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_sk_release:
+		return BPF_REF_SOCKET;
+	case BPF_FUNC_ringbuf_submit:
+	case BPF_FUNC_ringbuf_discard:
+		return BPF_REF_RINGBUF;
+	default:
+		return BPF_REF_INVALID;
+	}
+}
+
 static bool may_be_acquire_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_sk_lookup_tcp ||
@@ -464,6 +477,28 @@ static bool is_acquire_function(enum bpf_func_id func_id,
 	return false;
 }
 
+static enum bpf_ref_type get_acquire_ref_type(enum bpf_func_id func_id,
+					      const struct bpf_map *map)
+{
+	enum bpf_map_type map_type = map ? map->map_type : BPF_MAP_TYPE_UNSPEC;
+
+	switch (func_id) {
+	case BPF_FUNC_sk_lookup_tcp:
+	case BPF_FUNC_sk_lookup_udp:
+	case BPF_FUNC_skc_lookup_tcp:
+		return BPF_REF_SOCKET;
+	case BPF_FUNC_map_lookup_elem:
+		if (map_type == BPF_MAP_TYPE_SOCKMAP ||
+		    map_type == BPF_MAP_TYPE_SOCKHASH)
+			return BPF_REF_SOCKET;
+		return BPF_REF_INVALID;
+	case BPF_FUNC_ringbuf_reserve:
+		return BPF_REF_RINGBUF;
+	default:
+		return BPF_REF_INVALID;
+	}
+}
+
 static bool is_ptr_cast_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_tcp_sock ||
@@ -736,7 +771,8 @@ static int realloc_func_state(struct bpf_func_state *state, int stack_size,
  * On success, returns a valid pointer id to associate with the register
  * On failure, returns a negative errno.
  */
-static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
+static int acquire_reference_state(struct bpf_verifier_env *env,
+				   int insn_idx, enum bpf_ref_type ref_type)
 {
 	struct bpf_func_state *state = cur_func(env);
 	int new_ofs = state->acquired_refs;
@@ -748,25 +784,32 @@ static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx)
 	id = ++env->id_gen;
 	state->refs[new_ofs].id = id;
 	state->refs[new_ofs].insn_idx = insn_idx;
+	state->refs[new_ofs].ref_type = ref_type;
 
 	return id;
 }
 
 /* release function corresponding to acquire_reference_state(). Idempotent. */
-static int release_reference_state(struct bpf_func_state *state, int ptr_id)
+static int release_reference_state(struct bpf_func_state *state, int ptr_id,
+				   enum bpf_ref_type ref_type)
 {
+	struct bpf_reference_state *ref;
 	int i, last_idx;
 
 	last_idx = state->acquired_refs - 1;
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ptr_id) {
-			if (last_idx && i != last_idx)
-				memcpy(&state->refs[i], &state->refs[last_idx],
-				       sizeof(*state->refs));
-			memset(&state->refs[last_idx], 0, sizeof(*state->refs));
-			state->acquired_refs--;
-			return 0;
-		}
+		ref = &state->refs[i];
+		if (ref->id != ptr_id)
+			continue;
+
+		if (ref_type != BPF_REF_INVALID && ref->ref_type != ref_type)
+			return -EINVAL;
+
+		if (i != last_idx)
+			memcpy(ref, &state->refs[last_idx], sizeof(*ref));
+		memset(&state->refs[last_idx], 0, sizeof(*ref));
+		state->acquired_refs--;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -4295,14 +4338,13 @@ static void release_reg_references(struct bpf_verifier_env *env,
 /* The pointer with the specified id has released its reference to kernel
  * resources. Identify all copies of the same pointer and clear the reference.
  */
-static int release_reference(struct bpf_verifier_env *env,
-			     int ref_obj_id)
+static int release_reference(struct bpf_verifier_env *env, int ref_obj_id,
+			     enum bpf_ref_type ref_type)
 {
 	struct bpf_verifier_state *vstate = env->cur_state;
-	int err;
-	int i;
+	int err, i;
 
-	err = release_reference_state(cur_func(env), ref_obj_id);
+	err = release_reference_state(cur_func(env), ref_obj_id, ref_type);
 	if (err)
 		return err;
@@ -4661,7 +4703,16 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 			return err;
 		}
 	} else if (is_release_function(func_id)) {
-		err = release_reference(env, meta.ref_obj_id);
+		enum bpf_ref_type ref_type;
+
+		ref_type = get_release_ref_type(func_id);
+		if (ref_type == BPF_REF_INVALID) {
+			verbose(env, "unrecognized reference accepted by func %s#%d\n",
+				func_id_name(func_id), func_id);
+			return -EFAULT;
+		}
+
+		err = release_reference(env, meta.ref_obj_id, ref_type);
 		if (err) {
 			verbose(env, "func %s#%d reference has not been acquired before\n",
 				func_id_name(func_id), func_id);
@@ -4744,8 +4795,17 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		/* For release_reference() */
 		regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
 	} else if (is_acquire_function(func_id, meta.map_ptr)) {
-		int id = acquire_reference_state(env, insn_idx);
+		enum bpf_ref_type ref_type;
+		int id;
+
+		ref_type = get_acquire_ref_type(func_id, meta.map_ptr);
+		if (ref_type == BPF_REF_INVALID) {
+			verbose(env, "unrecognized reference returned by func %s#%d\n",
+				func_id_name(func_id), func_id);
+			return -EINVAL;
+		}
+		id = acquire_reference_state(env, insn_idx, ref_type);
 		if (id < 0)
 			return id;
 		/* For mark_ptr_or_null_reg() */
@@ -6728,7 +6788,7 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 	 * No one could have freed the reference state before
 	 * doing the NULL check.
 	 */
-	WARN_ON_ONCE(release_reference_state(state, id));
+	WARN_ON_ONCE(release_reference_state(state, id, BPF_REF_INVALID));
 
 	for (i = 0; i <= vstate->curframe; i++)
 		__mark_ptr_or_null_regs(vstate->frame[i], id, is_null);

From patchwork Wed May 13 19:25:31 2020
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 219310
From: Andrii Nakryiko
Cc: "Paul E. McKenney", Jonathan Lemon
McKenney" , Jonathan Lemon Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 5/6] selftests/bpf: add BPF ringbuf selftests Date: Wed, 13 May 2020 12:25:31 -0700 Message-ID: <20200513192532.4058934-6-andriin@fb.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200513192532.4058934-1-andriin@fb.com> References: <20200513192532.4058934-1-andriin@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.216, 18.0.676 definitions=2020-05-13_09:2020-05-13,2020-05-13 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 adultscore=0 spamscore=0 priorityscore=1501 bulkscore=0 phishscore=0 lowpriorityscore=0 mlxscore=0 mlxlogscore=999 malwarescore=0 clxscore=1015 impostorscore=0 cotscore=-2147483648 suspectscore=25 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2004280000 definitions=main-2005130166 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Both singleton BPF ringbuf and BPF ringbuf with map-in-map use cases are tested. Also reserve+submit/discards and output variants of API are validated. Signed-off-by: Andrii Nakryiko --- .../selftests/bpf/prog_tests/ringbuf.c | 101 +++++++++++++++++ .../selftests/bpf/prog_tests/ringbuf_multi.c | 102 ++++++++++++++++++ .../selftests/bpf/progs/test_ringbuf.c | 63 +++++++++++ .../selftests/bpf/progs/test_ringbuf_multi.c | 77 +++++++++++++ 4 files changed, 343 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/ringbuf.c create mode 100644 tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf.c create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_multi.c diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c new file mode 100644 index 000000000000..2708cc791f1a --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "test_ringbuf.skel.h" + +#define EDONE 7777 + +static int duration = 0; + +struct sample { + int pid; + int seq; + long value; + char comm[16]; +}; + +static int process_sample(void *ctx, void *data, size_t len) +{ + struct sample *s = data; + + switch (s->seq) { + case 0: + CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n", + 333L, s->value); + break; + case 1: + CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n", + 777L, s->value); + return -EDONE; + default: + CHECK(false, "extra_sample", "unexpected sample\n"); + } + + return 0; +} + +void test_ringbuf(void) +{ + struct test_ringbuf *skel; + struct ring_buffer *ringbuf; + int err; + + skel = test_ringbuf__open_and_load(); + if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n")) + return; + + /* only trigger BPF program for current process */ + skel->bss->pid = getpid(); + + ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf), + process_sample, NULL, NULL); + if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n")) + goto cleanup; + + err = test_ringbuf__attach(skel); + if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err)) + goto cleanup; + + /* trigger exactly two samples */ + skel->bss->value = 333; + syscall(__NR_getpgid); + skel->bss->value = 777; + syscall(__NR_getpgid); + + /* 
Signed-off-by: Andrii Nakryiko
---
 .../selftests/bpf/prog_tests/ringbuf.c        | 101 +++++++++++++++++
 .../selftests/bpf/prog_tests/ringbuf_multi.c  | 102 ++++++++++++++++++
 .../selftests/bpf/progs/test_ringbuf.c        |  63 +++++++++++
 .../selftests/bpf/progs/test_ringbuf_multi.c  |  77 +++++++++++++
 4 files changed, 343 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ringbuf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_multi.c

diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
new file mode 100644
index 000000000000..2708cc791f1a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/compiler.h>
+#include <asm/barrier.h>
+#include <test_progs.h>
+#include <sys/mman.h>
+#include <sys/epoll.h>
+#include <time.h>
+#include <sched.h>
+#include <signal.h>
+#include <pthread.h>
+#include <sys/sysinfo.h>
+#include <linux/perf_event.h>
+#include <linux/ring_buffer.h>
+#include "test_ringbuf.skel.h"
+
+#define EDONE 7777
+
+static int duration = 0;
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+static int process_sample(void *ctx, void *data, size_t len)
+{
+	struct sample *s = data;
+
+	switch (s->seq) {
+	case 0:
+		CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n",
+		      333L, s->value);
+		break;
+	case 1:
+		CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n",
+		      777L, s->value);
+		return -EDONE;
+	default:
+		CHECK(false, "extra_sample", "unexpected sample\n");
+	}
+
+	return 0;
+}
+
+void test_ringbuf(void)
+{
+	struct test_ringbuf *skel;
+	struct ring_buffer *ringbuf;
+	int err;
+
+	skel = test_ringbuf__open_and_load();
+	if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n"))
+		return;
+
+	/* only trigger BPF program for current process */
+	skel->bss->pid = getpid();
+
+	ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf),
+				   process_sample, NULL, NULL);
+	if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n"))
+		goto cleanup;
+
+	err = test_ringbuf__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err))
+		goto cleanup;
+
+	/* trigger exactly two samples */
+	skel->bss->value = 333;
+	syscall(__NR_getpgid);
+	skel->bss->value = 777;
+	syscall(__NR_getpgid);
+
+	/* poll for samples */
+	do {
+		err = ring_buffer__poll(ringbuf, -1);
+	} while (err >= 0);
+
+	/* -EDONE is used as an indicator that we are done */
+	if (CHECK(err != -EDONE, "err_done", "done err: %d\n", err))
+		goto cleanup;
+
+	/* we expect extra polling to return nothing */
+	err = ring_buffer__poll(ringbuf, 0);
+	if (CHECK(err < 0, "extra_samples", "poll result: %d\n", err))
+		goto cleanup;
+
+	CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
+	      0L, skel->bss->dropped);
+	CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
+	      2L, skel->bss->total);
+	CHECK(skel->bss->discarded != 1, "err_discarded", "exp %ld, got %ld\n",
+	      1L, skel->bss->discarded);
+
+	test_ringbuf__detach(skel);
+
+cleanup:
+	ring_buffer__free(ringbuf);
+	test_ringbuf__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c b/tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
new file mode 100644
index 000000000000..f352f556bd34
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include <sys/epoll.h>
+#include "test_ringbuf_multi.skel.h"
+
+static int duration = 0;
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+static int process_sample(void *ctx, void *data, size_t len)
+{
+	int ring = (unsigned long)ctx;
+	struct sample *s = data;
+
+	switch (s->seq) {
+	case 0:
+		CHECK(ring != 1, "sample1_ring", "exp %d, got %d\n", 1, ring);
+		CHECK(s->value != 333, "sample1_value", "exp %ld, got %ld\n",
+		      333L, s->value);
+		break;
+	case 1:
+		CHECK(ring != 2, "sample2_ring", "exp %d, got %d\n", 2, ring);
+		CHECK(s->value != 777, "sample2_value", "exp %ld, got %ld\n",
+		      777L, s->value);
+		break;
+	default:
+		CHECK(true, "extra_sample", "unexpected sample seq %d, val %ld\n",
+		      s->seq, s->value);
+		return -1;
+	}
+
+	return 0;
+}
+
+void test_ringbuf_multi(void)
+{
+	struct test_ringbuf_multi *skel;
+	struct ring_buffer *ringbuf;
+	int err;
+
+	skel = test_ringbuf_multi__open_and_load();
+	if (CHECK(!skel, "skel_open_load", "skeleton open&load failed\n"))
+		return;
+
+	/* only trigger BPF program for current process */
+	skel->bss->pid = getpid();
+
+	ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf1),
+				   process_sample, (void *)(long)1, NULL);
+	if (CHECK(!ringbuf, "ringbuf_create", "failed to create ringbuf\n"))
+		goto cleanup;
+
+	err = ring_buffer__add(ringbuf, bpf_map__fd(skel->maps.ringbuf2),
+			       process_sample, (void *)(long)2);
+	if (CHECK(err, "ringbuf_add", "failed to add another ring\n"))
+		goto cleanup;
+
+	err = test_ringbuf_multi__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attachment failed: %d\n", err))
+		goto cleanup;
+
+	/* trigger few samples, some will be skipped */
+	skel->bss->target_ring = 0;
+	skel->bss->value = 333;
+	syscall(__NR_getpgid);
+
+	/* skipped, no ringbuf in slot 1 */
+	skel->bss->target_ring = 1;
+	skel->bss->value = 555;
+	syscall(__NR_getpgid);
+
+	skel->bss->target_ring = 2;
+	skel->bss->value = 777;
+	syscall(__NR_getpgid);
+
+	/* poll for samples, should get 2 ringbufs back */
+	err = ring_buffer__poll(ringbuf, -1);
+	if (CHECK(err != 2, "poll_res", "expected 2 events, got %d\n", err))
+		goto cleanup;
+
+	/* expect extra polling to return nothing */
+	err = ring_buffer__poll(ringbuf, 0);
+	if (CHECK(err < 0, "extra_samples", "poll result: %d\n", err))
+		goto cleanup;
+
+	CHECK(skel->bss->dropped != 0, "err_dropped", "exp %ld, got %ld\n",
+	      0L, skel->bss->dropped);
+	CHECK(skel->bss->skipped != 1, "err_skipped", "exp %ld, got %ld\n",
+	      1L, skel->bss->skipped);
+	CHECK(skel->bss->total != 2, "err_total", "exp %ld, got %ld\n",
+	      2L, skel->bss->total);
+
+cleanup:
+	ring_buffer__free(ringbuf);
+	test_ringbuf_multi__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf.c b/tools/testing/selftests/bpf/progs/test_ringbuf.c
new file mode 100644
index 000000000000..6084f17e17d8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+/* inputs */
+int pid = 0;
+long value = 0;
+
+/* outputs */
+long total = 0;
+long discarded = 0;
+long dropped = 0;
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int test_ringbuf(void *ctx)
+{
+	int cur_pid = bpf_get_current_pid_tgid() >> 32;
+	struct sample *sample;
+	int zero = 0;
+
+	if (cur_pid != pid)
+		return 0;
+
+	sample = bpf_ringbuf_reserve(&ringbuf, sizeof(*sample), 0);
+	if (!sample) {
+		__sync_fetch_and_add(&dropped, 1);
+		return 1;
+	}
+
+	sample->pid = pid;
+	bpf_get_current_comm(sample->comm, sizeof(sample->comm));
+	sample->value = value;
+
+	sample->seq = total;
+	__sync_fetch_and_add(&total, 1);
+
+	if (sample->seq & 1) {
+		/* copy from reserved sample to a new one... */
+		bpf_ringbuf_output(&ringbuf, sample, sizeof(*sample), 0);
+		/* ...and then discard reserved sample */
+		bpf_ringbuf_discard(sample);
+		__sync_fetch_and_add(&discarded, 1);
+	} else {
+		bpf_ringbuf_submit(sample);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c b/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
new file mode 100644
index 000000000000..b45291e77aa2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+struct ringbuf_map {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf1 SEC(".maps"),
+  ringbuf2 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+	__uint(max_entries, 4);
+	__type(key, int);
+	__array(values, struct ringbuf_map);
+} ringbuf_arr SEC(".maps") = {
+	.values = {
+		[0] = &ringbuf1,
+		[2] = &ringbuf2,
+	},
+};
+
+/* inputs */
+int pid = 0;
+int target_ring = 0;
+long value = 0;
+
+/* outputs */
+long total = 0;
+long dropped = 0;
+long skipped = 0;
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int test_ringbuf(void *ctx)
+{
+	int cur_pid = bpf_get_current_pid_tgid() >> 32;
+	struct sample *sample;
+	void *rb;
+	int zero = 0;
+
+	if (cur_pid != pid)
+		return 0;
+
+	rb = bpf_map_lookup_elem(&ringbuf_arr, &target_ring);
+	if (!rb) {
+		skipped += 1;
+		return 1;
+	}
+
+	sample = bpf_ringbuf_reserve(rb, sizeof(*sample), 0);
+	if (!sample) {
+		dropped += 1;
+		return 1;
+	}
+
+	sample->pid = pid;
+	bpf_get_current_comm(sample->comm, sizeof(sample->comm));
+	sample->value = value;
+
+	sample->seq = total;
+	total += 1;
+
+	bpf_ringbuf_submit(sample);
+
+	return 0;
+}

From patchwork Wed May 13 19:25:32 2020
X-Patchwork-Submitter: Andrii Nakryiko
X-Patchwork-Id: 219311
From: Andrii Nakryiko
Cc: "Paul E. McKenney", Jonathan Lemon
McKenney" , Jonathan Lemon Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 6/6] bpf: add BPF ringbuf and perf buffer benchmarks Date: Wed, 13 May 2020 12:25:32 -0700 Message-ID: <20200513192532.4058934-7-andriin@fb.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200513192532.4058934-1-andriin@fb.com> References: <20200513192532.4058934-1-andriin@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.216, 18.0.676 definitions=2020-05-13_09:2020-05-13,2020-05-13 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 phishscore=0 cotscore=-2147483648 mlxscore=0 suspectscore=9 lowpriorityscore=0 impostorscore=0 bulkscore=0 spamscore=0 priorityscore=1501 clxscore=1015 malwarescore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2004280000 definitions=main-2005130166 X-FB-Internal: deliver Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Extend bench framework with ability to have benchmark-provided child argument parser for custom benchmark-specific parameters. This makes bench generic code modular and independent from any specific benchmark. Also implement a set of benchmarks for new BPF ring buffer and existing perf buffer. 5 benchmarks were implemented: 2 variations for BPF ringbuf, 3 variations for perfbuf: - rb-libbpf utilizes stock libbpf ring_buffer manager for reading data; - rb-custom implements custom ring buffer setup and reading code, to eliminate overheads inherent in generic libbpf code due to callback functions and the need to update consumer position after each consumed record, instead of batching updates (due to pessimistic assumption that user callback might take long time and thus could unnecessarily hold ring buffer space for too long); - pb-libbpf uses stock libbpf perf_buffer code with all the default settings; default settings are good safe defaults, but are far from providing highest throughput -- perf buffer allows to specify wakeup_events and sample_period > 1, that will cause perf code to trigger epoll notifications less frequently, which boosts throughput immensely; - pb-raw uses perf_buffer__new_raw() API to specify custom wakeup_events and sample_period and specifies raw sample callback, which removes some of the overhead in generic case; - pb-custom does the same setup as pb-raw, but implements custom consumer code eliminating all the callback overhead. Otherwise, all benchamrks implement similar way to generate a batch of records by using fentry/sys_getpgid BPF program, which pushes a bunch of records in a tight loop and records number of successful and dropped samples. Each record is a small 8-byte integer, to minimize the effect of memory copying with bpf_perf_event_output() and bpf_ringbuf_output(). Benchmarks that have only one producer implement optional back-to-back mode, in which record production and consumption is alternating on the same CPU. This is the highest-throughput happy case, showing ultimate performance achievable with either BPF ringbuf or perfbuf. All the below scenarios are implemented in a script in benchs/run_bench_ringbufs.sh. Tests were performed on 28-core/56-thread Intel Xeon CPU E5-2680 v4 @ 2.40GHz CPU. 
Single-producer, parallel producer
==================================
rb-libbpf            11.302 ± 0.564M/s (drops 0.000 ± 0.000M/s)
rb-custom            8.895 ± 0.148M/s (drops 0.000 ± 0.000M/s)
pb-libbpf            0.927 ± 0.008M/s (drops 0.000 ± 0.000M/s)
pb-raw               9.884 ± 0.058M/s (drops 0.000 ± 0.000M/s)
pb-custom            9.724 ± 0.068M/s (drops 0.000 ± 0.001M/s)

Single producer on one CPU, consumer on another, both running at full
speed. Curiously, rb-libbpf has higher throughput than the objectively
faster (due to a more lightweight consumer code path) rb-custom. It
appears that the faster consumer causes the kernel to send notifications
more frequently, because the consumer catches up more often. Stock perfbuf
settings are about 10x slower than pb-raw/pb-custom with
wakeup_events/sample_period set to 500. The trade-off is that with
sampling, the application might not get the next X events until the X+1st
one arrives, which is not always acceptable.

Single-producer, back-to-back mode
==================================
rb-libbpf            15.991 ± 0.456M/s (drops 0.000 ± 0.000M/s)
rb-custom            21.646 ± 0.457M/s (drops 0.000 ± 0.000M/s)
pb-libbpf            1.608 ± 0.040M/s (drops 0.000 ± 0.000M/s)
pb-raw               8.866 ± 0.098M/s (drops 0.000 ± 0.000M/s)
pb-custom            8.858 ± 0.137M/s (drops 0.000 ± 0.000M/s)

Here we test the back-to-back mode, which is arguably the best-case
scenario for both BPF ringbuf and perfbuf: there is no contention, and for
ringbuf there is also no excessive notification, because the consumer
appears to be behind after the first record. For ringbuf, the custom
consumer code clearly wins, with 22 vs 16 million records per second
exchanged between producer and consumer. Perfbuf with wakeup sampling gets
a 5.5x throughput increase compared to the no-sampling version. There also
doesn't seem to be noticeable overhead from the generic libbpf handling
code.

Perfbuf, effect of sample rate, back-to-back
============================================
sample rate 1        1.634 ± 0.023M/s (drops 0.000 ± 0.000M/s)
sample rate 5        4.510 ± 0.082M/s (drops 0.000 ± 0.000M/s)
sample rate 10       5.875 ± 0.085M/s (drops 0.000 ± 0.000M/s)
sample rate 25       7.361 ± 0.150M/s (drops 0.000 ± 0.000M/s)
sample rate 50       7.574 ± 0.559M/s (drops 0.000 ± 0.000M/s)
sample rate 100      8.486 ± 0.274M/s (drops 0.000 ± 0.000M/s)
sample rate 250      8.787 ± 0.183M/s (drops 0.000 ± 0.000M/s)
sample rate 500      8.494 ± 0.732M/s (drops 0.000 ± 0.000M/s)
sample rate 1000     8.445 ± 0.336M/s (drops 0.000 ± 0.000M/s)
sample rate 2000     9.000 ± 0.187M/s (drops 0.000 ± 0.000M/s)
sample rate 3000     8.684 ± 0.556M/s (drops 0.000 ± 0.000M/s)

This benchmark shows the effect of event sampling for perfbuf, using
back-to-back mode for the highest throughput. Notifying only on every 5th
record gives a 3x speed-up. 250-500 appears to be the point of diminishing
returns, with a 5.5x speed-up. Most benchmarks use 500 as the default
sample rate for pb-raw and pb-custom.

Ringbuf, reserve+commit vs output, back-to-back
===============================================
reserve              22.570 ± 0.490M/s (drops 0.000 ± 0.000M/s)
output               19.037 ± 0.429M/s (drops 0.000 ± 0.000M/s)

BPF ringbuf supports two sets of APIs with different usability and
performance trade-offs: bpf_ringbuf_reserve()+bpf_ringbuf_commit() vs
bpf_ringbuf_output(). This benchmark clearly shows the superiority of the
reserve+commit approach, despite the small 8-byte record size; the two
styles are contrasted in the sketch below.
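The compared styles boil down to the following BPF-side sketch (matching
ringbuf_bench.c at the end of this patch; note that bpf_ringbuf_submit()
takes no flags argument in this version of the API):

    /* reserve+submit: write the sample in place, no extra copy */
    long *rec = bpf_ringbuf_reserve(&ringbuf, sizeof(*rec), 0);
    if (rec) {
            *rec = sample_val;
            bpf_ringbuf_submit(rec);
    }

    /* output: copy an already-prepared sample into the ring buffer */
    bpf_ringbuf_output(&ringbuf, &sample_val, sizeof(sample_val), 0);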
Single-producer, consumer/producer competing on the same CPU, low batch count
=============================================================================
rb-libbpf            1.485 ± 0.015M/s (drops 1.969 ± 0.058M/s)
rb-custom            1.489 ± 0.017M/s (drops 1.910 ± 0.098M/s)
pb-libbpf            1.242 ± 0.042M/s (drops 0.000 ± 0.000M/s)
pb-raw               1.217 ± 0.043M/s (drops 0.000 ± 0.000M/s)
pb-custom            1.264 ± 0.023M/s (drops 0.000 ± 0.000M/s)

This benchmark shows one of the worst-case scenarios, in which the
producer and consumer do not coordinate *and* fight for the same CPU. No
batch count or sampling settings were able to eliminate drops for the ring
buffer; the producer is simply too fast for the consumer to keep up.
Still, both ringbuf and perfbuf are able to pass through more than a
million messages per second, which is more than enough for a lot of
applications.

Ringbuf, multi-producer contention, low batch count
===================================================
rb-libbpf nr_prod 1  9.559 ± 0.134M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  6.542 ± 0.039M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  4.444 ± 0.007M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  3.764 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  4.147 ± 0.013M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 3.652 ± 0.185M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 2.338 ± 0.020M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.055 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 1.961 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.136 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.069 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.177 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.323 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.274 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.069 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 2.041 ± 0.004M/s (drops 0.000 ± 0.000M/s)

Ringbuf uses a very short-duration spinlock during the reservation phase
to check a few invariants, increment the producer count, and set the
record header. This is the biggest point of contention in the ringbuf
implementation. This benchmark evaluates the effect of multiple competing
writers on the overall throughput of a single shared ring buffer. Overall
throughput drops by about 30% when going from one to two highly-contended
producers, losing another 30% with a third producer added. The performance
drop stabilizes at around 16 producers and hovers around 2 million
records/s even with 50+ fighting producers, which is a 4.75x drop in
throughput and a testament to a good spinlock implementation in the
kernel. Note that in the intended real-world scenarios, contention is not
expected to get anywhere close to such high levels. But if contention does
become a problem, there is always the option of sharding a few ring
buffers across a set of CPUs, sketched below.
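A sharded setup could reuse the map-in-map layout from the selftests in
patch 5/6. Here is a hypothetical sketch (map names and shard count are
made up, not part of this series) that picks a ring buffer by CPU:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    char _license[] SEC("license") = "GPL";

    #define NR_SHARDS 4

    struct ringbuf_map {
            __uint(type, BPF_MAP_TYPE_RINGBUF);
            __uint(max_entries, 1 << 12);
    } rb0 SEC(".maps"), rb1 SEC(".maps"), rb2 SEC(".maps"), rb3 SEC(".maps");

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
            __uint(max_entries, NR_SHARDS);
            __type(key, int);
            __array(values, struct ringbuf_map);
    } shards SEC(".maps") = {
            .values = { [0] = &rb0, [1] = &rb1, [2] = &rb2, [3] = &rb3 },
    };

    SEC("tp/syscalls/sys_enter_getpgid")
    int shard_example(void *ctx)
    {
            int key = bpf_get_smp_processor_id() % NR_SHARDS;
            long *rec;
            void *rb;

            /* pick this CPU's shard to limit cross-CPU contention */
            rb = bpf_map_lookup_elem(&shards, &key);
            if (!rb)
                    return 0;

            rec = bpf_ringbuf_reserve(rb, sizeof(*rec), 0);
            if (!rec)
                    return 0;
            *rec = 42;
            bpf_ringbuf_submit(rec);
            return 0;
    }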
Signed-off-by: Andrii Nakryiko
---
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bench.c           |  18 +
 .../selftests/bpf/benchs/bench_ringbufs.c     | 593 ++++++++++++++++++
 .../bpf/benchs/run_bench_ringbufs.sh          |  61 ++
 .../selftests/bpf/progs/perfbuf_bench.c       |  33 +
 .../selftests/bpf/progs/ringbuf_bench.c       |  45 ++
 6 files changed, 754 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_ringbufs.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
 create mode 100644 tools/testing/selftests/bpf/progs/perfbuf_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/ringbuf_bench.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 352d17a16bae..a95ac4d691d2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -412,12 +412,15 @@ $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h
 	$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
 $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
 $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
+$(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
+			    $(OUTPUT)/perfbuf_bench.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
 		 $(OUTPUT)/bench_count.o \
 		 $(OUTPUT)/bench_rename.o \
-		 $(OUTPUT)/bench_trigger.o
+		 $(OUTPUT)/bench_trigger.o \
+		 $(OUTPUT)/bench_ringbufs.o
 	$(call msg,BINARY,,$@)
 	$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 8c0dfbfe6088..a18f93804da7 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -130,6 +130,13 @@ static const struct argp_option opts[] = {
 	{},
 };
 
+extern struct argp bench_ringbufs_argp;
+
+static const struct argp_child bench_parsers[] = {
+	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
+	{},
+};
+
 static error_t parse_arg(int key, char *arg, struct argp_state *state)
 {
 	static int pos_args;
@@ -208,6 +215,7 @@ static void parse_cmdline_args(int argc, char **argv)
 		.options = opts,
 		.parser = parse_arg,
 		.doc = argp_program_doc,
+		.children = bench_parsers,
 	};
 
 	if (argp_parse(&argp, argc, argv, 0, NULL, NULL))
 		exit(1);
@@ -310,6 +318,11 @@ extern const struct bench bench_trig_rawtp;
 extern const struct bench bench_trig_kprobe;
 extern const struct bench bench_trig_fentry;
 extern const struct bench bench_trig_fmodret;
+extern const struct bench bench_rb_libbpf;
+extern const struct bench bench_rb_custom;
+extern const struct bench bench_pb_libbpf;
+extern const struct bench bench_pb_raw;
+extern const struct bench bench_pb_custom;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -327,6 +340,11 @@ static const struct bench *benchs[] = {
 	&bench_trig_kprobe,
 	&bench_trig_fentry,
 	&bench_trig_fmodret,
+	&bench_rb_libbpf,
+	&bench_rb_custom,
+	&bench_pb_libbpf,
+	&bench_pb_raw,
+	&bench_pb_custom,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
new file mode 100644
index 000000000000..01017105ac08
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
@@ -0,0 +1,593 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include <asm/barrier.h>
+#include <linux/perf_event.h>
+#include <linux/ring_buffer.h>
+#include <sys/epoll.h>
+#include <sys/mman.h>
+#include <argp.h>
+#include <stdlib.h>
+#include "bench.h"
+#include "ringbuf_bench.skel.h"
+#include "perfbuf_bench.skel.h"
"perfbuf_bench.skel.h" + +static struct { + bool back2back; + int batch_cnt; + int ringbuf_sz; /* per-ringbuf, in bytes */ + bool ringbuf_use_reserve; /* use reserve/submit or output API */ + int perfbuf_sz; /* per-CPU size, in pages */ + int perfbuf_sample_rate; +} args = { + .back2back = false, + .batch_cnt = 500, + .ringbuf_sz = 512 * 1024, + .ringbuf_use_reserve = true, + .perfbuf_sz = 128, + .perfbuf_sample_rate = 500, +}; + +enum { + ARG_RB_BACK2BACK = 2000, + ARG_RB_USE_OUTPUT = 2001, + ARG_RB_BATCH_CNT = 2002, + ARG_RB_SAMPLE_RATE = 2003, +}; + +static const struct argp_option opts[] = { + { "rb-b2b", ARG_RB_BACK2BACK, NULL, 0, "Back-to-back mode"}, + { "rb-use-output", ARG_RB_USE_OUTPUT, NULL, 0, "Use bpf_ringbuf_output() instead of bpf_ringbuf_reserve()"}, + { "rb-batch-cnt", ARG_RB_BATCH_CNT, "CNT", 0, "Set BPF-side record batch count"}, + { "rb-sample-rate", ARG_RB_SAMPLE_RATE, "RATE", 0, "Perf buf's sample rate"}, + {}, +}; + +static error_t parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case ARG_RB_BACK2BACK: + args.back2back = true; + break; + case ARG_RB_USE_OUTPUT: + args.ringbuf_use_reserve = false; + break; + case ARG_RB_BATCH_CNT: + args.batch_cnt = strtol(arg, NULL, 10); + if (args.batch_cnt < 0) { + fprintf(stderr, "Invalid batch count."); + argp_usage(state); + } + break; + case ARG_RB_SAMPLE_RATE: + args.perfbuf_sample_rate = strtol(arg, NULL, 10); + if (args.perfbuf_sample_rate < 0) { + fprintf(stderr, "Invalid perfbuf sample rate."); + argp_usage(state); + } + break; + default: + return ARGP_ERR_UNKNOWN; + } + return 0; +} + +/* exported into benchmark runner */ +const struct argp bench_ringbufs_argp = { + .options = opts, + .parser = parse_arg, +}; + +/* RINGBUF-LIBBPF benchmark */ + +static struct counter buf_hits; + +static inline void bufs_trigger_batch() +{ + (void)syscall(__NR_getpgid); +} + +static void bufs_validate() +{ + if (env.consumer_cnt != 1) { + fprintf(stderr, "rb-libbpf benchmark doesn't support multi-consumer!\n"); + exit(1); + } + + if (args.back2back && env.producer_cnt > 1) { + fprintf(stderr, "back-to-back mode makes sense only for single-producer case!\n"); + exit(1); + } +} + +static void *bufs_sample_producer(void *input) +{ + if (args.back2back) { + /* initial batch to get everything started */ + bufs_trigger_batch(); + return NULL; + } + + while (true) + bufs_trigger_batch(); + return NULL; +} + +static struct ringbuf_libbpf_ctx { + struct ringbuf_bench *skel; + struct ring_buffer *ringbuf; +} ringbuf_libbpf_ctx; + +static void ringbuf_libbpf_measure(struct bench_res *res) +{ + struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx; + + res->hits = atomic_swap(&buf_hits.value, 0); + res->drops = atomic_swap(&ctx->skel->bss->dropped, 0); +} + +static struct ringbuf_bench *ringbuf_setup_skeleton() +{ + struct ringbuf_bench *skel; + + setup_libbpf(); + + skel = ringbuf_bench__open(); + if (!skel) { + fprintf(stderr, "failed to open skeleton\n"); + exit(1); + } + + skel->rodata->batch_cnt = args.batch_cnt; + skel->rodata->use_reserve = args.ringbuf_use_reserve ? 
+
+	bpf_map__resize(skel->maps.ringbuf, args.ringbuf_sz);
+
+	if (ringbuf_bench__load(skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	return skel;
+}
+
+static int buf_process_sample(void *ctx, void *data, size_t len)
+{
+	atomic_inc(&buf_hits.value);
+	return 0;
+}
+
+static void ringbuf_libbpf_setup()
+{
+	struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
+	struct bpf_link *link;
+
+	ctx->skel = ringbuf_setup_skeleton();
+	ctx->ringbuf = ring_buffer__new(bpf_map__fd(ctx->skel->maps.ringbuf),
+					buf_process_sample, NULL, NULL);
+	if (!ctx->ringbuf) {
+		fprintf(stderr, "failed to create ringbuf\n");
+		exit(1);
+	}
+
+	link = bpf_program__attach(ctx->skel->progs.bench_ringbuf);
+	if (IS_ERR(link)) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void *ringbuf_libbpf_consumer(void *input)
+{
+	struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
+
+	while (ring_buffer__poll(ctx->ringbuf, -1) >= 0) {
+		if (args.back2back)
+			bufs_trigger_batch();
+	}
+	fprintf(stderr, "ringbuf polling failed!\n");
+	return NULL;
+}
+
+/* RINGBUF-CUSTOM benchmark */
+struct ringbuf_custom {
+	__u64 *consumer_pos;
+	__u64 *producer_pos;
+	__u64 mask;
+	void *data;
+	int map_fd;
+};
+
+static struct ringbuf_custom_ctx {
+	struct ringbuf_bench *skel;
+	struct ringbuf_custom ringbuf;
+	int epoll_fd;
+	struct epoll_event event;
+} ringbuf_custom_ctx;
+
+static void ringbuf_custom_measure(struct bench_res *res)
+{
+	struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
+
+	res->hits = atomic_swap(&buf_hits.value, 0);
+	res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
+}
+
+static void ringbuf_custom_setup()
+{
+	struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
+	const size_t page_size = getpagesize();
+	struct bpf_link *link;
+	struct ringbuf_custom *r;
+	void *tmp;
+	int err;
+
+	ctx->skel = ringbuf_setup_skeleton();
+
+	ctx->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+	if (ctx->epoll_fd < 0) {
+		fprintf(stderr, "failed to create epoll fd: %d\n", -errno);
+		exit(1);
+	}
+
+	r = &ctx->ringbuf;
+	r->map_fd = bpf_map__fd(ctx->skel->maps.ringbuf);
+	r->mask = args.ringbuf_sz - 1;
+
+	/* Map writable consumer page */
+	tmp = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+		   r->map_fd, 0);
+	if (tmp == MAP_FAILED) {
+		fprintf(stderr, "failed to mmap consumer page: %d\n", -errno);
+		exit(1);
+	}
+	r->consumer_pos = tmp;
+
+	/* Map read-only producer page and data pages. */
+	tmp = mmap(NULL, page_size + 2 * args.ringbuf_sz, PROT_READ, MAP_SHARED,
+		   r->map_fd, page_size);
+	if (tmp == MAP_FAILED) {
+		fprintf(stderr, "failed to mmap data pages: %d\n", -errno);
+		exit(1);
+	}
+	r->producer_pos = tmp;
+	r->data = tmp + page_size;
+
+	ctx->event.events = EPOLLIN;
+	err = epoll_ctl(ctx->epoll_fd, EPOLL_CTL_ADD, r->map_fd, &ctx->event);
+	if (err < 0) {
+		fprintf(stderr, "failed to epoll add ringbuf: %d\n", -errno);
+		exit(1);
+	}
+
+	link = bpf_program__attach(ctx->skel->progs.bench_ringbuf);
+	if (IS_ERR(link)) {
+		fprintf(stderr, "failed to attach program\n");
+		exit(1);
+	}
+}
+
+#define RINGBUF_BUSY_BIT (1 << 31)
+#define RINGBUF_DISCARD_BIT (1 << 30)
+#define RINGBUF_META_LEN 8
+
+static inline int roundup_len(__u32 len)
+{
+	/* clear out top 2 bits (busy and discard flags) */
+	len <<= 2;
+	len >>= 2;
+	/* add length prefix */
+	len += RINGBUF_META_LEN;
+	/* round up to 8 byte alignment */
+	return (len + 7) / 8 * 8;
+}
+
+static void ringbuf_custom_process_ring(struct ringbuf_custom *r)
+{
+	__u64 cons_pos, prod_pos;
+	int *len_ptr, len;
+	bool got_new_data;
+
+	cons_pos = *r->consumer_pos;
+	while (true) {
+		got_new_data = false;
+		prod_pos = smp_load_acquire(r->producer_pos);
+		while (cons_pos < prod_pos) {
+			len_ptr = r->data + (cons_pos & r->mask);
+			len = smp_load_acquire(len_ptr);
+
+			/* sample not committed yet, bail out for now */
+			if (len & RINGBUF_BUSY_BIT)
+				return;
+
+			got_new_data = true;
+			cons_pos += roundup_len(len);
+
+			atomic_inc(&buf_hits.value);
+		}
+		if (got_new_data)
+			smp_store_release(r->consumer_pos, cons_pos);
+		else
+			break;
+	}
+}
+
+static void *ringbuf_custom_consumer(void *input)
+{
+	struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
+	int cnt;
+
+	do {
+		if (args.back2back)
+			bufs_trigger_batch();
+		cnt = epoll_wait(ctx->epoll_fd, &ctx->event, 1, -1);
+		if (cnt > 0)
+			ringbuf_custom_process_ring(&ctx->ringbuf);
+	} while (cnt >= 0);
+	fprintf(stderr, "ringbuf polling failed!\n");
+	return 0;
+}
+
+/* PERFBUF-LIBBPF benchmark */
+static struct perfbuf_libbpf_ctx {
+	struct perfbuf_bench *skel;
+	struct perf_buffer *perfbuf;
+} perfbuf_libbpf_ctx;
+
+static void perfbuf_measure(struct bench_res *res)
+{
+	struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
+
+	res->hits = atomic_swap(&buf_hits.value, 0);
+	res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
+}
+
+static struct perfbuf_bench *perfbuf_setup_skeleton()
+{
+	struct perfbuf_bench *skel;
+
+	setup_libbpf();
+
+	skel = perfbuf_bench__open();
+	if (!skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	skel->rodata->batch_cnt = args.batch_cnt;
+
+	if (perfbuf_bench__load(skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	return skel;
+}
+
+static void perfbuf_process_sample(void *input_ctx, int cpu, void *data,
+				   __u32 len)
+{
+	atomic_inc(&buf_hits.value);
+}
+
+static enum bpf_perf_event_ret
+perfbuf_process_sample_raw(void *input_ctx, int cpu,
+			   struct perf_event_header *e)
+{
+	switch (e->type) {
+	case PERF_RECORD_SAMPLE:
+		atomic_inc(&buf_hits.value);
+		break;
+	case PERF_RECORD_LOST:
+		break;
+	default:
+		return LIBBPF_PERF_EVENT_ERROR;
+	}
+	return LIBBPF_PERF_EVENT_CONT;
+}
+
+static void perfbuf_libbpf_setup()
+{
+	struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
+	struct perf_buffer_opts pb_opts = {
+		.sample_cb = perfbuf_process_sample,
+		.ctx = (void *)(long)0,
+	};
+	struct bpf_link *link;
+
+	ctx->skel = perfbuf_setup_skeleton();
+	ctx->perfbuf = perf_buffer__new(bpf_map__fd(ctx->skel->maps.perfbuf),
+					args.perfbuf_sz, &pb_opts);
+	if (!ctx->perfbuf) {
+		fprintf(stderr, "failed to create perfbuf\n");
+		exit(1);
+	}
+
+	link = bpf_program__attach(ctx->skel->progs.bench_perfbuf);
+	if (IS_ERR(link)) {
+		fprintf(stderr, "failed to attach program\n");
+		exit(1);
+	}
+}
+
+static void perfbuf_raw_setup()
+{
+	struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
+	struct perf_event_attr attr;
+	struct perf_buffer_raw_opts pb_opts = {
+		.event_cb = perfbuf_process_sample_raw,
+		.ctx = (void *)(long)0,
+		.attr = &attr,
+	};
+	struct bpf_link *link;
+
+	ctx->skel = perfbuf_setup_skeleton();
+
+	memset(&attr, 0, sizeof(attr));
+	attr.config = PERF_COUNT_SW_BPF_OUTPUT;
+	attr.type = PERF_TYPE_SOFTWARE;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	/* notify only every Nth sample */
+	attr.sample_period = args.perfbuf_sample_rate;
+	attr.wakeup_events = args.perfbuf_sample_rate;
+
+	if (args.perfbuf_sample_rate > args.batch_cnt) {
+		fprintf(stderr, "sample rate %d is too high for given batch count %d\n",
+			args.perfbuf_sample_rate, args.batch_cnt);
+		exit(1);
+	}
+
+	ctx->perfbuf = perf_buffer__new_raw(bpf_map__fd(ctx->skel->maps.perfbuf),
+					    args.perfbuf_sz, &pb_opts);
+	if (!ctx->perfbuf) {
+		fprintf(stderr, "failed to create perfbuf\n");
+		exit(1);
+	}
+
+	link = bpf_program__attach(ctx->skel->progs.bench_perfbuf);
+	if (IS_ERR(link)) {
+		fprintf(stderr, "failed to attach program\n");
+		exit(1);
+	}
+}
+
+static void *perfbuf_libbpf_consumer(void *input)
+{
+	struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
+
+	while (perf_buffer__poll(ctx->perfbuf, -1) >= 0) {
+		if (args.back2back)
+			bufs_trigger_batch();
+	}
+	fprintf(stderr, "perfbuf polling failed!\n");
+	return NULL;
+}
+
+/* PERFBUF-CUSTOM benchmark */
+
+/* copies of internal libbpf definitions */
+struct perf_cpu_buf {
+	struct perf_buffer *pb;
+	void *base; /* mmap()'ed memory */
+	void *buf; /* for reconstructing segmented data */
+	size_t buf_size;
+	int fd;
+	int cpu;
+	int map_key;
+};
+
+struct perf_buffer {
+	perf_buffer_event_fn event_cb;
+	perf_buffer_sample_fn sample_cb;
+	perf_buffer_lost_fn lost_cb;
+	void *ctx; /* passed into callbacks */
+
+	size_t page_size;
+	size_t mmap_size;
+	struct perf_cpu_buf **cpu_bufs;
+	struct epoll_event *events;
+	int cpu_cnt; /* number of allocated CPU buffers */
+	int epoll_fd; /* perf event FD */
+	int map_fd; /* BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map FD */
+};
+
+static void *perfbuf_custom_consumer(void *input)
+{
+	struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
+	struct perf_buffer *pb = ctx->perfbuf;
+	struct perf_cpu_buf *cpu_buf;
+	struct perf_event_mmap_page *header;
+	size_t mmap_mask = pb->mmap_size - 1;
+	struct perf_event_header *ehdr;
+	__u64 data_head, data_tail;
+	size_t ehdr_size;
+	void *base;
+	int i, cnt;
+
+	while (true) {
+		if (args.back2back)
+			bufs_trigger_batch();
+		cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, -1);
+		if (cnt <= 0) {
+			fprintf(stderr, "perf epoll failed: %d\n", -errno);
+			exit(1);
+		}
+
+		for (i = 0; i < cnt; ++i) {
+			cpu_buf = pb->events[i].data.ptr;
+			header = cpu_buf->base;
+			base = ((void *)header) + pb->page_size;
+
+			data_head = ring_buffer_read_head(header);
+			data_tail = header->data_tail;
+			while (data_head != data_tail) {
+				ehdr = base + (data_tail & mmap_mask);
+				ehdr_size = ehdr->size;
+
+				if (ehdr->type == PERF_RECORD_SAMPLE)
+					atomic_inc(&buf_hits.value);
+
+				data_tail += ehdr_size;
+			}
+			ring_buffer_write_tail(header, data_tail);
+		}
+	}
+	return NULL;
+}
+
+const struct bench bench_rb_libbpf = {
+	.name = "rb-libbpf",
+	.validate = bufs_validate,
+	.setup = ringbuf_libbpf_setup,
+	.producer_thread = bufs_sample_producer,
+	.consumer_thread = ringbuf_libbpf_consumer,
+	.measure = ringbuf_libbpf_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_rb_custom = {
+	.name = "rb-custom",
+	.validate = bufs_validate,
+	.setup = ringbuf_custom_setup,
+	.producer_thread = bufs_sample_producer,
+	.consumer_thread = ringbuf_custom_consumer,
+	.measure = ringbuf_custom_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_pb_libbpf = {
+	.name = "pb-libbpf",
+	.validate = bufs_validate,
+	.setup = perfbuf_libbpf_setup,
+	.producer_thread = bufs_sample_producer,
+	.consumer_thread = perfbuf_libbpf_consumer,
+	.measure = perfbuf_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_pb_raw = {
+	.name = "pb-raw",
+	.validate = bufs_validate,
+	.setup = perfbuf_raw_setup,
+	.producer_thread = bufs_sample_producer,
+	.consumer_thread = perfbuf_libbpf_consumer,
+	.measure = perfbuf_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_pb_custom = {
+	.name = "pb-custom",
+	.validate = bufs_validate,
+	.setup = perfbuf_raw_setup,
+	.producer_thread = bufs_sample_producer,
+	.consumer_thread = perfbuf_custom_consumer,
+	.measure = perfbuf_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
new file mode 100755
index 000000000000..15b6cd5bd9b3
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
@@ -0,0 +1,61 @@
+#!/bin/bash
+
+set -eufo pipefail
+
+RUN_BENCH="sudo ./bench -w3 -d10 -a"
+
+function hits()
+{
+	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function drops()
+{
+	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function header()
+{
+	local len=${#1}
+
+	printf "\n%s\n" "$1"
+	for i in $(seq 1 $len); do printf '='; done
+	printf '\n'
+}
+
+function summarize()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
+}
+
+header "Single-producer, parallel producer"
+for b in rb-libbpf rb-custom pb-libbpf pb-raw pb-custom; do
+	summarize $b "$($RUN_BENCH $b)"
+done
+
+header "Single-producer, back-to-back mode"
+for b in rb-libbpf rb-custom pb-libbpf pb-raw pb-custom; do
+	summarize $b "$($RUN_BENCH --rb-b2b $b)"
+done
+
+header "Perfbuf, effect of sample rate, back-to-back"
+for b in 1 5 10 25 50 100 250 500 1000 2000 3000; do
+	summarize "sample rate $b" "$($RUN_BENCH --rb-b2b --rb-batch-cnt 3000 --rb-sample-rate $b pb-raw)"
+done
+
+header "Ringbuf, reserve+commit vs output, back-to-back"
+summarize "reserve" "$($RUN_BENCH --rb-b2b rb-custom)"
+summarize "output" "$($RUN_BENCH --rb-b2b --rb-use-output rb-custom)"
+
+header "Single-producer, consumer/producer competing on the same CPU, low batch count"
+for b in rb-libbpf rb-custom pb-libbpf pb-raw pb-custom; do
+	summarize $b "$($RUN_BENCH --rb-sample-rate 1 --rb-batch-cnt 1 --prod-affinity 0 --cons-affinity 0 $b)"
+done
+
+header "Ringbuf, multi-producer contention, low batch count"
+for b in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
+	summarize "rb-libbpf nr_prod $b" "$($RUN_BENCH -p$b --rb-batch-cnt 50 rb-libbpf)"
+done
diff --git a/tools/testing/selftests/bpf/progs/perfbuf_bench.c b/tools/testing/selftests/bpf/progs/perfbuf_bench.c
new file mode 100644
index 000000000000..e5ab4836a641
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/perfbuf_bench.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+
+#include <linux/bpf.h>
+#include <stdint.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(value_size, sizeof(int));
+	__uint(key_size, sizeof(int));
+} perfbuf SEC(".maps");
+
+const volatile int batch_cnt = 0;
+
+long sample_val = 42;
+long dropped __attribute__((aligned(128))) = 0;
+
+SEC("fentry/__x64_sys_getpgid")
+int bench_perfbuf(void *ctx)
+{
+	__u64 *sample;
+	int i;
+
+	for (i = 0; i < batch_cnt; i++) {
+		if (bpf_perf_event_output(ctx, &perfbuf, BPF_F_CURRENT_CPU,
+					  &sample_val, sizeof(sample_val)))
+			__sync_add_and_fetch(&dropped, 1);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/ringbuf_bench.c b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
new file mode 100644
index 000000000000..6008ec5d6a22
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+
+#include <linux/bpf.h>
+#include <stdint.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+} ringbuf SEC(".maps");
+
+const volatile int batch_cnt = 0;
+const volatile long use_reserve = 1;
+
+long sample_val = 42;
+long dropped __attribute__((aligned(128))) = 0;
+
+SEC("fentry/__x64_sys_getpgid")
+int bench_ringbuf(void *ctx)
+{
+	long *sample;
+	int i;
+
+	if (use_reserve) {
+		for (i = 0; i < batch_cnt; i++) {
+			sample = bpf_ringbuf_reserve(&ringbuf,
+						     sizeof(sample_val), 0);
+			if (!sample) {
+				__sync_add_and_fetch(&dropped, 1);
+			} else {
+				*sample = sample_val;
+				bpf_ringbuf_submit(sample);
+			}
+		}
+	} else {
+		for (i = 0; i < batch_cnt; i++) {
+			if (bpf_ringbuf_output(&ringbuf, &sample_val,
+					       sizeof(sample_val), 0))
+				__sync_add_and_fetch(&dropped, 1);
+		}
+	}
+	return 0;
+}
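As a usage sketch (assuming the selftests output directory), individual
benchmarks from this patch can also be run directly, outside of the
script:

    # ringbuf with custom consumer, back-to-back mode,
    # 3s of warm-up and 10s of measurement
    sudo ./bench -w3 -d10 -a --rb-b2b rb-custom

    # raw perfbuf consumer with a lower sample rate
    sudo ./bench -w3 -d10 -a --rb-sample-rate 100 pb-raw

    # or run the full suite
    ./benchs/run_bench_ringbufs.sh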