diff mbox series

[RFC,bpf-next] ksnoop: kernel argument/return value tracing/display using BTF

Message ID 1609773991-10509-1-git-send-email-alan.maguire@oracle.com
State New
Headers show
Series [RFC,bpf-next] ksnoop: kernel argument/return value tracing/display using BTF

Commit Message

Alan Maguire Jan. 4, 2021, 3:26 p.m. UTC
BPF Type Format (BTF) provides a description of kernel data structures
and of the types kernel functions utilize as arguments and return values.

A helper was recently added - bpf_snprintf_btf() - that uses that
description to create a string representation of the data provided,
using the BTF id of its type.  For example, to create a string
representation of a "struct sk_buff", the pointer to the skb
is provided along with the type id of "struct sk_buff".

Here that functionality is utilized to support tracing kernel
function entry and return using k[ret]probes.  The "struct pt_regs"
context can be used to derive arguments and return values, and
when the user supplies a function name we:

- look it up in /proc/kallsyms to find its address/module
- look it up in the BTF kernel data to get types of arguments
  and return value
- store a map representation of the trace information, keyed by
  instruction pointer
- on function entry/return we look up the map to retrieve the BTF
  ids of the arguments/return values and can call bpf_snprintf_btf()
  with these argument/return values along with the type ids to store
  a string representation in the map.
- this is then sent via perf event to userspace where it can be
  displayed.
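In outline, the BPF side of this is small; here is a simplified
sketch of the entry path, condensed from ksnoop.bpf.c in this
patch (the kretprobe entry/return stack, predicate checks and
error handling are omitted):

/* trace specifications, written by userspace, keyed by function IP */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
	__uint(max_entries, 8);
	__type(key, __u64);
	__type(value, struct trace);
} ksnoop_func_map SEC(".maps");

SEC("kprobe/foo")
int kprobe_entry(struct pt_regs *ctx)
{
	__u64 ip = KSNOOP_IP_FIX(PT_REGS_IP_CORE(ctx));
	struct btf_ptr btf_ptr = { };
	struct trace *trace;

	/* find the trace specification for this function... */
	trace = bpf_map_lookup_elem(&ksnoop_func_map, &ip);
	if (!trace)
		return 0;
	/* ...and stringify its first argument using the BTF type id
	 * userspace stored for it.
	 */
	btf_ptr.ptr = (void *)PT_REGS_PARM1_CORE(ctx);
	btf_ptr.type_id = trace->traces[0].type_id;
	bpf_snprintf_btf(trace->buf, MAX_TRACE_DATA, &btf_ptr,
			 sizeof(btf_ptr), BTF_F_PTR_RAW);
	return 0;
}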

ksnoop can be used to show function signatures; for example:

$ ksnoop info ip_send_skb
int  ip_send_skb(struct net  * net, struct sk_buff  * skb);

Then we can trace the function, for example:

$ ksnoop trace ip_send_skb
                TASK    PID CPU#     TIMESTAMP FUNCTION

                ping   3833    1 251523.616148 ip_send_skb(
                                                net = *(struct net){
                                                 .passive = (refcount_t){
                                                  .refs = (atomic_t){
                                                   .counter = (int)2,
                                                  },
                                                 },

etc.  Truncated data is suffixed by "..." (2048 bytes of
string value are provided for each argument).  Up to
five arguments are displayed.

The arguments are referred to by name (e.g. skb, net), and
the return value is referred to as "return" (using the keyword
ensures we can never clash with an argument name), i.e.

                ping   3833    1 251523.617250 ip_send_skb(
                                                return = (int)0

                                               );

ksnoop can select specific arguments/return value rather
than tracing everything; for example:

$ ksnoop "ip_send_skb(skb)"

...will only trace the skb argument.  A single level of member
reference is also supported; for example:

$ ksnoop "ip_send_skb(skb->sk)"

...for a pointer member, or

$ ksnoop "ip_send_skb(skb->len)"

...for a non-pointer member.

Multiple functions can also be specified; for example:

$ ksnoop ip_send_skb ip_rcv

ksnoop will work for in-kernel and module-specific functions,
but in the latter case only base types or core kernel types
will be displayed; bpf_snprintf_btf() does not currently
support module-specific type display.

If invalid memory (such as a userspace pointer or an invalid
NULL pointer) is encountered in function arguments, return
values or references, ksnoop will report it like this:

          irqbalance   1043    3 282167.478364 getname(
                                                filename = 0x7ffd5a0cca10
                                                /* Cannot show 'filename' as 'char  *'.
                                                 * Userspace/invalid ptr? */

                                               );

ksnoop can handle simple predicate evaluation;
"==", "!=", ">", "<", ">=" and "<=" are supported, and for
a trace to be recorded, all predicates have to evaluate to
true.  For example:

$ ksnoop "ip_send_skb(skb->len == 84, skb)"
                ping  19009    1  19671.328156 ip_send_skb(
                                                skb->len = (unsigned int)84
                                                ,

                                                skb = *(struct sk_buff){
                                                 (union){
                                                  .sk = (struct sock *)0xffff930a01095c00,
                                                  .ip_defrag_offset = (int)17390592,
                                                 },
                                                 (union){
                                                  (struct){
                                                   ._skb_refdst = (long unsigned int)18446624275917552448,
                                                   .destructor = ( *)0xffffffffa5bfaf00,
                                                  },
                                                  .tcp_tsorted_anchor = (struct list_head){
                                                   .next = (struct list_head *)0xffff930b6729bb40,
                                                   .prev = (struct list_head *)0xffffffffa5bfaf00,
                                                  },
                                                 },
                                                 .len = (unsigned int)84,
                                                 .ignore_df = (__u8)0x1,
                                                 (union){
                                                  .csum = (__wsum)2619910871,
                                                  (struct){
                                                   .csum_start = (__u16)43735,
                                                   .csum_offset = (__u16)39976,
                                                  },
                                                 },
                                                 .transport_header = (__u16)36,
                                                 .network_header = (__u16)16,
                                                 .mac_header = (__u16)65535,
                                                 .tail = (sk_buff_data_t)100,
                                                 .end = (sk_buff_data_t)192,
                                                 .head = (unsigned char *)0xffff930b9d3cf800,
                                                 .data = (unsigned char *)0xffff930b9d3cf810,
                                                 .truesize = (unsigned int)768,
                                                 .users = (refcount_t){
                                                  .refs = (atomic_t){
                                                   .counter = (int)1,
                                                  },
                                                 },
                                                }

                                               );

It is possible to combine a request for entry arguments with a
predicate on return value; for example we might want to see
skbs on entry for cases where ip_send_skb eventually returned
an error value.  To do this, a predicate such as

$ ksnoop "ip_send_skb(skb, return!=0)"

...could be used.  On entry, rather than sending a perf event,
the string representation of the skb argument is "stashed"; on
return, if the predicate is satisfied, the stashed data is sent
as a perf event along with the return-value data.  This allows
us to satisfy requests such as "show me entry argument X when
the function fails, returning a negative errno".
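In the BPF program, the stash logic reduces to a small amount of
per-trace state; simplified from ksnoop.bpf.c in this patch:

	/* entry with a return-value predicate pending: record values
	 * but defer the perf event until return.
	 */
	if (entry && (trace->flags & KSNOOP_F_STASH)) {
		trace->data_flags |= KSNOOP_F_STASHED;
		return 0;
	}
	/* on return, only report if entry data was actually stashed;
	 * a failed entry predicate means nothing was stashed, so
	 * discard the trace.
	 */
	if (!entry && (trace->flags & KSNOOP_F_STASH) &&
	    !(trace->data_flags & KSNOOP_F_STASHED))
		goto skiptrace;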

A note about overhead: it is very high, being a combination of
known kprobe overhead and the cost of assembling string
representations of kernel data.

Use of predicates can mitigate overhead, as collection of trace
data will only occur when the predicate is satisfied; in such
cases it is best to lead with the predicate, e.g.

ksnoop "ip_send_skb(skb->dev == 0, skb)"

...as this will be evaluated before the skb is stringified,
and we potentially avoid that operation if the predicate fails.
The same is _not_ true however in the stash case; for

ksnoop "ip_send_skb(skb, return!=0)"

...we must collect the skb representation on entry as we do not
yet know if the function will fail or not.  If it does, the
data is discarded rather than sent as a perf event.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/bpf/Makefile            |  16 +-
 tools/bpf/ksnoop/Makefile     | 102 +++++
 tools/bpf/ksnoop/ksnoop.bpf.c | 336 +++++++++++++++
 tools/bpf/ksnoop/ksnoop.c     | 981 ++++++++++++++++++++++++++++++++++++++++++
 tools/bpf/ksnoop/ksnoop.h     | 110 +++++
 5 files changed, 1542 insertions(+), 3 deletions(-)
 create mode 100644 tools/bpf/ksnoop/Makefile
 create mode 100644 tools/bpf/ksnoop/ksnoop.bpf.c
 create mode 100644 tools/bpf/ksnoop/ksnoop.c
 create mode 100644 tools/bpf/ksnoop/ksnoop.h

Comments

Alexei Starovoitov Jan. 5, 2021, 1:57 a.m. UTC | #1
On Mon, Jan 04, 2021 at 03:26:31PM +0000, Alan Maguire wrote:
> 
> ksnoop can be used to show function signatures; for example:
> 
> $ ksnoop info ip_send_skb
> int  ip_send_skb(struct net  * net, struct sk_buff  * skb);
> 
> Then we can trace the function, for example:
> 
> $ ksnoop trace ip_send_skb

Thanks for sharing. It will be a useful tool.

> +
> +		data = get_arg(ctx, currtrace->base_arg);
> +
> +		dataptr = (void *)data;
> +
> +		if (currtrace->offset)
> +			dataptr += currtrace->offset;
> +
> +		/* look up member value and read into data field, provided
> +		 * it is <= the size of a __u64; when it is, it can be used in
> +		 * predicate evaluation.
> +		 */
> +		if (currtrace->flags & KSNOOP_F_MEMBER) {
> +			ret = -EINVAL;
> +			data = 0;
> +			if (currtrace->size <= sizeof(__u64))
> +				ret = bpf_probe_read_kernel(&data,
> +							    currtrace->size,
> +							    dataptr);
> +			else
> +				bpf_printk("size was %d can't trace",
> +					   currtrace->size);
> +			if (ret) {
> +				currdata->err_type_id =
> +					currtrace->type_id;
> +				currdata->err = ret;
> +				continue;
> +			}
> +			if (currtrace->flags & KSNOOP_F_PTR)
> +				dataptr = (void *)data;
> +		}
> +
> +		/* simple predicate evaluation: if any predicate fails,
> +		 * skip all tracing for this function.
> +		 */
> +		if (currtrace->flags & KSNOOP_F_PREDICATE_MASK) {
> +			bool ok = false;
> +
> +			if (currtrace->flags & KSNOOP_F_PREDICATE_EQ &&
> +			    data == currtrace->predicate_value)
> +				ok = true;
> +
> +			if (currtrace->flags & KSNOOP_F_PREDICATE_NOTEQ &&
> +			    data != currtrace->predicate_value)
> +				ok = true;
> +
> +			if (currtrace->flags & KSNOOP_F_PREDICATE_GT &&
> +			    data > currtrace->predicate_value)
> +				ok = true;
> +			if (currtrace->flags & KSNOOP_F_PREDICATE_LT &&
> +			    data < currtrace->predicate_value)
> +				ok = true;
> +
> +			if (!ok)
> +				goto skiptrace;
> +		}
> +
> +		currdata->raw_value = data;
> +
> +		if (currtrace->flags & (KSNOOP_F_PTR | KSNOOP_F_MEMBER))
> +			btf_ptr.ptr = dataptr;
> +		else
> +			btf_ptr.ptr = &data;
> +
> +		btf_ptr.type_id = currtrace->type_id;
> +
> +		if (trace->buf_len + MAX_TRACE_DATA >= MAX_TRACE_BUF)
> +			break;
> +
> +		buf_offset = &trace->buf[trace->buf_len];
> +		if (buf_offset > &trace->buf[MAX_TRACE_BUF]) {
> +			currdata->err_type_id = currtrace->type_id;
> +			currdata->err = -ENOSPC;
> +			continue;
> +		}
> +		currdata->buf_offset = trace->buf_len;
> +
> +		ret = bpf_snprintf_btf(buf_offset,
> +				       MAX_TRACE_DATA,
> +				       &btf_ptr, sizeof(btf_ptr),
> +				       BTF_F_PTR_RAW);

The overhead would be much lower if instead of printing in the kernel the
tool's bpf prog would dump the struct data into ring buffer and let the user
space part of the tool do the pretty printing. There would be no need to pass
btf_id from the user space to the kernel either. The user space would need to
gain pretty printing logic, but maybe we can share the code somehow between
the kernel and libbpf.
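[For reference, a minimal sketch of that approach; the ringbuf map
name, size and the fixed-size reservation here are illustrative
only, not part of this patch, and assume the same includes as
ksnoop.bpf.c:

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} rb SEC(".maps");

/* copy raw struct bytes instead of stringifying in the kernel;
 * userspace matches the record to a BTF type and pretty-prints it.
 */
static inline int emit_raw(void *kptr, __u32 size)
{
	void *rec;

	if (size > MAX_TRACE_DATA)
		return -E2BIG;
	rec = bpf_ringbuf_reserve(&rb, MAX_TRACE_DATA, 0);
	if (!rec)
		return -ENOSPC;
	if (bpf_probe_read_kernel(rec, size, kptr)) {
		bpf_ringbuf_discard(rec, 0);
		return -EFAULT;
	}
	bpf_ringbuf_submit(rec, 0);
	return 0;
}

Userspace would then look up the matching BTF type itself and
render the bytes, e.g. via the shared pretty-printing code
suggested above.]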

Separately, the interpreter in the bpf prog to handle predicates is kinda
anti-bpf :) I think ksnoop can generate bpf code on the fly instead; no need
for llvm. The currtrace->offset/size would be written into placeholder
instructions in the prog by ksnoop before loading it, with much improved
overhead for filtering.
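
[For reference, a related way to specialize the program per trace
without an in-kernel interpreter is read-only globals, which the
verifier constant-folds at load time; a minimal sketch under that
assumption (pred_offset/pred_value are hypothetical names, not from
the patch, and rewriting placeholder instructions directly as
suggested above would go further):

/* the verifier treats 'const volatile' globals as known constants
 * once userspace sets them, so untaken predicate branches are
 * dead-code eliminated rather than evaluated per event.
 */
const volatile __u32 pred_offset;
const volatile __u64 pred_value;

static inline bool predicate_ok(void *dataptr)
{
	__u64 data = 0;

	/* hypothetical: read the member at the specialized offset and
	 * compare against the specialized value.
	 */
	bpf_probe_read_kernel(&data, sizeof(data), dataptr + pred_offset);
	return data == pred_value;
}

With the skeleton's open/load split, userspace would set
skel->rodata->pred_offset and skel->rodata->pred_value between
ksnoop_bpf__open() and ksnoop_bpf__load().]
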
Cong Wang Jan. 5, 2021, 8:48 p.m. UTC | #2
On Mon, Jan 4, 2021 at 7:29 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> BPF Type Format (BTF) provides a description of kernel data structures
> and of the types kernel functions utilize as arguments and return values.
>
> A helper was recently added - bpf_snprintf_btf() - that uses that
> description to create a string representation of the data provided,
> using the BTF id of its type.  For example, to create a string
> representation of a "struct sk_buff", the pointer to the skb
> is provided along with the type id of "struct sk_buff".
>
> Here that functionality is utilized to support tracing kernel
> function entry and return using k[ret]probes.  The "struct pt_regs"
> context can be used to derive arguments and return values, and
> when the user supplies a function name we:
>
> - look it up in /proc/kallsyms to find its address/module
> - look it up in the BTF kernel data to get types of arguments
>   and return value
> - store a map representation of the trace information, keyed by
>   instruction pointer
> - on function entry/return we look up the map to retrieve the BTF
>   ids of the arguments/return values and can call bpf_snprintf_btf()
>   with these argument/return values along with the type ids to store
>   a string representation in the map.
> - this is then sent via perf event to userspace where it can be
>   displayed.
>
> ksnoop can be used to show function signatures; for example:

This is definitely quite useful!

Is it possible to integrate this with bpftrace? That would save people
from learning yet another tool. ;)

Thanks.
Alan Maguire Jan. 5, 2021, 10:44 p.m. UTC | #3
On Tue, 5 Jan 2021, Cong Wang wrote:

> On Mon, Jan 4, 2021 at 7:29 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > [...]
> >
> > ksnoop can be used to show function signatures; for example:
> 
> This is definitely quite useful!
> 
> Is it possible to integrate this with bpftrace? That would save people
> from learning yet another tool. ;)
> 

I'd imagine (and hope!) other tracing tools will do this, but right
now the aim is to make the task of tracing kernel data structures simpler,
so having a tool dedicated to just that can hopefully help those
discussions.  There's a bit more work to be done to simplify that task,
for example implementing Alexei's suggestion to support pretty-printing
of data structures using BTF in libbpf.

My hope is that we can evolve this tool - or something like it - to the
point where we can solve that one problem easily, and that other more
general tracers can then make use of that solution.  I probably should
have made all of this clearer in the patch submission, sorry about that.

Alan
diff mbox series

Patch

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index 39bb322..8b2b6c9 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -73,7 +73,7 @@  $(OUTPUT)%.lex.o: $(OUTPUT)%.lex.c
 
 PROGS = $(OUTPUT)bpf_jit_disasm $(OUTPUT)bpf_dbg $(OUTPUT)bpf_asm
 
-all: $(PROGS) bpftool runqslower
+all: $(PROGS) bpftool runqslower ksnoop
 
 $(OUTPUT)bpf_jit_disasm: CFLAGS += -DPACKAGE='bpf_jit_disasm'
 $(OUTPUT)bpf_jit_disasm: $(OUTPUT)bpf_jit_disasm.o
@@ -89,7 +89,7 @@  $(OUTPUT)bpf_exp.lex.c: $(OUTPUT)bpf_exp.yacc.c
 $(OUTPUT)bpf_exp.yacc.o: $(OUTPUT)bpf_exp.yacc.c
 $(OUTPUT)bpf_exp.lex.o: $(OUTPUT)bpf_exp.lex.c
 
-clean: bpftool_clean runqslower_clean resolve_btfids_clean
+clean: bpftool_clean runqslower_clean resolve_btfids_clean ksnoop_clean
 	$(call QUIET_CLEAN, bpf-progs)
 	$(Q)$(RM) -r -- $(OUTPUT)*.o $(OUTPUT)bpf_jit_disasm $(OUTPUT)bpf_dbg \
 	       $(OUTPUT)bpf_asm $(OUTPUT)bpf_exp.yacc.* $(OUTPUT)bpf_exp.lex.*
@@ -97,7 +97,7 @@  clean: bpftool_clean runqslower_clean resolve_btfids_clean
 	$(Q)$(RM) -- $(OUTPUT)FEATURE-DUMP.bpf
 	$(Q)$(RM) -r -- $(OUTPUT)feature
 
-install: $(PROGS) bpftool_install runqslower_install
+install: $(PROGS) bpftool_install runqslower_install ksnoop_install
 	$(call QUIET_INSTALL, bpf_jit_disasm)
 	$(Q)$(INSTALL) -m 0755 -d $(DESTDIR)$(prefix)/bin
 	$(Q)$(INSTALL) $(OUTPUT)bpf_jit_disasm $(DESTDIR)$(prefix)/bin/bpf_jit_disasm
@@ -124,6 +124,15 @@  runqslower_install:
 runqslower_clean:
 	$(call descend,runqslower,clean)
 
+ksnoop:
+	$(call descend,ksnoop)
+
+ksnoop_install:
+	$(call descend,ksnoop,install)
+
+ksnoop_clean:
+	$(call descend,ksnoop,clean)
+
 resolve_btfids:
 	$(call descend,resolve_btfids)
 
@@ -132,4 +141,5 @@  resolve_btfids_clean:
 
 .PHONY: all install clean bpftool bpftool_install bpftool_clean \
 	runqslower runqslower_install runqslower_clean \
+	ksnoop ksnoop_install ksnoop_clean \
 	resolve_btfids resolve_btfids_clean
diff --git a/tools/bpf/ksnoop/Makefile b/tools/bpf/ksnoop/Makefile
new file mode 100644
index 0000000..c57ffec
--- /dev/null
+++ b/tools/bpf/ksnoop/Makefile
@@ -0,0 +1,102 @@ 
+# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+include ../../scripts/Makefile.include
+include ../../scripts/Makefile.arch
+
+OUTPUT ?= $(abspath .output)/
+
+INSTALL ?= install
+
+CLANG ?= clang
+LLC ?= llc
+LLVM_STRIP ?= llvm-strip
+BPFTOOL_OUTPUT := $(OUTPUT)bpftool/
+DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool
+BPFTOOL ?= $(DEFAULT_BPFTOOL)
+LIBBPF_SRC := $(abspath ../../lib/bpf)
+BPFOBJ_OUTPUT := $(OUTPUT)libbpf/
+BPFOBJ := $(BPFOBJ_OUTPUT)libbpf.a
+BPF_INCLUDE := $(BPFOBJ_OUTPUT)
+INCLUDES := -I$(OUTPUT) -I$(BPF_INCLUDE) -I$(abspath ../../lib)        \
+       -I$(abspath ../../include/uapi)
+
+ifeq ($(KSNOOP_VERSION),)
+KSNOOP_VERSION := $(shell make -rR --no-print-directory -sC ../../.. kernelversion)
+endif
+
+CFLAGS := -g -Wall
+CFLAGS += -DKSNOOP_VERSION='"$(KSNOOP_VERSION)"'
+
+# Try to detect best kernel BTF source
+KERNEL_REL := $(shell uname -r)
+VMLINUX_BTF_PATHS := /sys/kernel/btf/vmlinux /boot/vmlinux-$(KERNEL_REL)
+VMLINUX_BTF_PATH := $(or $(VMLINUX_BTF),$(firstword			       \
+					  $(wildcard $(VMLINUX_BTF_PATHS))))
+
+ifeq ($(V),1)
+Q =
+else
+Q = @
+MAKEFLAGS += --no-print-directory
+submake_extras := feature_display=0
+endif
+
+.DELETE_ON_ERROR:
+
+.PHONY: all clean ksnoop
+all: ksnoop
+
+ksnoop: $(OUTPUT)/ksnoop
+
+clean:
+	$(call QUIET_CLEAN, ksnoop)
+	$(Q)$(RM) -r $(BPFOBJ_OUTPUT) $(BPFTOOL_OUTPUT)
+	$(Q)$(RM) $(OUTPUT)*.o $(OUTPUT)*.d
+	$(Q)$(RM) $(OUTPUT)*.skel.h $(OUTPUT)vmlinux.h
+	$(Q)$(RM) $(OUTPUT)ksnoop
+	$(Q)$(RM) -r .output
+
+install: $(OUTPUT)/ksnoop
+	$(call QUIET_INSTALL, ksnoop)
+	$(Q)$(INSTALL) -m 0755 -d $(DESTDIR)$(prefix)/sbin
+	$(Q)$(INSTALL) $(OUTPUT)ksnoop $(DESTDIR)$(prefix)/sbin/ksnoop
+
+uninstall:
+	$(call QUIET_UNINST, ksnoop)
+	$(Q)$(RM) -- $(DESTDIR)$(prefix)/sbin/ksnoop
+
+$(OUTPUT)/ksnoop: $(OUTPUT)/ksnoop.o $(BPFOBJ)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $^ -lelf -lz -o $@
+
+$(OUTPUT)/ksnoop.o: ksnoop.h $(OUTPUT)/ksnoop.skel.h	      \
+			$(OUTPUT)/ksnoop.bpf.o
+
+$(OUTPUT)/ksnoop.bpf.o: $(OUTPUT)/vmlinux.h ksnoop.h
+
+$(OUTPUT)/%.skel.h: $(OUTPUT)/%.bpf.o | $(BPFTOOL)
+	$(QUIET_GEN)$(BPFTOOL) gen skeleton $< > $@
+
+$(OUTPUT)/%.bpf.o: %.bpf.c $(BPFOBJ) | $(OUTPUT)
+	$(QUIET_GEN)$(CLANG) -g -D__TARGET_ARCH_$(SRCARCH) -O2 -target bpf \
+		$(INCLUDES) -c $(filter %.c,$^) -o $@ &&		   \
+	$(LLVM_STRIP) -g $@
+
+$(OUTPUT)/%.o: %.c | $(OUTPUT)
+	$(QUIET_CC)$(CC) $(CFLAGS) $(INCLUDES) -c $(filter %.c,$^) -o $@
+
+$(OUTPUT) $(BPFOBJ_OUTPUT) $(BPFTOOL_OUTPUT):
+	$(QUIET_MKDIR)mkdir -p $@
+
+$(OUTPUT)/vmlinux.h: $(VMLINUX_BTF_PATH) | $(OUTPUT) $(BPFTOOL)
+	$(Q)if [ ! -e "$(VMLINUX_BTF_PATH)" ] ; then \
+		echo "Couldn't find kernel BTF; set VMLINUX_BTF to"	       \
+			"specify its location." >&2;			       \
+		exit 1;\
+	fi
+	$(QUIET_GEN)$(BPFTOOL) btf dump file $(VMLINUX_BTF_PATH) format c > $@
+
+$(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(BPFOBJ_OUTPUT)
+	$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(BPFOBJ_OUTPUT) $@
+
+$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
+	$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT)   \
+		    CC=$(HOSTCC) LD=$(HOSTLD)
diff --git a/tools/bpf/ksnoop/ksnoop.bpf.c b/tools/bpf/ksnoop/ksnoop.bpf.c
new file mode 100644
index 0000000..98ab830
--- /dev/null
+++ b/tools/bpf/ksnoop/ksnoop.bpf.c
@@ -0,0 +1,336 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021, Oracle and/or its affiliates. */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+#include <asm/errno.h>
+#include "ksnoop.h"
+
+/* For kretprobes, the instruction pointer in the struct pt_regs context
+ * is the kretprobe_trampoline, so to derive the instruction pointer
+ * we need to push it onto a stack on entry and pop it on return.
+ */
+#define FUNC_MAX_STACK_DEPTH	(2 * MAX_FUNC_TRACES)
+
+#define FUNC_MAX_PROCS		256
+
+#ifndef NULL
+#define NULL			0
+#endif
+
+struct func_stack {
+	__u64 ips[FUNC_MAX_STACK_DEPTH];
+	__u8 stack_depth;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, FUNC_MAX_PROCS);
+	__type(key, __u64);
+	__type(value, struct func_stack);
+} ksnoop_func_stack SEC(".maps");
+
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+	__uint(max_entries, 8);
+	__type(key, __u64);
+	__type(value, struct trace);
+} ksnoop_func_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(value_size, sizeof(int));
+	__uint(key_size, sizeof(int));
+} ksnoop_perf_map SEC(".maps");
+
+/* function stacks are keyed on pid/tgid. Inlined to avoid verifier
+ * complaint about global function not returning a scalar.
+ */
+static inline struct trace *get_trace(struct pt_regs *ctx, __u64 key,
+				      bool entry)
+{
+	struct func_stack *func_stack, new = { 0 };
+	struct trace *trace;
+	__u64 ip;
+
+	func_stack = bpf_map_lookup_elem(&ksnoop_func_stack, &key);
+	if (!func_stack) {
+		bpf_map_update_elem(&ksnoop_func_stack, &key, &new, 0);
+		func_stack = bpf_map_lookup_elem(&ksnoop_func_stack, &key);
+	}
+	if (!func_stack) {
+		bpf_printk("cannot retrieve func stack for tgid/pid %llx\n",
+			   key);
+		return NULL;
+	}
+
+	if (entry) {
+		ip = KSNOOP_IP_FIX(PT_REGS_IP_CORE(ctx));
+		/* push ip onto stack. return will pop it. */
+		if (func_stack->stack_depth >= FUNC_MAX_STACK_DEPTH) {
+			bpf_printk("stackdepth %d exceeded for tgid/pid %llx\n",
+				   func_stack->stack_depth, key);
+			return NULL;
+		}
+		func_stack->ips[func_stack->stack_depth++] = ip;
+	} else {
+		/* retrieve ip from stack as IP in pt_regs is
+		 * bpf kretprobe trampoline address.
+		 */
+		if (func_stack->stack_depth == 0 ||
+		    func_stack->stack_depth > FUNC_MAX_STACK_DEPTH) {
+			if (func_stack->stack_depth == 0)
+				bpf_printk("no entry for tgid/pid %lld\n",
+					   key);
+			if (func_stack->stack_depth > FUNC_MAX_STACK_DEPTH)
+				bpf_printk("stackdepth %d exceeded for tgid/pid %llx\n",
+					   func_stack->stack_depth, key);
+			return NULL;
+		}
+		ip = func_stack->ips[--func_stack->stack_depth];
+	}
+
+	return bpf_map_lookup_elem(&ksnoop_func_map, &ip);
+}
+
+static inline __u64 get_arg(struct pt_regs *ctx, enum arg argnum)
+{
+	switch (argnum) {
+	case KSNOOP_ARG1:
+		return PT_REGS_PARM1_CORE(ctx);
+	case KSNOOP_ARG2:
+		return PT_REGS_PARM2_CORE(ctx);
+	case KSNOOP_ARG3:
+		return PT_REGS_PARM3_CORE(ctx);
+	case KSNOOP_ARG4:
+		return PT_REGS_PARM4_CORE(ctx);
+	case KSNOOP_ARG5:
+		return PT_REGS_PARM5_CORE(ctx);
+	case KSNOOP_RETURN:
+		return PT_REGS_RC_CORE(ctx);
+	default:
+		return 0;
+	}
+}
+
+static inline int ksnoop(struct pt_regs *ctx, bool entry)
+{
+	struct btf_ptr btf_ptr = { };
+	struct trace *trace;
+	struct func *func;
+	__u16 trace_len;
+	__u64 pid_tgid;
+	__u64 data;
+	int ret;
+	__u8 i;
+
+	pid_tgid = bpf_get_current_pid_tgid();
+	trace = get_trace(ctx, pid_tgid, entry);
+	if (!trace)
+		return 0;
+
+	trace->time = bpf_ktime_get_ns();
+	trace->cpu = bpf_get_smp_processor_id();
+
+	func = &trace->func;
+
+	/* we may be tracing return and have already collected entry
+	 * traces; such cases occur when we have a predicate on the
+	 * return value _and_ we trace entry values.  In such cases
+	 * we need to collect entry values but only report them if the
+	 * predicate matches entry _and_ return predicates.  In such
+	 * cases do not reset buf_len as we need to continue recording
+	 * return values into the buffer along with the already-recorded
+	 * entry values.
+	 */
+	if (!entry && (trace->flags & KSNOOP_F_STASH)) {
+		if (trace->data_flags & KSNOOP_F_STASHED) {
+			trace->data_flags &= ~KSNOOP_F_STASHED;
+		} else {
+			/* expected stashed data, predicate failed? */
+			goto skiptrace;
+		}
+	} else {
+		/* clear trace data before starting. */
+		__builtin_memset(&trace->trace_data, 0,
+				 sizeof(trace->trace_data));
+		trace->data_flags = 0;
+		trace->buf_len = 0;
+		trace->buf[0] = '\0';
+	}
+
+	if (entry)
+		trace->data_flags |= KSNOOP_F_ENTRY;
+	else
+		trace->data_flags |= KSNOOP_F_RETURN;
+
+
+	for (i = 0; i < MAX_TRACES; i++) {
+		struct trace_data *currdata;
+		struct value *currtrace;
+		char *buf_offset = NULL;
+		void *dataptr;
+
+		currdata = &trace->trace_data[i];
+		currtrace = &trace->traces[i];
+
+		/* skip irrelevant info (return value for entry etc) */
+		if ((entry && !base_arg_is_entry(currtrace->base_arg)) ||
+		    (!entry && base_arg_is_entry(currtrace->base_arg)))
+			continue;
+
+		/* skip void (unused) trace arguments, ensuring not to
+		 * skip "void *".
+		 */
+		if (currtrace->type_id == 0 && currtrace->flags == 0)
+			continue;
+
+		data = get_arg(ctx, currtrace->base_arg);
+
+		dataptr = (void *)data;
+
+		if (currtrace->offset)
+			dataptr += currtrace->offset;
+
+		/* look up member value and read into data field, provided
+		 * it is <= the size of a __u64; when it is, it can be used in
+		 * predicate evaluation.
+		 */
+		if (currtrace->flags & KSNOOP_F_MEMBER) {
+			ret = -EINVAL;
+			data = 0;
+			if (currtrace->size <= sizeof(__u64))
+				ret = bpf_probe_read_kernel(&data,
+							    currtrace->size,
+							    dataptr);
+			else
+				bpf_printk("size was %d cant trace",
+					   currtrace->size);
+			if (ret) {
+				currdata->err_type_id =
+					currtrace->type_id;
+				currdata->err = ret;
+				continue;
+			}
+			if (currtrace->flags & KSNOOP_F_PTR)
+				dataptr = (void *)data;
+		}
+
+		/* simple predicate evaluation: if any predicate fails,
+		 * skip all tracing for this function.
+		 */
+		if (currtrace->flags & KSNOOP_F_PREDICATE_MASK) {
+			bool ok = false;
+
+			if (currtrace->flags & KSNOOP_F_PREDICATE_EQ &&
+			    data == currtrace->predicate_value)
+				ok = true;
+
+			if (currtrace->flags & KSNOOP_F_PREDICATE_NOTEQ &&
+			    data != currtrace->predicate_value)
+				ok = true;
+
+			if (currtrace->flags & KSNOOP_F_PREDICATE_GT &&
+			    data > currtrace->predicate_value)
+				ok = true;
+			if (currtrace->flags & KSNOOP_F_PREDICATE_LT &&
+			    data < currtrace->predicate_value)
+				ok = true;
+
+			if (!ok)
+				goto skiptrace;
+		}
+
+		currdata->raw_value = data;
+
+		if (currtrace->flags & (KSNOOP_F_PTR | KSNOOP_F_MEMBER))
+			btf_ptr.ptr = dataptr;
+		else
+			btf_ptr.ptr = &data;
+
+		btf_ptr.type_id = currtrace->type_id;
+
+		if (trace->buf_len + MAX_TRACE_DATA >= MAX_TRACE_BUF)
+			break;
+
+		buf_offset = &trace->buf[trace->buf_len];
+		if (buf_offset > &trace->buf[MAX_TRACE_BUF]) {
+			currdata->err_type_id = currtrace->type_id;
+			currdata->err = -ENOSPC;
+			continue;
+		}
+		currdata->buf_offset = trace->buf_len;
+
+		ret = bpf_snprintf_btf(buf_offset,
+				       MAX_TRACE_DATA,
+				       &btf_ptr, sizeof(btf_ptr),
+				       BTF_F_PTR_RAW);
+		if (ret < 0) {
+			currdata->err_type_id = currtrace->type_id;
+			currdata->err = ret;
+			continue;
+		} else if (ret < MAX_TRACE_DATA) {
+			currdata->buf_len = ret + 1;
+			trace->buf_len += ret + 1;
+		} else {
+			currdata->buf_len = MAX_TRACE_DATA;
+			trace->buf_len += MAX_TRACE_DATA;
+		}
+	}
+
+	/* we may be simply stashing values, and will report them
+	 * on return; if so simply return without sending perf event.
+	 * return will use remaining buffer space to fill in its values.
+	 */
+	if (entry && (trace->flags & KSNOOP_F_STASH)) {
+		trace->data_flags |= KSNOOP_F_STASHED;
+		return 0;
+	}
+	/* if a custom trace stores no trace info, no need to
+	 * report perf event.  For default tracing case however
+	 * we want to record function entry/return with no arguments
+	 * or return values; in those cases trace data length is
+	 * 0 but we want the entry/return events to be sent
+	 * regardless.
+	 */
+	if ((trace->flags & KSNOOP_F_CUSTOM) && trace->buf_len == 0)
+		goto skiptrace;
+
+	trace->comm[0] = '\0';
+	bpf_get_current_comm(trace->comm, sizeof(trace->comm));
+	trace->pid = pid_tgid & 0xffffffff;
+	/* trim perf event size to only contain data we've recorded. */
+	trace_len = sizeof(*trace) + trace->buf_len - MAX_TRACE_BUF;
+	if (trace_len > sizeof(*trace))
+		goto skiptrace;
+	ret = bpf_perf_event_output(ctx, &ksnoop_perf_map,
+				    BPF_F_CURRENT_CPU,
+				    trace, trace_len);
+	if (ret < 0) {
+		bpf_printk("could not send event for %s\n",
+			   (const char *)func->name);
+	}
+skiptrace:
+	trace->buf_len = 0;
+
+	return 0;
+}
+
+SEC("kprobe/foo")
+int kprobe_entry(struct pt_regs *ctx)
+{
+	return ksnoop(ctx, true);
+}
+
+SEC("kretprobe/foo")
+int kprobe_return(struct pt_regs *ctx)
+{
+	return ksnoop(ctx, false);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/bpf/ksnoop/ksnoop.c b/tools/bpf/ksnoop/ksnoop.c
new file mode 100644
index 0000000..1b2f64d
--- /dev/null
+++ b/tools/bpf/ksnoop/ksnoop.c
@@ -0,0 +1,981 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021, Oracle and/or its affiliates. */
+
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <linux/bpf.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+
+#include "ksnoop.h"
+#include "ksnoop.skel.h"
+
+struct btf *vmlinux_btf;
+const char *bin_name;
+int pages = PAGES_DEFAULT;
+
+enum log_level {
+	DEBUG,
+	WARN,
+	ERROR,
+};
+
+enum log_level log_level = WARN;
+
+void __p(enum log_level level, char *level_str, char *fmt, ...)
+{
+	va_list ap;
+
+	if (level < log_level)
+		return;
+	va_start(ap, fmt);
+	fprintf(stderr, "%s: ", level_str);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
+#define p_err(fmt, ...)		__p(ERROR, "Error", fmt, __VA_ARGS__)
+#define p_warn(fmt, ...)	__p(WARN, "Warn", fmt, __VA_ARGS__)
+#define	p_debug(fmt, ...)	__p(DEBUG, "Debug", fmt, __VA_ARGS__)
+
+int do_version(int argc, char **argv)
+{
+	printf("%s v%s\n", bin_name, KSNOOP_VERSION);
+	return 0;
+}
+
+int cmd_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s [OPTIONS] [COMMAND | help] FUNC\n"
+		"	OPTIONS := { {-d|--debug} | {-V|--version} |\n"
+		"		     {-p|--pages} }\n"
+		"	COMMAND	:= { info | trace  }\n"
+		"	FUNC	:= { name | name(ARG[,ARG]*) }\n"
+		"	ARG	:= { arg | arg->member }\n",
+		bin_name);
+	fprintf(stderr,
+		"Examples:\n"
+		"	%s info ip_send_skb\n"
+		"	%s trace ip_send_skb\n"
+		"	%s trace \"ip_send_skb(skb, return)\"\n"
+		"	%s trace \"ip_send_skb(skb->sk, return))\"\n",
+		bin_name, bin_name, bin_name, bin_name);
+	return 0;
+}
+
+void usage(void)
+{
+	cmd_help(0, NULL);
+	exit(1);
+}
+
+void type_to_value(struct btf *btf, char *name, __u32 type_id,
+		   struct value *val)
+{
+	const struct btf_type *type;
+	__s32 id = type_id;
+
+	if (strlen(val->name) == 0) {
+		if (name)
+			strncpy(val->name, name,
+				sizeof(val->name));
+		else
+			val->name[0] = '\0';
+	}
+
+	/* handle "void" type */
+	if (type_id == 0) {
+		val->type_id = type_id;
+		val->size = 0;
+		return;
+	}
+
+	val->type_id = KSNOOP_ID_UNKNOWN;
+
+	do {
+		type = btf__type_by_id(btf, id);
+
+		switch (BTF_INFO_KIND(type->info)) {
+		case BTF_KIND_CONST:
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_RESTRICT:
+			id = type->type;
+			break;
+		case BTF_KIND_PTR:
+			val->flags |= KSNOOP_F_PTR;
+			val->size = sizeof(void *);
+			id = type->type;
+			break;
+		case BTF_KIND_TYPEDEF:
+			/* retain typedef type id, get size from target
+			 * type.
+			 */
+			if (val->type_id == KSNOOP_ID_UNKNOWN)
+				val->type_id = id;
+			id = type->type;
+			break;
+		case BTF_KIND_ARRAY:
+		case BTF_KIND_INT:
+		case BTF_KIND_ENUM:
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+			/* size will be 0 for array; that's fine since
+			 * we do not support predicates for arrays.
+			 */
+			if (!val->size)
+				val->size = type->size;
+			if (val->type_id == KSNOOP_ID_UNKNOWN)
+				val->type_id = id;
+			return;
+		default:
+			goto out;
+		}
+	} while (id >= 0);
+out:
+	val->type_id = KSNOOP_ID_UNKNOWN;
+}
+
+int member_to_value(struct btf *btf, const char *name, __u32 type_id,
+		     struct value *val)
+
+{
+	const struct btf_member *member;
+	const struct btf_type *type;
+	const char *pname;
+	__s32 id = type_id;
+	int i, nmembers;
+	__u8 kind;
+
+	/* type_to_value has already stripped qualifiers, so
+	 * we either have a base type, a struct, union, etc.
+	 * Only struct/unions have named members so anything
+	 * else is invalid.
+	 */
+
+	p_debug("Looking for member '%s' in type id %d", name, type_id);
+	type = btf__type_by_id(btf, id);
+	pname = btf__str_by_offset(btf, type->name_off);
+	if (strlen(pname) == 0)
+		pname = "<anon>";
+
+	kind = BTF_INFO_KIND(type->info);
+	switch (kind) {
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+		nmembers = BTF_INFO_VLEN(type->info);
+		p_debug("Checking %d members...", nmembers);
+		for (member = (struct btf_member *)(type + 1), i = 0;
+		     i < nmembers;
+		     member++, i++) {
+			const char *mname;
+			__u16 offset;
+
+			type = btf__type_by_id(btf, member->type);
+			mname = btf__str_by_offset(btf, member->name_off);
+			offset = member->offset / 8;
+
+			p_debug("Checking member '%s' type %d offset %d",
+				mname, member->type, offset);
+
+			/* anonymous struct member? */
+			kind = BTF_INFO_KIND(type->info);
+			if (strlen(mname) == 0 &&
+			    (kind == BTF_KIND_STRUCT ||
+			     kind == BTF_KIND_UNION)) {
+				p_debug("Checking anon struct/union %d",
+					member->type);
+				val->offset += offset;
+				if (!member_to_value(btf, name, member->type,
+						     val))
+					return 0;
+				val->offset -= offset;
+				continue;
+			}
+
+			if (strcmp(mname, name) == 0) {
+				val->offset += offset;
+				val->flags = KSNOOP_F_MEMBER;
+				type_to_value(btf, NULL, member->type, val);
+				p_debug("Member '%s', offset %d, flags %x",
+					mname, val->offset, val->flags);
+				return 0;
+			}
+		}
+		p_err("No member '%s' found in %s [%d], offset %d", name, pname,
+		      id, val->offset);
+		break;
+	default:
+		p_err("'%s' is not a struct/union", pname);
+		break;
+	}
+	return -ENOENT;
+}
+
+int get_func_btf(struct btf *btf, struct func *func)
+{
+	const struct btf_param *param;
+	const struct btf_type *type;
+	__s32 id;
+	__u8 i;
+
+	id = btf__find_by_name_kind(btf, func->name, BTF_KIND_FUNC);
+	if (id <= 0) {
+		p_err("Cannot find function '%s' in BTF",
+		       func->name);
+		return -ENOENT;
+	}
+	type = btf__type_by_id(btf, id);
+	if (libbpf_get_error(type) ||
+	    BTF_INFO_KIND(type->info) != BTF_KIND_FUNC) {
+		p_err("Error looking up function type via id '%d'", id);
+		return -EINVAL;
+	}
+	type = btf__type_by_id(btf, type->type);
+	if (libbpf_get_error(type) ||
+	    BTF_INFO_KIND(type->info) != BTF_KIND_FUNC_PROTO) {
+		p_err("Error looking up function proto type via id '%d'", id);
+		return -EINVAL;
+	}
+	for (param = (struct btf_param *)(type + 1), i = 0;
+	     i < BTF_INFO_VLEN(type->info) && i < MAX_ARGS;
+	     param++, i++) {
+		type_to_value(btf,
+			      (char *)btf__str_by_offset(btf, param->name_off),
+			      param->type, &func->args[i]);
+		p_debug("arg #%d: <name '%s', type id '%u', size %d>",
+			i + 1, func->args[i].name, func->args[i].type_id,
+			func->args[i].size);
+	}
+
+	/* real number of args, even if it is > number we recorded. */
+	func->nr_args = BTF_INFO_VLEN(type->info);
+
+	type_to_value(btf, KSNOOP_RETURN_NAME, type->type,
+		      &func->args[KSNOOP_RETURN]);
+	p_debug("return value: type id '%u'>",
+		func->args[KSNOOP_RETURN].type_id);
+	return 0;
+}
+
+int predicate_to_value(char *predicate, struct value *val)
+{
+	char pred[MAX_STR], num[MAX_STR];
+	char *endptr;
+
+	if (!predicate)
+		return 0;
+
+	p_debug("checking predicate '%s' for '%s'", predicate, val->name);
+
+	if (sscanf(predicate, "%[!=><]%[0-9]", pred, num) != 2) {
+		p_err("Invalid specification; expected predicate, not '%s'",
+		      predicate);
+		return -EINVAL;
+	}
+	if (val->size == 0 || val->size > sizeof(__u64)) {
+		p_err("'%s' (size %d) does not support predicate comparison",
+		      val->name, val->size);
+		return -EINVAL;
+	}
+	val->predicate_value = strtoull(num, &endptr, 0);
+
+	if (strcmp(pred, "==") == 0) {
+		val->flags |= KSNOOP_F_PREDICATE_EQ;
+		goto out;
+	} else if (strcmp(pred, "!=") == 0) {
+		val->flags |= KSNOOP_F_PREDICATE_NOTEQ;
+		goto out;
+	}
+	if (pred[0] == '>')
+		val->flags |= KSNOOP_F_PREDICATE_GT;
+	else if (pred[0] == '<')
+		val->flags |= KSNOOP_F_PREDICATE_LT;
+
+	if (strlen(pred) == 1)
+		goto out;
+
+	if (pred[1] != '=') {
+		p_err("Invalid predicate specification '%s'", predicate);
+		return -EINVAL;
+	}
+	val->flags |= KSNOOP_F_PREDICATE_EQ;
+
+out:
+	p_debug("predicate '%s', flags 0x%x value %x",
+		predicate, val->flags, val->predicate_value);
+
+	return 0;
+}
+
+int trace_to_value(struct btf *btf, struct func *func, char *argname,
+		   char *membername, char *predicate, struct value *val)
+{
+	__u8 i;
+
+	strncpy(val->name, argname, sizeof(val->name));
+	if (strlen(membername) > 0) {
+		strncat(val->name, "->", sizeof(val->name) - strlen(val->name) - 1);
+		strncat(val->name, membername, sizeof(val->name) - strlen(val->name) - 1);
+	}
+
+	for (i = 0; i < MAX_TRACES; i++) {
+		if (strlen(func->args[i].name) == 0)
+			continue;
+		if (strcmp(argname, func->args[i].name) != 0)
+			continue;
+		p_debug("setting base arg for val %s to %d", val->name, i);
+		val->base_arg = i;
+
+		if (strlen(membername) > 0) {
+			if (member_to_value(btf, membername,
+					    func->args[i].type_id, val))
+				return -ENOENT;
+		} else {
+			val->type_id = func->args[i].type_id;
+			val->flags |= func->args[i].flags;
+			val->size = func->args[i].size;
+		}
+		predicate_to_value(predicate, val);
+
+		return 0;
+	}
+	p_err("Could not find '%s' for '%s'", val->name, func->name);
+	return -ENOENT;
+}
+
+struct btf *get_btf(const char *name)
+{
+	char module_btf[MAX_STR];
+	struct btf *btf;
+
+	p_debug("getting BTF for %s", name ? name : "vmlinux");
+
+	if (!name || strlen(name) == 0)
+		btf = libbpf_find_kernel_btf();
+	else {
+		snprintf(module_btf, sizeof(module_btf),
+			 "/sys/kernel/btf/%s", name);
+		btf = btf__parse_split(module_btf, vmlinux_btf);
+	}
+	if (libbpf_get_error(btf)) {
+		p_err("No BTF for '%s', cannot determine type info: %s",
+		       strerror(libbpf_get_error(btf)));
+		return NULL;
+	}
+	return btf;
+}
+
+void copy_without_spaces(char *target, char *src)
+{
+	for (; *src != '\0'; src++)
+		if (!isspace(*src))
+			*(target++) = *src;
+	*target = '\0';
+}
+
+char *type_id_to_str(struct btf *btf, __s32 type_id, char *str)
+{
+	const struct btf_type *type;
+	const char *name = "";
+	char *suffix = " ";
+	char *prefix = "";
+	char *ptr = "";
+
+	str[0] = '\0';
+
+	switch (type_id) {
+	case 0:
+		name = "void";
+		break;
+	case KSNOOP_ID_UNKNOWN:
+		name = "?";
+		break;
+	default:
+		do {
+			type = btf__type_by_id(btf, type_id);
+
+			if (libbpf_get_error(type)) {
+				name = "?";
+				break;
+			}
+			switch (BTF_INFO_KIND(type->info)) {
+			case BTF_KIND_CONST:
+			case BTF_KIND_VOLATILE:
+			case BTF_KIND_RESTRICT:
+				type_id = type->type;
+				break;
+			case BTF_KIND_PTR:
+				ptr = "* ";
+				type_id = type->type;
+				break;
+			case BTF_KIND_ARRAY:
+				suffix = "[]";
+				type_id = type->type;
+				break;
+			case BTF_KIND_STRUCT:
+				prefix = "struct ";
+				name = btf__str_by_offset(btf, type->name_off);
+				break;
+			case BTF_KIND_UNION:
+				prefix = "union";
+				name = btf__str_by_offset(btf, type->name_off);
+				break;
+			case BTF_KIND_ENUM:
+				prefix = "enum ";
+				/* fall through to get name */
+			case BTF_KIND_TYPEDEF:
+				name = btf__str_by_offset(btf, type->name_off);
+				break;
+			default:
+				name = btf__str_by_offset(btf, type->name_off);
+				break;
+			}
+		} while (type_id >= 0 && strlen(name) == 0);
+
+		break;
+	}
+	snprintf(str, MAX_STR, "%s%s%s%s", prefix, name, suffix, ptr);
+
+	return str;
+}
+
+char *value_to_str(struct btf *btf, struct value *val, char *str)
+{
+
+	str = type_id_to_str(btf, val->type_id, str);
+	if (val->flags & KSNOOP_F_PTR)
+		strncat(str, " * ", MAX_STR);
+	if (strlen(val->name) > 0 &&
+	    strcmp(val->name, KSNOOP_RETURN_NAME) != 0)
+		strncat(str, val->name, MAX_STR);
+
+	return str;
+}
+
+/* based heavily on bpf_object__read_kallsyms_file() in libbpf.c */
+int get_func_ip_mod(struct func *func)
+{
+	char sym_type, sym_name[MAX_STR], mod_info[MAX_STR];
+	unsigned long long sym_addr;
+	int ret, err = 0;
+	FILE *f;
+
+	f = fopen("/proc/kallsyms", "r");
+	if (!f) {
+		err = errno;
+		p_err("failed to open /proc/kallsyms: %d\n", strerror(err));
+		return err;
+	}
+
+	while (true) {
+		ret = fscanf(f, "%llx %c %128s%255[^\n]\n",
+			     &sym_addr, &sym_type, sym_name, mod_info);
+		if (ret == EOF && feof(f))
+			break;
+		if (ret < 3) {
+			p_err("failed to read kallsyms entry: %d\n", ret);
+			err = -EINVAL;
+			break;
+		}
+		if (strcmp(func->name, sym_name) != 0)
+			continue;
+		func->ip = sym_addr;
+		func->mod[0] = '\0';
+		/* get module name from [modname] */
+		if (ret == 4 &&
+		    sscanf(mod_info, "%*[\t ][%[^]]", func->mod) == 1)
+			p_debug("Module symbol '%llx' from '%s'",
+				func->ip, func->mod);
+		p_debug("%s =  <ip %llx, mod %s>", func->name, func->ip,
+			strlen(func->mod) > 0 ? func->mod : "vmlinux");
+		break;
+	}
+	fclose(f);
+	return err;
+}
+
+#define VALID_NAME	"%[A-Za-z0-9\\-_]"
+#define ARGDATA		"%[^)]"
+
+int parse_trace(char *str, struct trace *trace)
+{
+	struct func *func = &trace->func;
+	char tracestr[MAX_STR], argdata[MAX_STR];
+	char argname[MAX_STR], membername[MAX_STR];
+	__u8 i, nr_predicates = 0, nr_entry = 0, nr_return = 0;
+	char *arg, *saveptr;
+	int ret;
+
+	copy_without_spaces(tracestr, str);
+
+	p_debug("Parsing trace '%s'", tracestr);
+
+	ret = sscanf(tracestr, VALID_NAME "(" ARGDATA ")", func->name, argdata);
+	switch (ret) {
+	case 1:
+		if (strlen(tracestr) > strlen(func->name) + 2) {
+			p_err("Invalid function specification '%s'", tracestr);
+			usage();
+		}
+		argdata[0] = '\0';
+		p_debug("got func '%s'", func->name);
+		break;
+	case 2:
+		if (strlen(tracestr) >
+		    strlen(func->name) + strlen(argdata) + 2) {
+			p_err("Invalid function specification '%s'", tracestr);
+			usage();
+		}
+		p_debug("got func '%s', args '%s'", func->name, argdata);
+		trace->flags |= KSNOOP_F_CUSTOM;
+		break;
+	default:
+		usage();
+	}
+
+	/* get address of function and - if it is in a module - module name */
+	ret = get_func_ip_mod(func);
+	if (ret) {
+		p_err("could not get address of '%s'", func->name);
+		return ret;
+	}
+
+	/* get BTF associated with core kernel/module, then get info about
+	 * function from that BTF.
+	 */
+	trace->btf = get_btf(func->mod);
+	if (!trace->btf)
+		return -ENOENT;
+	ret = get_func_btf(trace->btf, func);
+	if (ret) {
+		p_debug("unexpected return value '%d' getting function", ret);
+		return ret;
+	}
+
+	for (arg = strtok_r(argdata, ",", &saveptr), i = 0;
+	     arg;
+	     arg = strtok_r(NULL, ",", &saveptr), i++) {
+		char *predicate = NULL;
+
+		ret = sscanf(arg, VALID_NAME "->" VALID_NAME,
+			     argname, membername);
+		if (ret == 2) {
+			if (strlen(arg) >
+			    strlen(argname) + strlen(membername) + 2) {
+				predicate = arg + strlen(argname) +
+					    strlen(membername) + 2;
+			}
+			p_debug("'%s' dereferences '%s', predicate '%s'",
+				argname, membername, predicate);
+		} else {
+			if (strlen(arg) > strlen(argname))
+				predicate = arg + strlen(argname);
+			p_debug("'%s' arg, predicate '%s'", argname, predicate);
+			membername[0] = '\0';
+		}
+
+		if (i >= MAX_TRACES) {
+			p_err("Too many arguments; up to %d are supported",
+			      MAX_TRACES);
+			return -EINVAL;
+		}
+		if (trace_to_value(trace->btf, func, argname, membername,
+				   predicate, &trace->traces[i]))
+			return -EINVAL;
+
+		if (predicate)
+			nr_predicates++;
+		if (trace->traces[i].base_arg == KSNOOP_RETURN)
+			nr_return++;
+		else
+			nr_entry++;
+
+		trace->nr_traces++;
+	}
+
+	if (trace->nr_traces > 0) {
+		trace->flags |= KSNOOP_F_CUSTOM;
+		/* If we have one or more predicates _and_ references to
+		 * entry and return values, we need to activate "stash"
+		 * mode where arg traces are stored on entry and not
+		 * sent until return to ensure predicates are satisfied.
+		 */
+		if (nr_predicates > 0 && nr_entry > 0 && nr_return > 0)
+			trace->flags |= KSNOOP_F_STASH;
+		p_debug("custom trace with %d args, flags 0x%x",
+			trace->nr_traces, trace->flags);
+	} else {
+		p_debug("Standard trace, function with %d arguments",
+			func->nr_args);
+		/* copy function arg/return value to trace specification. */
+		memcpy(trace->traces, func->args, sizeof(trace->traces));
+		for (i = 0; i < MAX_TRACES; i++)
+			trace->traces[i].base_arg = i;
+		trace->nr_traces = MAX_TRACES;
+	}
+
+	return 0;
+}
+
+int parse_traces(int argc, char **argv, struct trace **traces)
+{
+	__u8 i;
+
+	if (argc == 0)
+		usage();
+
+	if (argc > MAX_FUNC_TRACES) {
+		p_err("A maximum of %d traces are supported", MAX_FUNC_TRACES);
+		return -EINVAL;
+	}
+	*traces = calloc(argc, sizeof(struct trace));
+	if (!*traces) {
+		p_err("Could not allocate %d traces", argc);
+		return -ENOMEM;
+	}
+	for (i = 0; i < argc; i++) {
+		if (parse_trace(argv[i], &((*traces)[i])))
+			return -EINVAL;
+	}
+	return i;
+}
+
+int cmd_info(int argc, char **argv)
+{
+	struct trace *traces;
+	char str[MAX_STR];
+	int nr_traces;
+	__u8 i, j;
+
+	nr_traces = parse_traces(argc, argv, &traces);
+	if (nr_traces < 0)
+		return nr_traces;
+
+	for (i = 0; i < nr_traces; i++) {
+		struct func *func = &traces[i].func;
+
+		printf("%s %s(",
+		       value_to_str(traces[i].btf, &func->args[KSNOOP_RETURN],
+				    str),
+		       func->name);
+		for (j = 0; j < func->nr_args; j++) {
+			if (j > 0)
+				printf(", ");
+			printf("%s", value_to_str(traces[i].btf, &func->args[j],
+						  str));
+		}
+		if (func->nr_args > MAX_ARGS)
+			printf(" /* and %d more args that are not traceable */",
+			       func->nr_args - MAX_ARGS);
+		printf(");\n");
+	}
+	return 0;
+}
+
+char __indent[] = "                                                  ";
+
+#define indent(level)	(&__indent[strlen(__indent)-level])
+
+void print_indented(int level, char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	printf("%s", indent(level));
+	vprintf(fmt, ap);
+	va_end(ap);
+}
+
+void print_indented_str(int level, char *str)
+{
+	char *curr = str, *newline;
+	char line[MAX_STR];
+	bool first = true;
+
+	while ((newline = strchr(curr, '\n')) != NULL) {
+		strncpy(line, curr, newline - curr + 1);
+		line[newline-curr] = '\0';
+		if (first) {
+			printf("%s\n", line);
+			first = false;
+		} else
+			print_indented(level, "%s\n", line);
+		curr = newline + 1;
+	}
+}
+
+#define NANOSEC		1000000000
+
+#define BASE_INDENT	48
+
+void trace_handler(void *ctx, int cpu, void *data, __u32 size)
+{
+	struct trace *trace = data;
+	int i, shown, level;
+
+	if (size < (sizeof(*trace) - MAX_TRACE_BUF)) {
+		fprintf(stderr, "\t/* trace buffer size '%u' < min %zu */\n",
+			size, sizeof(*trace) - MAX_TRACE_BUF);
+		return;
+	}
+	/* timestamps reported in seconds/microseconds since boot */
+	printf("%20s %6u %4d %6llu.%6llu %s(\n", trace->comm, trace->pid,
+	       trace->cpu, trace->time / NANOSEC,
+	       (trace->time % NANOSEC)/1000, trace->func.name);
+	level = BASE_INDENT;
+
+	/* special cases; function with (void) argument or void return value. */
+	for (i = 0, shown = 0; i < trace->nr_traces; i++) {
+		bool entry = trace->data_flags & KSNOOP_F_ENTRY;
+		bool stash = trace->flags & KSNOOP_F_STASH;
+
+		if (!stash &&
+		    ((entry && !base_arg_is_entry(trace->traces[i].base_arg)) ||
+		     (!entry && base_arg_is_entry(trace->traces[i].base_arg))))
+			continue;
+
+		if (trace->traces[i].type_id == 0)
+			continue;
+
+		if (shown > 0)
+			print_indented(level, ",\n\n");
+		print_indented(level, "%s = ",
+			       trace->traces[i].name);
+
+		if (trace->trace_data[i].err_type_id != 0) {
+			char typestr[MAX_STR];
+
+			printf("0x%llx\n", trace->trace_data[i].raw_value);
+			print_indented(level,
+				       "/* Cannot show '%s' as '%s%s'.\n",
+				       trace->traces[i].name,
+				       type_id_to_str(trace->btf,
+						      trace->traces[i].type_id,
+						      typestr),
+				       trace->traces[i].flags & KSNOOP_F_PTR ?
+				       " *" : "");
+			print_indented(level, " * Userspace/invalid ptr? */\n",
+				       trace->traces[i].name);
+		} else {
+			if (trace->traces[i].flags & KSNOOP_F_PTR)
+				printf("*");
+			print_indented_str(level, trace->buf +
+					   trace->trace_data[i].buf_offset);
+			/* truncated? */
+			if (trace->trace_data[i].buf_len == MAX_TRACE_DATA)
+				print_indented(level, " ...\n");
+		}
+		shown++;
+	}
+	if (shown == 0)
+		print_indented(level, "%s",
+			       trace->data_flags & KSNOOP_F_ENTRY ?
+			       "void" : "return;");
+	printf("\n");
+	print_indented(level-1, ");\n\n");
+}
+
+void lost_handler(void *ctx, int cpu, __u64 cnt)
+{
+	fprintf(stderr, "\t/* lost %llu events */\n", cnt);
+}
+
+int add_traces(struct bpf_map *func_map, struct trace *traces, int nr_traces)
+{
+	int i, j, ret, nr_cpus = libbpf_num_possible_cpus();
+	struct trace *map_traces;
+
+	map_traces = calloc(nr_cpus, sizeof(struct trace));
+	if (!map_traces) {
+		p_err("Could not allocate memory for %d traces", nr_traces);
+		return -ENOMEM;
+	}
+	for (i = 0; i < nr_traces; i++) {
+		for (j = 0; j < nr_cpus; j++)
+			memcpy(&map_traces[j], &traces[i],
+			       sizeof(map_traces[j]));
+
+		ret = bpf_map_update_elem(bpf_map__fd(func_map),
+					  &traces[i].func.ip,
+					  map_traces,
+					  BPF_NOEXIST);
+		if (ret) {
+			p_err("Could not add map entry for '%s': %s",
+			      traces[i].func.name, strerror(-ret));
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int attach_traces(struct ksnoop_bpf *skel, struct trace *traces, int nr_traces)
+{
+	struct bpf_object *obj = skel->obj;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+	int i, ret;
+
+	for (i = 0; i < nr_traces; i++) {
+		bpf_object__for_each_program(prog, obj) {
+			const char *sec_name = bpf_program__section_name(prog);
+			bool kretprobe = strstr(sec_name, "kretprobe/") != NULL;
+
+			link = bpf_program__attach_kprobe(prog, kretprobe,
+							  traces[i].func.name);
+			ret = libbpf_get_error(link);
+			if (ret) {
+				p_err("Could not attach %s to '%s': %s",
+				      kretprobe ? "kretprobe" : "kprobe",
+				      traces[i].func.name,
+				      strerror(-ret));
+				return ret;
+			}
+			p_debug("Attached %s for '%s'",
+				kretprobe ? "kretprobe" : "kprobe",
+				traces[i].func.name);
+		}
+	}
+	return 0;
+}
+
+int cmd_trace(int argc, char **argv)
+{
+	struct perf_buffer_opts pb_opts = {};
+	struct bpf_map *perf_map, *func_map;
+	struct perf_buffer *pb;
+	struct ksnoop_bpf *skel;
+	struct trace *traces;
+	int nr_traces, ret;
+
+	nr_traces = parse_traces(argc, argv, &traces);
+	if (nr_traces < 0)
+		return nr_traces;
+
+	skel = ksnoop_bpf__open_and_load();
+	if (!skel)
+		return 1;
+
+	perf_map = bpf_object__find_map_by_name(skel->obj, "ksnoop_perf_map");
+	if (!perf_map) {
+		p_err("Could not find '%s'", "ksnoop_perf_map");
+		return 1;
+	}
+	func_map = bpf_object__find_map_by_name(skel->obj, "ksnoop_func_map");
+	if (!func_map) {
+		p_err("Could not find '%s'", "ksnoop_func_map");
+		return 1;
+	}
+
+	if (add_traces(func_map, traces, nr_traces)) {
+		p_err("Could not add traces to '%s'", "ksnoop_func_map");
+		return 1;
+	}
+
+	if (attach_traces(skel, traces, nr_traces)) {
+		p_err("Could not attach %d traces", nr_traces);
+		return 1;
+	}
+
+	pb_opts.sample_cb = trace_handler;
+	pb_opts.lost_cb = lost_handler;
+	pb = perf_buffer__new(bpf_map__fd(perf_map), pages, &pb_opts);
+	if (libbpf_get_error(pb)) {
+		p_err("Could not create perf buffer: %s",
+		      strerror(-libbpf_get_error(pb)));
+		return 1;
+	}
+
+	printf("%20s %6s %4s %13s %s\n",
+	       "TASK", "PID", "CPU#", "TIMESTAMP", "FUNCTION");
+
+	while (1) {
+		ret = perf_buffer__poll(pb, 1);
+		if (ret < 0 && ret != -EINTR) {
+			p_err("Polling failed: %s", strerror(ret));
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+struct cmd {
+	const char *cmd;
+	int (*func)(int argc, char **argv);
+};
+
+struct cmd cmds[] = {
+	{ "info",	cmd_info },
+	{ "trace",	cmd_trace },
+	{ "help",	cmd_help },
+	{ NULL,		NULL }
+};
+
+int cmd_select(int argc, char **argv)
+{
+	int i;
+
+	for (i = 0; cmds[i].cmd; i++) {
+		if (strncmp(*argv, cmds[i].cmd, strlen(*argv)) == 0)
+			return cmds[i].func(argc - 1, argv + 1);
+	}
+	return cmd_trace(argc, argv);
+}
+
+int print_all_levels(enum libbpf_print_level level,
+		 const char *format, va_list args)
+{
+	return vfprintf(stderr, format, args);
+}
+
+int main(int argc, char *argv[])
+{
+	static const struct option options[] = {
+		{ "debug",	no_argument,	NULL,	'd' },
+		{ "version",	no_argument,	NULL,	'V' },
+		{ "pages",	required_argument, NULL, 'p' },
+		{ 0 }
+	};
+	int opt;
+
+	bin_name = argv[0];
+
+	vmlinux_btf = get_btf(NULL);
+	if (libbpf_get_error(vmlinux_btf))
+		return 1;
+
+	while ((opt = getopt_long(argc, argv, "dp:V", options, NULL)) >= 0) {
+		switch (opt) {
+		case 'd':
+			libbpf_set_print(print_all_levels);
+			log_level = DEBUG;
+			break;
+		case 'p':
+			pages = atoi(optarg);
+			break;
+		case 'V':
+			return do_version(argc, argv);
+		default:
+			p_err("unrecognized option '%s'", argv[optind - 1]);
+			usage();
+		}
+	}
+	if (argc == 1)
+		usage();
+	argc -= optind;
+	argv += optind;
+	if (argc < 0)
+		usage();
+
+	return cmd_select(argc, argv);
+
+	return 0;
+}
diff --git a/tools/bpf/ksnoop/ksnoop.h b/tools/bpf/ksnoop/ksnoop.h
new file mode 100644
index 0000000..e415a85
--- /dev/null
+++ b/tools/bpf/ksnoop/ksnoop.h
@@ -0,0 +1,110 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2021, Oracle and/or its affiliates. */
+
+#define MAX_FUNC_TRACES			8
+
+enum arg {
+	KSNOOP_ARG1,
+	KSNOOP_ARG2,
+	KSNOOP_ARG3,
+	KSNOOP_ARG4,
+	KSNOOP_ARG5,
+	KSNOOP_RETURN
+};
+
+/* we choose "return" as the name for the returned value because as
+ * a C keyword it can't clash with a function entry parameter.
+ */
+#define KSNOOP_RETURN_NAME		"return"
+
+/* if we can't get a type id for a type (such as module-specific type)
+ * mark it as KSNOOP_ID_UNKNOWN since BTF lookup in bpf_snprintf_btf()
+ * will fail and the data will be simply displayed as a __u64.
+ */
+#define KSNOOP_ID_UNKNOWN		0xffffffff
+
+#define MAX_STR				256
+#define MAX_VALUES			6
+#define MAX_ARGS			(MAX_VALUES - 1)
+#define KSNOOP_F_PTR			0x1	/* value is a pointer */
+#define KSNOOP_F_MEMBER			0x2	/* member reference */
+#define KSNOOP_F_ENTRY			0x4
+#define KSNOOP_F_RETURN			0x8
+#define KSNOOP_F_CUSTOM			0x10	/* custom trace */
+
+#define KSNOOP_F_STASH			0x20	/* store values on entry, don't
+						 * send perf event
+						 */
+#define KSNOOP_F_STASHED		0x40	/* values stored on entry */
+
+#define KSNOOP_F_PREDICATE_EQ		0x100
+#define KSNOOP_F_PREDICATE_NOTEQ	0x200
+#define KSNOOP_F_PREDICATE_GT		0x400
+#define KSNOOP_F_PREDICATE_LT		0x800
+
+#define KSNOOP_F_PREDICATE_MASK		(KSNOOP_F_PREDICATE_EQ | \
+					 KSNOOP_F_PREDICATE_NOTEQ | \
+					 KSNOOP_F_PREDICATE_GT | \
+					 KSNOOP_F_PREDICATE_LT)
+
+/* for kprobes, entry is function IP + 1, subtract 1 in BPF prog context */
+#define KSNOOP_IP_FIX(ip)		(ip - 1)
+
+struct value {
+	char name[MAX_STR];
+	enum arg base_arg;
+	__u32 size;
+	__u32 offset;
+	__u64 type_id;
+	__u64 flags;
+	__u64 predicate_value;
+};
+
+struct func {
+	char name[MAX_STR];
+	char mod[MAX_STR];
+	__u8 nr_args;
+	__u64 ip;
+	struct value args[MAX_VALUES];
+};
+
+#define MAX_TRACES MAX_VALUES
+
+#define MAX_TRACE_DATA	2048
+
+struct trace_data {
+	__u64 raw_value;
+	__u32 err_type_id;	/* type id we can't dereference */
+	int err;
+	__u32 buf_offset;
+	__u16 buf_len;
+};
+
+#define MAX_TRACE_BUF	(MAX_TRACES * MAX_TRACE_DATA)
+
+struct trace {
+	/* initial values are readonly in tracing context */
+	struct btf *btf;
+	struct func func;
+	__u8 nr_traces;
+	struct value traces[MAX_TRACES];
+	__u64 flags;
+	/* ...whereas values below this point are set or modified
+	 * in tracing context
+	 */
+	__u64 time;
+	__u32 cpu;
+	__u32 pid;
+	char comm[MAX_STR];
+	__u64 data_flags;
+	struct trace_data trace_data[MAX_TRACES];
+	__u16 buf_len;
+	char buf[MAX_TRACE_BUF];
+};
+
+#define PAGES_DEFAULT	8
+
+static inline int base_arg_is_entry(enum arg base_arg)
+{
+	return base_arg != KSNOOP_RETURN;
+}