mbox series

[RESEND,bpf-next,v3,0/9] bpf: Support multi-attach for freplace programs

Message ID 159981835466.134722.8652987144251743467.stgit@toke.dk
Headers show
Series bpf: Support multi-attach for freplace programs | expand

Message

Toke Høiland-Jørgensen Sept. 11, 2020, 9:59 a.m. UTC
This series adds support attaching freplace BPF programs to multiple targets.
This is needed to support incremental attachment of multiple XDP programs using
the libxdp dispatcher model.

The first three patches are refactoring patches: The first one is a trivial
change to the logging in the verifier, split out to make the subsequent refactor
easier to read. Patch 2 refactors check_attach_btf_id() so that the checks on
program and target compatibility can be reused when attaching to a secondary
location.

Patch 3 changes prog_aux->linked_prog to be an embedded bpf_tracing_link that is
initialised at program load time. This nicely encapsulates both the trampoline
and the prog reference, and moves the release of these references into bpf_link
teardown. At raw_tracepoint_open() time (i.e., when the link is attached), it
will be removed from the extension prog, and primed as a regular bpf_link.

Based on these refactorings, it becomes pretty straight-forward to support
multiple-attach for freplace programs (patch 4). This is simply a matter of
creating a second bpf_tracing_link if a target is supplied to
raw_tracepoint_open().

Patch 5 is a port of Jiri Olsa's patch to support fentry/fexit on freplace
programs. His approach of getting the target type from the target program
reference no longer works after we've gotten rid of linked_prog (because the
bpf_tracing_link reference disappears on attach). Instead, we used the saved
reference to the target prog type that is also used to verify compatibility on
secondary freplace attachment.

Patches 6-7 are tools and libbpf updates, and patches 8-9 are selftests, the
first one for the multi-freplace functionality itself, and the second one is
Jiri's previous selftest for the fentry-to-freplace fix.

With this series, libxdp and xdp-tools can successfully attach multiple programs
one at a time. To play with this, use the 'freplace-multi-attach' branch of
xdp-tools:

$ git clone --recurse-submodules --branch freplace-multi-attach https://github.com/xdp-project/xdp-tools
$ cd xdp-tools
$ make
$ sudo ./xdp-loader/xdp-loader load veth0 lib/testing/xdp_drop.o
$ sudo ./xdp-loader/xdp-loader load veth0 lib/testing/xdp_pass.o
$ sudo ./xdp-loader/xdp-loader status

The series is also available here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=bpf-freplace-multi-attach-alt-03

Changelog:

v3:
  - Get rid of prog_aux->linked_prog entirely in favour of a bpf_tracing_link
  - Incorporate Jiri's fix for attaching fentry to freplace programs

v2:
  - Drop the log arguments from bpf_raw_tracepoint_open
  - Fix kbot errors
  - Rebase to latest bpf-next

---

Jiri Olsa (1):
      selftests/bpf: Adding test for arg dereference in extension trace

Toke Høiland-Jørgensen (8):
      bpf: change logging calls from verbose() to bpf_log() and use log pointer
      bpf: verifier: refactor check_attach_btf_id()
      bpf: wrap prog->aux->linked_prog in a bpf_tracing_link
      bpf: support attaching freplace programs to multiple attach points
      bpf: Fix context type resolving for extension programs
      tools: add new members to bpf_attr.raw_tracepoint in bpf.h
      libbpf: add support for supplying target to bpf_raw_tracepoint_open()
      selftests: add test for multiple attachments of freplace program


 include/linux/bpf.h                           |  33 ++-
 include/linux/bpf_verifier.h                  |   9 +
 include/uapi/linux/bpf.h                      |   6 +-
 kernel/bpf/btf.c                              |  22 +-
 kernel/bpf/core.c                             |   5 +-
 kernel/bpf/syscall.c                          | 161 +++++++++--
 kernel/bpf/trampoline.c                       |  34 ++-
 kernel/bpf/verifier.c                         | 251 ++++++++++--------
 tools/include/uapi/linux/bpf.h                |   6 +-
 tools/lib/bpf/bpf.c                           |  13 +-
 tools/lib/bpf/bpf.h                           |   9 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/fexit_bpf2bpf.c  | 171 +++++++++---
 .../selftests/bpf/prog_tests/trace_ext.c      |  93 +++++++
 .../bpf/progs/freplace_get_constant.c         |  15 ++
 .../selftests/bpf/progs/test_trace_ext.c      |  18 ++
 .../bpf/progs/test_trace_ext_tracing.c        |  25 ++
 17 files changed, 683 insertions(+), 189 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/freplace_get_constant.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext_tracing.c

Comments

Toke Høiland-Jørgensen Sept. 11, 2020, 8:56 p.m. UTC | #1
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Sep 11, 2020 at 3:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> The bpf_tracing_link structure is a convenient data structure to contain
>> the reference to a linked program; in preparation for supporting multiple
>> attachments for the same freplace program, move the linked_prog in
>> prog->aux into a bpf_tracing_link wrapper.
>>
>> With this change, it is no longer possible to attach the same tracing
>> program multiple times (detaching in-between), since the reference from the
>> tracing program to the target disappears on the first attach. However,
>> since the next patch will let the caller supply an attach target, that will
>> also make it possible to attach to the same place multiple times.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>  include/linux/bpf.h     |   21 +++++++++---
>>  kernel/bpf/btf.c        |   13 +++++---
>>  kernel/bpf/core.c       |    5 +--
>>  kernel/bpf/syscall.c    |   81 +++++++++++++++++++++++++++++++++++++----------
>>  kernel/bpf/trampoline.c |   12 ++-----
>>  kernel/bpf/verifier.c   |   13 +++++---
>>  6 files changed, 102 insertions(+), 43 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 7f19c3216370..722c60f1c1fc 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -26,6 +26,7 @@ struct bpf_verifier_log;
>>  struct perf_event;
>>  struct bpf_prog;
>>  struct bpf_prog_aux;
>> +struct bpf_tracing_link;
>>  struct bpf_map;
>>  struct sock;
>>  struct seq_file;
>> @@ -614,8 +615,8 @@ static __always_inline unsigned int bpf_dispatcher_nop_func(
>>  }
>>  #ifdef CONFIG_BPF_JIT
>>  struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
>> -int bpf_trampoline_link_prog(struct bpf_prog *prog);
>> -int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
>> +int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
>> +int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
>>  int bpf_trampoline_get(u64 key, void *addr,
>>                        struct btf_func_model *fmodel,
>>                        struct bpf_trampoline **trampoline);
>> @@ -667,11 +668,13 @@ static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>  {
>>         return NULL;
>>  }
>> -static inline int bpf_trampoline_link_prog(struct bpf_prog *prog)
>> +static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
>> +                                          struct bpf_trampoline *tr)
>>  {
>>         return -ENOTSUPP;
>>  }
>> -static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
>> +static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog,
>> +                                            struct bpf_trampoline *tr)
>>  {
>>         return -ENOTSUPP;
>>  }
>> @@ -740,14 +743,13 @@ struct bpf_prog_aux {
>>         u32 max_rdonly_access;
>>         u32 max_rdwr_access;
>>         const struct bpf_ctx_arg_aux *ctx_arg_info;
>> -       struct bpf_prog *linked_prog;
>> +       struct bpf_tracing_link *tgt_link;
>>         bool verifier_zext; /* Zero extensions has been inserted by verifier. */
>>         bool offload_requested;
>>         bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
>>         bool func_proto_unreliable;
>>         bool sleepable;
>>         enum bpf_tramp_prog_type trampoline_prog_type;
>> -       struct bpf_trampoline *trampoline;
>>         struct hlist_node tramp_hlist;
>>         /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
>>         const struct btf_type *attach_func_proto;
>> @@ -827,6 +829,13 @@ struct bpf_link {
>>         struct work_struct work;
>>  };
>>
>> +struct bpf_tracing_link {
>> +       struct bpf_link link;
>> +       enum bpf_attach_type attach_type;
>> +       struct bpf_trampoline *trampoline;
>> +       struct bpf_prog *tgt_prog;
>> +};
>> +
>>  struct bpf_link_ops {
>>         void (*release)(struct bpf_link *link);
>>         void (*dealloc)(struct bpf_link *link);
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 2ace56c99c36..e10f13f8251c 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -3706,10 +3706,10 @@ struct btf *btf_parse_vmlinux(void)
>>
>>  struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog)
>>  {
>> -       struct bpf_prog *tgt_prog = prog->aux->linked_prog;
>> +       struct bpf_tracing_link *tgt_link = prog->aux->tgt_link;
>>
>> -       if (tgt_prog) {
>> -               return tgt_prog->aux->btf;
>> +       if (tgt_link && tgt_link->tgt_prog) {
>> +               return tgt_link->tgt_prog->aux->btf;
>>         } else {
>>                 return btf_vmlinux;
>>         }
>> @@ -3733,14 +3733,17 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>>                     struct bpf_insn_access_aux *info)
>>  {
>>         const struct btf_type *t = prog->aux->attach_func_proto;
>> -       struct bpf_prog *tgt_prog = prog->aux->linked_prog;
>>         struct btf *btf = bpf_prog_get_target_btf(prog);
>>         const char *tname = prog->aux->attach_func_name;
>>         struct bpf_verifier_log *log = info->log;
>> +       struct bpf_prog *tgt_prog = NULL;
>>         const struct btf_param *args;
>>         u32 nr_args, arg;
>>         int i, ret;
>>
>> +       if (prog->aux->tgt_link)
>> +               tgt_prog = prog->aux->tgt_link->tgt_prog;
>> +
>>         if (off % 8) {
>>                 bpf_log(log, "func '%s' offset %d is not multiple of 8\n",
>>                         tname, off);
>> @@ -4572,7 +4575,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
>>                 return -EFAULT;
>>         }
>>         if (prog_type == BPF_PROG_TYPE_EXT)
>> -               prog_type = prog->aux->linked_prog->type;
>> +               prog_type = prog->aux->tgt_link->tgt_prog->type;
>>
>>         t = btf_type_by_id(btf, t->type);
>>         if (!t || !btf_type_is_func_proto(t)) {
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index ed0b3578867c..54c125cec218 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -2130,7 +2130,6 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>>         if (aux->prog->has_callchain_buf)
>>                 put_callchain_buffers();
>>  #endif
>> -       bpf_trampoline_put(aux->trampoline);
>>         for (i = 0; i < aux->func_cnt; i++)
>>                 bpf_jit_free(aux->func[i]);
>>         if (aux->func_cnt) {
>> @@ -2146,8 +2145,8 @@ void bpf_prog_free(struct bpf_prog *fp)
>>  {
>>         struct bpf_prog_aux *aux = fp->aux;
>>
>> -       if (aux->linked_prog)
>> -               bpf_prog_put(aux->linked_prog);
>> +       if (aux->tgt_link)
>> +               bpf_link_put(&aux->tgt_link->link);
>
> Until the link is primed, you shouldn't bpf_link_put() it. At this
> stage the link itself is just a piece of memory that needs to be
> kfree()'d. And your circular dependency problem doesn't exist anymore.
> You'll have to put a trampoline and target prog manually here, though
> (but you have a similar problem below as well, so might just have a
> small helper to do this). But I think it's simpler that relying on an
> artificial "defunct" state of not-yet-activated bpf_link, which you do
> with the dance around link->prog = NULL.

Yeah, makes sense. I initially figured that would be 'breaking the
abstraction' of bpf_link, but I ended up having to do that anyway, so
you're right I might as well treat it as a piece of memory here.

[...]

>> @@ -2574,14 +2614,16 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
>>                 goto out_put_prog;
>>         }
>>
>> -       link = kzalloc(sizeof(*link), GFP_USER);
>> +       link = READ_ONCE(prog->aux->tgt_link);
>>         if (!link) {
>> -               err = -ENOMEM;
>> +               err = -ENOENT;
>> +               goto out_put_prog;
>> +       }
>> +       olink = cmpxchg(&prog->aux->tgt_link, link, NULL);
>> +       if (olink != link) {
>> +               err = -ENOENT;
>>                 goto out_put_prog;
>>         }
>
> Wouldn't single xchg to NULL be sufficient to achieve the same?
> READ_ONCE + cmpxchg seems unnecessary to me.

It would, but in the next patch I'm introducing a check on the contents
of the link before cmpxchg'ing it, so figured it was easier to just use
the same pattern here.

>> -       bpf_link_init(&link->link, BPF_LINK_TYPE_TRACING,
>> -                     &bpf_tracing_link_lops, prog);
>> -       link->attach_type = prog->expected_attach_type;
>>
>>         err = bpf_link_prime(&link->link, &link_primer);
>>         if (err) {
>
> if priming errors out, you need to put target prog and trampoline,
> kfree(link) won't do it (and calling bpf_link_cleanup() is not correct
> before priming). See above as well.

Ah yes, good catch!

> BTW, one interesting side effect of all this is that if your initial
> attach failed, you won't be able to try again, because
> prog->aux->tgt_link is gone. If that's the problem, we'll need to
> introduce locking and copy that link, try to attach, then clear out
> prog->aug->tgt_link only if we succeeded. Just bringing this up, as it
> might not be obvious (or I might be wrong :).

Yeah, did think about that. From a purist PoV you're right that a
"destructive attempt" is not ideal; but we already agreed that clearing
out the link on attach was an acceptable change in behaviour. And I
figured that a failure in link_prim() or trampoline_link_prog() would be
quite rare, so not something we'd want to expend a lot of effort
ensuring was really atomic...

-Toke
Andrii Nakryiko Sept. 11, 2020, 9:10 p.m. UTC | #2
On Fri, Sep 11, 2020 at 3:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> This enables support for attaching freplace programs to multiple attach
> points. It does this by amending UAPI for bpf_raw_tracepoint_open with a
> target prog fd and btf ID pair that can be used to supply the new
> attachment point. The target must be compatible with the target that was
> supplied at program load time.
>
> The implementation reuses the checks that were factored out of
> check_attach_btf_id() to ensure compatibility between the BTF types of the
> old and new attachment. If these match, a new bpf_tracing_link will be
> created for the new attach target, allowing multiple attachments to
> co-exist simultaneously.
>
> The code could theoretically support multiple-attach of other types of
> tracing programs as well, but since I don't have a use case for any of
> those, the bpf_tracing_prog_attach() function will reject new targets for
> anything other than PROG_TYPE_EXT programs.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---

It feels like using a semi-constructed bpf_tracing_link inside
prog->aux->tgt_link is just an unnecessary complication, after reading
this and previous patches. Seems more straightforward and simpler to
store tgt_attach_type/tgt_prog_type (permanently) and
tgt_prog/tgt_trampoline (until first attachment) in prog->aux and then
properly create bpf_link on attach.

>  include/linux/bpf.h      |    3 +
>  include/uapi/linux/bpf.h |    6 ++-
>  kernel/bpf/syscall.c     |   96 +++++++++++++++++++++++++++++++++++++++-------
>  kernel/bpf/verifier.c    |    9 ++++
>  4 files changed, 97 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 722c60f1c1fc..c6b856b2d296 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -753,6 +753,9 @@ struct bpf_prog_aux {
>         struct hlist_node tramp_hlist;
>         /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
>         const struct btf_type *attach_func_proto;
> +       /* target BPF prog types for trace programs */
> +       enum bpf_prog_type tgt_prog_type;
> +       enum bpf_attach_type tgt_attach_type;
>         /* function name for valid attach_btf_id */
>         const char *attach_func_name;
>         struct bpf_prog **func;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 90359cab501d..0885ab6ac8d9 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -595,8 +595,10 @@ union bpf_attr {
>         } query;
>
>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> -               __u64 name;
> -               __u32 prog_fd;
> +               __u64           name;
> +               __u32           prog_fd;
> +               __u32           tgt_prog_fd;
> +               __u32           tgt_btf_id;
>         } raw_tracepoint;

rant: any chance of putting this into LINK_CREATE instead of extending
very unfortunately named RAW_TRACEPOINT_OPEN?

>
>         struct { /* anonymous struct for BPF_BTF_LOAD */
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 2d238aa8962e..7b1da5f063eb 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -4,6 +4,7 @@
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
>  #include <linux/bpf_lirc.h>
> +#include <linux/bpf_verifier.h>
>  #include <linux/btf.h>
>  #include <linux/syscalls.h>
>  #include <linux/slab.h>
> @@ -2582,10 +2583,16 @@ static struct bpf_tracing_link *bpf_tracing_link_create(struct bpf_prog *prog,
>         return link;
>  }
>
> -static int bpf_tracing_prog_attach(struct bpf_prog *prog)
> +static int bpf_tracing_prog_attach(struct bpf_prog *prog,
> +                                  int tgt_prog_fd,
> +                                  u32 btf_id)
>  {
> -       struct bpf_tracing_link *link, *olink;
>         struct bpf_link_primer link_primer;
> +       struct bpf_prog *tgt_prog = NULL;
> +       struct bpf_tracing_link *link;
> +       struct btf_func_model fmodel;
> +       long addr;
> +       u64 key;
>         int err;
>
>         switch (prog->type) {
> @@ -2613,28 +2620,80 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
>                 err = -EINVAL;
>                 goto out_put_prog;
>         }
> +       if (tgt_prog_fd) {
> +               /* For now we only allow new targets for BPF_PROG_TYPE_EXT */
> +               if (prog->type != BPF_PROG_TYPE_EXT ||
> +                   !btf_id) {
> +                       err = -EINVAL;
> +                       goto out_put_prog;
> +               }
>
> -       link = READ_ONCE(prog->aux->tgt_link);
> -       if (!link) {
> -               err = -ENOENT;
> +               tgt_prog = bpf_prog_get(tgt_prog_fd);
> +               if (IS_ERR(tgt_prog)) {
> +                       err = PTR_ERR(tgt_prog);
> +                       tgt_prog = NULL;
> +                       goto out_put_prog;
> +               }
> +
> +               key = ((u64)tgt_prog->aux->id) << 32 | btf_id;
> +       } else if (btf_id) {
> +               err = -EINVAL;
>                 goto out_put_prog;
>         }
> -       olink = cmpxchg(&prog->aux->tgt_link, link, NULL);
> -       if (olink != link) {
> -               err = -ENOENT;
> -               goto out_put_prog;
> +
> +       link = READ_ONCE(prog->aux->tgt_link);
> +       if (link) {
> +               if (tgt_prog && link->trampoline->key != key) {

I think we need to have a proper locking about this. Imagine two
attaches racing, both read non-NULL tgt_link, one of them proceeds to
cmpxchg, attach, detach, and free link. Then this one wakes up and
tries to access freed memory here. We are coordinating multiple
threads on this, it needs to be locked, at least for simplicity, given
that performance is not critical here.

> +                       link = NULL;
> +               } else {
> +                       struct bpf_tracing_link *olink;
> +
> +                       olink = cmpxchg(&prog->aux->tgt_link, link, NULL);
> +                       if (olink != link) {
> +                               link = NULL;
> +                       } else if (tgt_prog) {
> +                               /* re-using link that already has ref on
> +                                * tgt_prog, don't take another
> +                                */
> +                               bpf_prog_put(tgt_prog);
> +                               tgt_prog = NULL;
> +                       }
> +               }
> +       }
> +
> +       if (!link) {
> +               if (!tgt_prog) {
> +                       err = -ENOENT;
> +                       goto out_put_prog;
> +               }
> +
> +               err = bpf_check_attach_target(NULL, prog, tgt_prog, btf_id,
> +                                             &fmodel, &addr, NULL, NULL);
> +               if (err)
> +                       goto out_put_prog;
> +
> +               link = bpf_tracing_link_create(prog, tgt_prog);
> +               if (IS_ERR(link)) {
> +                       err = PTR_ERR(link);
> +                       goto out_put_prog;
> +               }
> +               tgt_prog = NULL;
> +
> +               err = bpf_trampoline_get(key, (void *)addr, &fmodel, &link->trampoline);
> +               if (err)
> +                       goto out_put_link;

see previous patch, let's avoid bpf_link_put before bpf_link_settle.
bpf_link_cleanup() is for after priming, otherwise it's just a kfree.

>         }
>
>         err = bpf_link_prime(&link->link, &link_primer);
>         if (err) {
>                 kfree(link);
> -               goto out_put_prog;
> +               goto out_put_link;

hm... did you try running this with KASAN? you are freeing link and
then bpf_link_put() on it?


>         }
>
>         err = bpf_trampoline_link_prog(prog, link->trampoline);
>         if (err) {
>                 bpf_link_cleanup(&link_primer);
> -               goto out_put_prog;
> +               goto out_put_link;

similarly, you've already bpf_link_cleanup()'d, no need to do extra
bpf_link_put()

>         }
>
>         /* at this point the link is no longer referenced from struct bpf_prog,
> @@ -2643,8 +2702,12 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
>         link->link.prog = prog;
>
>         return bpf_link_settle(&link_primer);
> +out_put_link:
> +       bpf_link_put(&link->link);
>  out_put_prog:
>         bpf_prog_put(prog);
> +       if (tgt_prog)
> +               bpf_prog_put(tgt_prog);
>         return err;
>  }
>

[...]
Toke Høiland-Jørgensen Sept. 14, 2020, 4:08 p.m. UTC | #3
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Sep 11, 2020 at 3:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:

>>

>> From: Toke Høiland-Jørgensen <toke@redhat.com>

>>

>> This enables support for attaching freplace programs to multiple attach

>> points. It does this by amending UAPI for bpf_raw_tracepoint_open with a

>> target prog fd and btf ID pair that can be used to supply the new

>> attachment point. The target must be compatible with the target that was

>> supplied at program load time.

>>

>> The implementation reuses the checks that were factored out of

>> check_attach_btf_id() to ensure compatibility between the BTF types of the

>> old and new attachment. If these match, a new bpf_tracing_link will be

>> created for the new attach target, allowing multiple attachments to

>> co-exist simultaneously.

>>

>> The code could theoretically support multiple-attach of other types of

>> tracing programs as well, but since I don't have a use case for any of

>> those, the bpf_tracing_prog_attach() function will reject new targets for

>> anything other than PROG_TYPE_EXT programs.

>>

>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

>> ---

>

> It feels like using a semi-constructed bpf_tracing_link inside

> prog->aux->tgt_link is just an unnecessary complication, after reading

> this and previous patches. Seems more straightforward and simpler to

> store tgt_attach_type/tgt_prog_type (permanently) and

> tgt_prog/tgt_trampoline (until first attachment) in prog->aux and then

> properly create bpf_link on attach.


I updated v4 with your comments, but kept the link in prog->aux; the
reason being that having a container for the two pointers makes it
possible to atomically swap it out with xchg() as you suggested
previously. Could you please take a look at v4? If you still think it's
better to just keep two separate pointers (and add a lock) in prog->aux,
I can change it to that. But I'd rather avoid the lock if possible...

-Toke
Andrii Nakryiko Sept. 14, 2020, 11:10 p.m. UTC | #4
On Mon, Sep 14, 2020 at 9:08 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Fri, Sep 11, 2020 at 3:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>
> >> This enables support for attaching freplace programs to multiple attach
> >> points. It does this by amending UAPI for bpf_raw_tracepoint_open with a
> >> target prog fd and btf ID pair that can be used to supply the new
> >> attachment point. The target must be compatible with the target that was
> >> supplied at program load time.
> >>
> >> The implementation reuses the checks that were factored out of
> >> check_attach_btf_id() to ensure compatibility between the BTF types of the
> >> old and new attachment. If these match, a new bpf_tracing_link will be
> >> created for the new attach target, allowing multiple attachments to
> >> co-exist simultaneously.
> >>
> >> The code could theoretically support multiple-attach of other types of
> >> tracing programs as well, but since I don't have a use case for any of
> >> those, the bpf_tracing_prog_attach() function will reject new targets for
> >> anything other than PROG_TYPE_EXT programs.
> >>
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> ---
> >
> > It feels like using a semi-constructed bpf_tracing_link inside
> > prog->aux->tgt_link is just an unnecessary complication, after reading
> > this and previous patches. Seems more straightforward and simpler to
> > store tgt_attach_type/tgt_prog_type (permanently) and
> > tgt_prog/tgt_trampoline (until first attachment) in prog->aux and then
> > properly create bpf_link on attach.
>
> I updated v4 with your comments, but kept the link in prog->aux; the
> reason being that having a container for the two pointers makes it
> possible to atomically swap it out with xchg() as you suggested
> previously. Could you please take a look at v4? If you still think it's
> better to just keep two separate pointers (and add a lock) in prog->aux,
> I can change it to that. But I'd rather avoid the lock if possible...

I took a very quick look at this specific bit, planning to do another
pass tomorrow.

What's the problem with adding a mutex to bpf_prog_aux? In your case,
now you introduced (unlikely, but still) extra state transition for
tgt_link from non-NULL to NULL and then back to non-NULL? And why?
Just to use atomic xchg, while using atomic operation is not an
absolute necessity because it's not a performance-critical path at
all. We are not optimizing for millions of freplace attachments a
second, right? On the other hand, having a mutex there won't require
restoration logic, it will be dead simple, obvious and
straightforward. So yeah, I still think mutex is better there.

BTW, check Stanislav's latest patch set. He's adding used_maps_mutex
to bpf_prog_aux with no problems at all. It seems to me that we might
want to generalize that used_maps_mutex to be just bpf_prog_aux's
mutex ('prog_aux_mutex' or whatever we'd call it) and use it for such
kinds of low-frequency bpf_prog metadata manipulations/checks.

Thoughts?


>
> -Toke
>
Toke Høiland-Jørgensen Sept. 15, 2020, 11:35 a.m. UTC | #5
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Sep 14, 2020 at 9:08 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Fri, Sep 11, 2020 at 3:01 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >>
>> >> This enables support for attaching freplace programs to multiple attach
>> >> points. It does this by amending UAPI for bpf_raw_tracepoint_open with a
>> >> target prog fd and btf ID pair that can be used to supply the new
>> >> attachment point. The target must be compatible with the target that was
>> >> supplied at program load time.
>> >>
>> >> The implementation reuses the checks that were factored out of
>> >> check_attach_btf_id() to ensure compatibility between the BTF types of the
>> >> old and new attachment. If these match, a new bpf_tracing_link will be
>> >> created for the new attach target, allowing multiple attachments to
>> >> co-exist simultaneously.
>> >>
>> >> The code could theoretically support multiple-attach of other types of
>> >> tracing programs as well, but since I don't have a use case for any of
>> >> those, the bpf_tracing_prog_attach() function will reject new targets for
>> >> anything other than PROG_TYPE_EXT programs.
>> >>
>> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> ---
>> >
>> > It feels like using a semi-constructed bpf_tracing_link inside
>> > prog->aux->tgt_link is just an unnecessary complication, after reading
>> > this and previous patches. Seems more straightforward and simpler to
>> > store tgt_attach_type/tgt_prog_type (permanently) and
>> > tgt_prog/tgt_trampoline (until first attachment) in prog->aux and then
>> > properly create bpf_link on attach.
>>
>> I updated v4 with your comments, but kept the link in prog->aux; the
>> reason being that having a container for the two pointers makes it
>> possible to atomically swap it out with xchg() as you suggested
>> previously. Could you please take a look at v4? If you still think it's
>> better to just keep two separate pointers (and add a lock) in prog->aux,
>> I can change it to that. But I'd rather avoid the lock if possible...
>
> I took a very quick look at this specific bit, planning to do another
> pass tomorrow.
>
> What's the problem with adding a mutex to bpf_prog_aux? In your case,
> now you introduced (unlikely, but still) extra state transition for
> tgt_link from non-NULL to NULL and then back to non-NULL? And why?
> Just to use atomic xchg, while using atomic operation is not an
> absolute necessity because it's not a performance-critical path at
> all. We are not optimizing for millions of freplace attachments a
> second, right? On the other hand, having a mutex there won't require
> restoration logic, it will be dead simple, obvious and
> straightforward. So yeah, I still think mutex is better there.

So I went ahead and implemented a mutex-based version of this. I'm not
sure I agree with "dead simple", I'd say it's on par with the previous
version; and that is only if I explicitly limit the scope of the mutex
to *writing* of the tgt_* pointers (i.e., I haven't gone through and
protected all the reads from within the verifier).

The mutex version does have the benefit of still making it possible to
retry a bpf_raw_tracepoint_open() if it fails, so I guess that is a
benefit; I'll post it as v5 and you can have a look :)

> BTW, check Stanislav's latest patch set. He's adding used_maps_mutex
> to bpf_prog_aux with no problems at all. It seems to me that we might
> want to generalize that used_maps_mutex to be just bpf_prog_aux's
> mutex ('prog_aux_mutex' or whatever we'd call it) and use it for such
> kinds of low-frequency bpf_prog metadata manipulations/checks.

I'm not sure I like the idea of widening the scope of the mutex. Or at
least I think that should be done as a follow-up patch that does a
proper analysis of all the different fields it is supposed to protect
and makes sure no deadlocks creep in.

-Toke