[v3,bpf-next,0/7] Support kernel module ksym variables

Series

Support kernel module ksym variables
  [v3,bpf-next,0/7] Support kernel module ksym variables
  [v3,bpf-next,1/7] bpf: add bpf_patch_call_args prototype to include/linux/bpf.h
  [v3,bpf-next,2/7] bpf: avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
  [v3,bpf-next,3/7] bpf: declare __bpf_free_used_maps() unconditionally
  [v3,bpf-next,4/7] selftests/bpf: sync RCU before unloading bpf_testmod
  [v3,bpf-next,5/7] bpf: support BPF ksym variables in kernel modules
  [v3,bpf-next,6/7] libbpf: support kernel module ksym externs
  [v3,bpf-next,7/7] selftests/bpf: test kernel module ksym externs

Message ID

20210112075520.4103414-1-andrii@kernel.org

Headers

From: Andrii Nakryiko <andrii@kernel.org>
To: <bpf@vger.kernel.org>, <netdev@vger.kernel.org>, <ast@fb.com>,
	<daniel@iogearbox.net>
CC: <andrii@kernel.org>, <kernel-team@fb.com>, Hao Luo <haoluo@google.com>
Subject: [PATCH v3 bpf-next 0/7] Support kernel module ksym variables
Date: Mon, 11 Jan 2021 23:55:13 -0800
Message-ID: <20210112075520.4103414-1-andrii@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8BIT
Content-Type: text/plain
Precedence: bulk

Message

Andrii Nakryiko Jan. 12, 2021, 7:55 a.m. UTC

Add support for using kernel module global variables (__ksym externs in BPF
programs). The BPF verifier will now support ldimm64 with src_reg=BPF_PSEUDO_BTF_ID
and a non-zero insn[1].imm field, which specifies the module BTF's FD. In such a case,
the module BTF object, similarly to BPF maps referenced from ldimm64 with
src_reg=BPF_PSEUDO_MAP_FD, will be recorded in bpf_prog's auxiliary data,
and the refcnt will be increased for both the BTF object itself and its kernel module.
This makes sure the kernel module won't be unloaded from under an actively attached
BPF program. These refcounts will be dropped when the BPF program is unloaded.
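
A minimal sketch of how a loader could encode such an instruction, assuming (as with the existing vmlinux ksym support) that insn[0].imm holds the BTF type ID of the variable and, as described above, that insn[1].imm holds the module BTF FD, with 0 meaning vmlinux. It keeps a local copy of the UAPI struct bpf_insn layout so it builds without kernel headers; the emit_ksym_ldimm64() helper and any IDs passed to it are hypothetical, not libbpf's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the UAPI struct bpf_insn layout (<linux/bpf.h>), so the
 * sketch builds without kernel headers. */
struct bpf_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register, or a pseudo marker */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate constant */
};

/* Opcode pieces and pseudo marker, values as in the UAPI header. */
#define BPF_LD			0x00
#define BPF_DW			0x18
#define BPF_IMM			0x00
#define BPF_PSEUDO_BTF_ID	3
#define SKETCH_MAX_BPF_REG	11	/* R0..R10 */

/* Hypothetical helper: fill the two-slot ldimm64 that loads the address of
 * a __ksym variable into dst_reg.
 *   btf_var_id: BTF type ID of the variable's VAR record -> insn[0].imm
 *   btf_obj_fd: module BTF object FD, or 0 for vmlinux   -> insn[1].imm */
static void emit_ksym_ldimm64(struct bpf_insn insn[2], uint8_t dst_reg,
			      int32_t btf_var_id, int32_t btf_obj_fd)
{
	assert(dst_reg < SKETCH_MAX_BPF_REG);
	memset(insn, 0, 2 * sizeof(insn[0]));
	insn[0].code = BPF_LD | BPF_DW | BPF_IMM;
	insn[0].dst_reg = dst_reg;
	insn[0].src_reg = BPF_PSEUDO_BTF_ID;
	insn[0].imm = btf_var_id;
	/* The second slot keeps a zero opcode, registers and offset;
	 * only its imm is used, here to carry the BTF object FD. */
	insn[1].imm = btf_obj_fd;
}
```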

A new selftest validates that all of this works as intended. bpf_testmod.ko is
extended with a per-CPU variable. The selftest expects the latest pahole changes
(soon to be released as v1.20) to generate per-CPU variable BTF info for
kernel modules.
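
The per-CPU detection change described in the patch (quoted in the comments below) can be pictured with a small model. In split BTF, a module's types get IDs after all vmlinux types, so a lookup starting at ID 1 finds vmlinux's .data..percpu first. The sketch restricts the search to IDs at or above the module's first own type ID. The type records, names and the find_datasec_from() helper are hypothetical simplifications, not the kernel's BTF API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for BTF type records. */
enum kind_sketch { KIND_VOID, KIND_INT, KIND_VAR, KIND_DATASEC };

struct type_sketch {
	enum kind_sketch kind;
	const char *name;
};

/* Made-up split-BTF layout: IDs 0..3 come from vmlinux (the base BTF),
 * IDs 4..5 were added by the module. Both sides have a .data..percpu. */
static const struct type_sketch sample_types[] = {
	{ KIND_VOID,    "" },
	{ KIND_VAR,     "vmlinux_percpu_var" },
	{ KIND_DATASEC, ".data..percpu" },
	{ KIND_INT,     "int" },
	{ KIND_VAR,     "module_percpu_var" },
	{ KIND_DATASEC, ".data..percpu" },
};
#define SAMPLE_NR_TYPES		6
#define SAMPLE_MODULE_START_ID	4

/* Return the ID of the first DATASEC named `sec` with ID >= start_id,
 * or -1 if there is none. Passing the module's first own ID restricts the
 * search to module-added types; passing 1 searches vmlinux types too. */
static int find_datasec_from(const struct type_sketch *types, int nr_types,
			     int start_id, const char *sec)
{
	assert(start_id >= 0 && start_id <= nr_types);
	for (int id = start_id; id < nr_types; id++) {
		if (types[id].kind == KIND_DATASEC &&
		    strcmp(types[id].name, sec) == 0)
			return id;
	}
	return -1;
}
```

With the sample table, a search from ID 1 returns 2 (vmlinux's section), while a search from SAMPLE_MODULE_START_ID returns 5 (the module's own section).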

v2->v3:
  - added comments, addressed feedback (Yonghong, Hao);
v1->v2:
  - fixed a few compiler warnings, posted as separate pre-patches;
rfc->v1:
  - use sys_membarrier(MEMBARRIER_CMD_GLOBAL) (Alexei).
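
For patch 4 (sync RCU before unloading bpf_testmod), the changelog settles on sys_membarrier(MEMBARRIER_CMD_GLOBAL). A minimal userspace sketch of that call follows; the helper name kern_sync_rcu_sketch() is hypothetical. Why it helps rests on an implementation detail: on kernels of this era, MEMBARRIER_CMD_GLOBAL is implemented with synchronize_rcu() when more than one CPU is online, so returning from the call implies an RCU grace period has elapsed. glibc provides no membarrier() wrapper, hence the raw syscall().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MEMBARRIER_CMD_GLOBAL
#define MEMBARRIER_CMD_GLOBAL (1 << 0)	/* value from <linux/membarrier.h> */
#endif

/* Hypothetical helper: issue a system-wide membarrier. Returns 0 on success
 * or -1 with errno set, e.g. EINVAL when the command is disabled (nohz_full)
 * or ENOSYS when the syscall is unavailable. */
static int kern_sync_rcu_sketch(void)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```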

Cc: Hao Luo <haoluo@google.com>

Andrii Nakryiko (7):
  bpf: add bpf_patch_call_args prototype to include/linux/bpf.h
  bpf: avoid warning when re-casting __bpf_call_base into
    __bpf_call_base_args
  bpf: declare __bpf_free_used_maps() unconditionally
  selftests/bpf: sync RCU before unloading bpf_testmod
  bpf: support BPF ksym variables in kernel modules
  libbpf: support kernel module ksym externs
  selftests/bpf: test kernel module ksym externs

 include/linux/bpf.h                           |  18 +-
 include/linux/bpf_verifier.h                  |   3 +
 include/linux/btf.h                           |   3 +
 include/linux/filter.h                        |   2 +-
 kernel/bpf/btf.c                              |  31 +++-
 kernel/bpf/core.c                             |  23 +++
 kernel/bpf/verifier.c                         | 154 ++++++++++++++----
 tools/lib/bpf/libbpf.c                        |  50 ++++--
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |   3 +
 .../selftests/bpf/prog_tests/btf_map_in_map.c |  33 ----
 .../selftests/bpf/prog_tests/ksyms_module.c   |  31 ++++
 .../selftests/bpf/progs/test_ksyms_module.c   |  26 +++
 tools/testing/selftests/bpf/test_progs.c      |  11 ++
 tools/testing/selftests/bpf/test_progs.h      |   1 +
 14 files changed, 305 insertions(+), 84 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_module.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_module.c

Comments

Andrii Nakryiko Jan. 12, 2021, 8:38 p.m. UTC | #1

On Tue, Jan 12, 2021 at 8:27 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 1/12/21 8:55 AM, Andrii Nakryiko wrote:
> > Add support for directly accessing kernel module variables from BPF programs
> > using special ldimm64 instructions. This functionality builds upon vmlinux
> > ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
> > specifying kernel module BTF's FD in insn[1].imm field.
> >
> > During BPF program load time, verifier will resolve FD to BTF object and will
> > take reference on BTF object itself and, for module BTFs, corresponding module
> > as well, to make sure it won't be unloaded from under running BPF program. The
> > mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
> >
> > One interesting change is also in how per-CPU variable is determined. The
> > logic is to find .data..percpu data section in provided BTF, but both vmlinux
> > and module each have their own .data..percpu entries in BTF. So for module's
> > case, the search for DATASEC record needs to look at only module's added BTF
> > types. This is implemented with custom search function.
> >
> > Acked-by: Yonghong Song <yhs@fb.com>
> > Acked-by: Hao Luo <haoluo@google.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> [...]
> > +
> > +struct module *btf_try_get_module(const struct btf *btf)
> > +{
> > +     struct module *res = NULL;
> > +#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> > +     struct btf_module *btf_mod, *tmp;
> > +
> > +     mutex_lock(&btf_module_mutex);
> > +     list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > +             if (btf_mod->btf != btf)
> > +                     continue;
> > +
> > +             if (try_module_get(btf_mod->module))
> > +                     res = btf_mod->module;
>
> One more thought (follow-up would be okay I'd think) ... when a module references
> a symbol from another module, it similarly needs to bump the refcount of the module
> that is owning it and thus disallowing to unload for that other module's lifetime.
> That usage dependency is visible via /proc/modules however, so if unload doesn't work
> then lsmod allows a way to introspect that to the user. This seems to be achieved via
> resolve_symbol() where it records its dependency/usage. Would be great if we could at
> some point also include the BPF prog name into that list so that this is more obvious.
> Wdyt?
>

Yeah, it's definitely nice to see dependent bpf progs. There is struct
module_use, which is used to record these dependencies, but the
assumption there is that dependencies can only be other modules. So
one way is to somehow extend that or add another set of bpf_prog
dependencies. The first is a bit intrusive, while the second sucks even
more, IMO.

Alternatively, we can rely on bpf_link info to emit module info, if
the BPF program is attached to a BTF type from the module. Then with
bpftool it would be easy to see this, but it's not as
readily available info as /proc/modules, of course.

Any preferences?

> > +             break;
> > +     }
> > +     mutex_unlock(&btf_module_mutex);
> > +#endif
> > +
> > +     return res;
> > +}
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 261f8692d0d2..69c3c308de5e 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2119,6 +2119,28 @@ static void bpf_free_used_maps(struct bpf_prog_aux *aux)
> >       kfree(aux->used_maps);
> >   }
> >
> > +void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
> > +                       struct btf_mod_pair *used_btfs, u32 len)
> > +{
> > +#ifdef CONFIG_BPF_SYSCALL
> > +     struct btf_mod_pair *btf_mod;
> > +     u32 i;
> > +
> > +     for (i = 0; i < len; i++) {
> > +             btf_mod = &used_btfs[i];
> > +             if (btf_mod->module)
> > +                     module_put(btf_mod->module);
> > +             btf_put(btf_mod->btf);
> > +     }
> > +#endif
> > +}
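
The lifetime rule described in the cover letter (pin the BTF object and, for module BTFs, the module at load time; drop both when the program is unloaded) can be modeled in plain userspace C. This is a hedged sketch, not kernel code: the structs, record_used_btf() and the capacity limit are hypothetical, mod_try_get() only imitates try_module_get() refusing a module that is already unloading, and deduplicating repeated BTFs is assumed to work like the existing used_maps tracking. free_used_btfs() has the same shape as the __bpf_free_used_btfs() loop quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct mod_sketch {
	int refcnt;
	bool going;			/* true once unloading has started */
};

struct btf_sketch {
	int refcnt;
	struct mod_sketch *owner;	/* NULL for vmlinux BTF */
};

struct btf_mod_pair_sketch {
	struct btf_sketch *btf;
	struct mod_sketch *module;	/* NULL for vmlinux BTF */
};

/* Modeled on try_module_get(): fails for a module that is going away. */
static bool mod_try_get(struct mod_sketch *m)
{
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Record @btf once per program, pinning the BTF and, for module BTFs, the
 * module. Returns 0 on success, -1 if the module is unloading, -2 if the
 * table (capacity @cap) is full. */
static int record_used_btf(struct btf_mod_pair_sketch *used, size_t *len,
			   size_t cap, struct btf_sketch *btf)
{
	for (size_t i = 0; i < *len; i++)
		if (used[i].btf == btf)
			return 0;	/* already recorded and pinned */
	if (*len >= cap)
		return -2;
	if (btf->owner && !mod_try_get(btf->owner))
		return -1;
	btf->refcnt++;
	used[*len].btf = btf;
	used[*len].module = btf->owner;
	(*len)++;
	return 0;
}

/* Drop the module reference (if any), then the BTF reference. */
static void free_used_btfs(struct btf_mod_pair_sketch *used, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (used[i].module)
			used[i].module->refcnt--;
		used[i].btf->refcnt--;
	}
}
```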

Alexei Starovoitov Jan. 12, 2021, 11:18 p.m. UTC | #2

On Tue, Jan 12, 2021 at 8:30 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 1/12/21 8:55 AM, Andrii Nakryiko wrote:
> > Add support for directly accessing kernel module variables from BPF programs
> > using special ldimm64 instructions. This functionality builds upon vmlinux
> > ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
> > specifying kernel module BTF's FD in insn[1].imm field.
> >
> > During BPF program load time, verifier will resolve FD to BTF object and will
> > take reference on BTF object itself and, for module BTFs, corresponding module
> > as well, to make sure it won't be unloaded from under running BPF program. The
> > mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
> >
> > One interesting change is also in how per-CPU variable is determined. The
> > logic is to find .data..percpu data section in provided BTF, but both vmlinux
> > and module each have their own .data..percpu entries in BTF. So for module's
> > case, the search for DATASEC record needs to look at only module's added BTF
> > types. This is implemented with custom search function.
> >
> > Acked-by: Yonghong Song <yhs@fb.com>
> > Acked-by: Hao Luo <haoluo@google.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> [...]
> > +
> > +struct module *btf_try_get_module(const struct btf *btf)
> > +{
> > +     struct module *res = NULL;
> > +#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
> > +     struct btf_module *btf_mod, *tmp;
> > +
> > +     mutex_lock(&btf_module_mutex);
> > +     list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > +             if (btf_mod->btf != btf)
> > +                     continue;
> > +
> > +             if (try_module_get(btf_mod->module))
> > +                     res = btf_mod->module;
>
> One more thought (follow-up would be okay I'd think) ... when a module references
> a symbol from another module, it similarly needs to bump the refcount of the module
> that is owning it and thus disallowing to unload for that other module's lifetime.
> That usage dependency is visible via /proc/modules however, so if unload doesn't work
> then lsmod allows a way to introspect that to the user. This seems to be achieved via
> resolve_symbol() where it records its dependency/usage. Would be great if we could at
> some point also include the BPF prog name into that list so that this is more obvious.
> Wdyt?

I thought about it as well, but plenty of kernel things just grab the ref of ko
and don't add any way to introspect what piece of kernel is holding ko.
So this case won't be the first.
Also, if we add it for bpf progs, it could be confusing in lsmod,
since it currently only shows other ko-s in there.
Long ago I had an awk script to parse that output to rmmod dependent modules
before rmmoding the main one. If somebody is doing something like this,
bpf prog names in the same place may break things.
So I think there are more cons than pros.
That is certainly a follow up if we agree on the direction.
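
To make the scripting concern concrete: in /proc/modules, the third column is the raw reference count and the fourth lists the modules that use a given module, each name followed by a comma, or "-" when there are none. A script that passes every name from that column to rmmod assumes each entry is a module. The parser below is a hypothetical sketch of such a consumer (parse_module_line() is not an existing API), and the sample lines in the test are made up. A reference taken with try_module_get(), as this series does for BPF programs, shows up only in the count, not in the list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for one /proc/modules line:
 *   name size refcnt users state address [taints]
 * Stores the refcount in *refcnt and up to @max "used by" names in
 * users_out. Returns the number of names, or -1 on a malformed line. */
static int parse_module_line(const char *line, int *refcnt,
			     char users_out[][64], int max)
{
	char name[64], users[512];
	unsigned long size;
	int n = 0;

	if (sscanf(line, "%63s %lu %d %511s", name, &size, refcnt, users) != 4)
		return -1;
	if (strcmp(users, "-") == 0)
		return 0;	/* no module-to-module dependencies */
	/* strtok() skips the empty token after the trailing comma */
	for (char *tok = strtok(users, ","); tok && n < max;
	     tok = strtok(NULL, ","))
		snprintf(users_out[n++], 64, "%s", tok);
	return n;
}
```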

Alexei Starovoitov Jan. 13, 2021, 1:29 a.m. UTC | #3

On Tue, Jan 12, 2021 at 3:41 AM Andrii Nakryiko <andrii@kernel.org> wrote:
>
> Add per-CPU variable to bpf_testmod.ko and use those from new selftest to
> validate it works end-to-end.
>
> Acked-by: Yonghong Song <yhs@fb.com>
> Acked-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Applied.

FYI for everyone. This test needs the latest pahole.

Daniel Borkmann Jan. 13, 2021, 10:55 p.m. UTC | #4

On 1/13/21 12:18 AM, Alexei Starovoitov wrote:
> On Tue, Jan 12, 2021 at 8:30 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 1/12/21 8:55 AM, Andrii Nakryiko wrote:
>>> Add support for directly accessing kernel module variables from BPF programs
>>> using special ldimm64 instructions. This functionality builds upon vmlinux
>>> ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
>>> specifying kernel module BTF's FD in insn[1].imm field.
>>>
>>> During BPF program load time, verifier will resolve FD to BTF object and will
>>> take reference on BTF object itself and, for module BTFs, corresponding module
>>> as well, to make sure it won't be unloaded from under running BPF program. The
>>> mechanism used is similar to how bpf_prog keeps track of used bpf_maps.
>>>
>>> One interesting change is also in how per-CPU variable is determined. The
>>> logic is to find .data..percpu data section in provided BTF, but both vmlinux
>>> and module each have their own .data..percpu entries in BTF. So for module's
>>> case, the search for DATASEC record needs to look at only module's added BTF
>>> types. This is implemented with custom search function.
>>>
>>> Acked-by: Yonghong Song <yhs@fb.com>
>>> Acked-by: Hao Luo <haoluo@google.com>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>> [...]
>>> +
>>> +struct module *btf_try_get_module(const struct btf *btf)
>>> +{
>>> +     struct module *res = NULL;
>>> +#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
>>> +     struct btf_module *btf_mod, *tmp;
>>> +
>>> +     mutex_lock(&btf_module_mutex);
>>> +     list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
>>> +             if (btf_mod->btf != btf)
>>> +                     continue;
>>> +
>>> +             if (try_module_get(btf_mod->module))
>>> +                     res = btf_mod->module;
>>
>> One more thought (follow-up would be okay I'd think) ... when a module references
>> a symbol from another module, it similarly needs to bump the refcount of the module
>> that is owning it and thus disallowing to unload for that other module's lifetime.
>> That usage dependency is visible via /proc/modules however, so if unload doesn't work
>> then lsmod allows a way to introspect that to the user. This seems to be achieved via
>> resolve_symbol() where it records its dependency/usage. Would be great if we could at
>> some point also include the BPF prog name into that list so that this is more obvious.
>> Wdyt?
>
> I thought about it as well, but plenty of kernel things just grab the ref of ko
> and don't add any way to introspect what piece of kernel is holding ko.
> So this case won't be the first.
> Also if we add it for bpf progs it could be confusing in lsmod.
> Since it currently only shows other ko-s in there.
> Long ago I had an awk script to parse that output to rmmod dependent modules
> before rmmoding the main one. If somebody doing something like this
> bpf prog names in the same place may break things.
> So I think there are more cons than pros.

Hm, true that scripting could break in this case if we were to add bpf prog names in
there. :/ I don't have a better suggestion atm. We could potentially add something
to the bpf prog info dump via bpftool, but it's a non-obvious location for people who
are used to checking deps via lsmod. Also true that we bump refs from plenty of other
locations where nothing is shown apart from the refcnt (e.g. a socket
using a tcp congctl module, etc).