mbox series

[bpf-next,00/11] libbpf: split BTF support

Message ID 20201029005902.1706310-1-andrii@kernel.org
Headers show
Series libbpf: split BTF support | expand

Message

Andrii Nakryiko Oct. 29, 2020, 12:58 a.m. UTC
This patch set adds support for generating and deduplicating split BTF. This
is an enhancement to the BTF, which allows to designate one BTF as the "base
BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,
kernel module BTF), which are building upon and extending base BTF with extra
types and strings.

Once loaded, split BTF appears as a single unified BTF superset of base BTF,
with continuous and transparent numbering scheme. This allows all the existing
users of BTF to work correctly and stay agnostic to the base/split BTFs
composition.  The only difference is in how to instantiate split BTF: it
requires base BTF to be alread instantiated and passed to btf__new_xxx_split()
or btf__parse_xxx_split() "constructors" explicitly.

This split approach is necessary if we are to have a reasonably-sized kernel
module BTFs. By deduping each kernel module's BTF individually, resulting
module BTFs contain copies of a lot of kernel types that are already present
in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my
kernel configuration with 700 modules built, non-split BTF approach results in
115MBs of BTFs across all modules. With split BTF deduplication approach,
total size is down to 5.2MBs total, which is on part with vmlinux BTF (at
around 4MBs). This seems reasonable and practical. As to why we'd need kernel
module BTFs, that should be pretty obvious to anyone using BPF at this point,
as it allows all the BTF-powered features to be used with kernel modules:
tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.

This patch set is a pre-requisite to adding split BTF support to pahole, which
is a prerequisite to integrating split BTF into the Linux kernel build setup
to generate BTF for kernel modules. The latter will come as a follow-up patch
series once this series makes it to the libbpf and pahole makes use of it.

Patch #4 introduces necessary basic support for split BTF into libbpf APIs.
Patch #8 implements minimal changes to BTF dedup algorithm to allow
deduplicating split BTFs. Patch #11 adds extra -B flag to bpftool to allow to
specify the path to base BTF for cases when one wants to dump or inspect split
BTF. All the rest are refactorings, clean ups, bug fixes and selftests.

Andrii Nakryiko (11):
  libbpf: factor out common operations in BTF writing APIs
  selftest/bpf: relax btf_dedup test checks
  libbpf: unify and speed up BTF string deduplication
  libbpf: implement basic split BTF support
  selftests/bpf: add split BTF basic test
  selftests/bpf: add checking of raw type dump in BTF writer APIs
    selftests
  libbpf: fix BTF data layout checks and allow empty BTF
  libbpf: support BTF dedup of split BTFs
  libbpf: accomodate DWARF/compiler bug with duplicated identical arrays
  selftests/bpf: add split BTF dedup selftests
  tools/bpftool: add bpftool support for split BTF

 tools/bpf/bpftool/btf.c                       |   9 +-
 tools/bpf/bpftool/main.c                      |  15 +-
 tools/bpf/bpftool/main.h                      |   1 +
 tools/lib/bpf/btf.c                           | 814 ++++++++++--------
 tools/lib/bpf/btf.h                           |   8 +
 tools/lib/bpf/libbpf.map                      |   9 +
 tools/testing/selftests/bpf/Makefile          |   2 +-
 tools/testing/selftests/bpf/btf_helpers.c     | 259 ++++++
 tools/testing/selftests/bpf/btf_helpers.h     |  19 +
 tools/testing/selftests/bpf/prog_tests/btf.c  |  34 +-
 .../bpf/prog_tests/btf_dedup_split.c          | 326 +++++++
 .../selftests/bpf/prog_tests/btf_split.c      |  99 +++
 .../selftests/bpf/prog_tests/btf_write.c      |  48 +-
 tools/testing/selftests/bpf/test_progs.h      |  11 +
 14 files changed, 1294 insertions(+), 360 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/btf_helpers.c
 create mode 100644 tools/testing/selftests/bpf/btf_helpers.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_split.c

Comments

Song Liu Oct. 30, 2020, 12:33 a.m. UTC | #1
> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> This patch set adds support for generating and deduplicating split BTF. This
> is an enhancement to the BTF, which allows to designate one BTF as the "base
> BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,
> kernel module BTF), which are building upon and extending base BTF with extra
> types and strings.
> 
> Once loaded, split BTF appears as a single unified BTF superset of base BTF,
> with continuous and transparent numbering scheme. This allows all the existing
> users of BTF to work correctly and stay agnostic to the base/split BTFs
> composition.  The only difference is in how to instantiate split BTF: it
> requires base BTF to be alread instantiated and passed to btf__new_xxx_split()
> or btf__parse_xxx_split() "constructors" explicitly.
> 
> This split approach is necessary if we are to have a reasonably-sized kernel
> module BTFs. By deduping each kernel module's BTF individually, resulting
> module BTFs contain copies of a lot of kernel types that are already present
> in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my
> kernel configuration with 700 modules built, non-split BTF approach results in
> 115MBs of BTFs across all modules. With split BTF deduplication approach,
> total size is down to 5.2MBs total, which is on part with vmlinux BTF (at
> around 4MBs). This seems reasonable and practical. As to why we'd need kernel
> module BTFs, that should be pretty obvious to anyone using BPF at this point,
> as it allows all the BTF-powered features to be used with kernel modules:
> tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.

Some high level questions. Do we plan to use split BTF for in-tree modules
(those built together with the kernel) or out-of-tree modules (those built 
separately)? If it is for in-tree modules, is it possible to build split BTF
into vmlinux BTF? 

Thanks,
Song

[...]
Song Liu Oct. 30, 2020, 12:36 a.m. UTC | #2
> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Factor out commiting of appended type data. Also extract fetching the very
> last type in the BTF (to append members to). These two operations are common
> across many APIs and will be easier to refactor with split BTF, if they are
> extracted into a single place.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>

[...]
Andrii Nakryiko Oct. 30, 2020, 2:33 a.m. UTC | #3
On Thu, Oct 29, 2020 at 5:33 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > This patch set adds support for generating and deduplicating split BTF. This
> > is an enhancement to the BTF, which allows to designate one BTF as the "base
> > BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,
> > kernel module BTF), which are building upon and extending base BTF with extra
> > types and strings.
> >
> > Once loaded, split BTF appears as a single unified BTF superset of base BTF,
> > with continuous and transparent numbering scheme. This allows all the existing
> > users of BTF to work correctly and stay agnostic to the base/split BTFs
> > composition.  The only difference is in how to instantiate split BTF: it
> > requires base BTF to be alread instantiated and passed to btf__new_xxx_split()
> > or btf__parse_xxx_split() "constructors" explicitly.
> >
> > This split approach is necessary if we are to have a reasonably-sized kernel
> > module BTFs. By deduping each kernel module's BTF individually, resulting
> > module BTFs contain copies of a lot of kernel types that are already present
> > in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my
> > kernel configuration with 700 modules built, non-split BTF approach results in
> > 115MBs of BTFs across all modules. With split BTF deduplication approach,
> > total size is down to 5.2MBs total, which is on part with vmlinux BTF (at
> > around 4MBs). This seems reasonable and practical. As to why we'd need kernel
> > module BTFs, that should be pretty obvious to anyone using BPF at this point,
> > as it allows all the BTF-powered features to be used with kernel modules:
> > tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.
>
> Some high level questions. Do we plan to use split BTF for in-tree modules
> (those built together with the kernel) or out-of-tree modules (those built
> separately)? If it is for in-tree modules, is it possible to build split BTF
> into vmlinux BTF?

It will be possible to use for both in-tree and out-of-tree. For
in-tree, this will be integrated into the kernel build process. For
out-of-tree, whoever builds their kernel module will need to invoke
pahole -J with an extra flag pointing to the right vmlinux image (I
haven't looked into the exact details of this integration, maybe there
are already scripts in Linux repo that out-of-tree modules have to
use, in such case we can add this integration there).

Merging all in-tree modules' BTFs into vmlinux's BTF defeats the
purpose of the split BTF and will just increase the size of vmlinux
BTF unnecessarily.

>
> Thanks,
> Song
>
> [...]
Song Liu Oct. 30, 2020, 6:45 a.m. UTC | #4
> On Oct 29, 2020, at 7:33 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> 

> On Thu, Oct 29, 2020 at 5:33 PM Song Liu <songliubraving@fb.com> wrote:

>> 

>> 

>> 

>>> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

>>> 

>>> This patch set adds support for generating and deduplicating split BTF. This

>>> is an enhancement to the BTF, which allows to designate one BTF as the "base

>>> BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,

>>> kernel module BTF), which are building upon and extending base BTF with extra

>>> types and strings.

>>> 

>>> Once loaded, split BTF appears as a single unified BTF superset of base BTF,

>>> with continuous and transparent numbering scheme. This allows all the existing

>>> users of BTF to work correctly and stay agnostic to the base/split BTFs

>>> composition.  The only difference is in how to instantiate split BTF: it

>>> requires base BTF to be alread instantiated and passed to btf__new_xxx_split()

>>> or btf__parse_xxx_split() "constructors" explicitly.

>>> 

>>> This split approach is necessary if we are to have a reasonably-sized kernel

>>> module BTFs. By deduping each kernel module's BTF individually, resulting

>>> module BTFs contain copies of a lot of kernel types that are already present

>>> in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my

>>> kernel configuration with 700 modules built, non-split BTF approach results in

>>> 115MBs of BTFs across all modules. With split BTF deduplication approach,

>>> total size is down to 5.2MBs total, which is on part with vmlinux BTF (at

>>> around 4MBs). This seems reasonable and practical. As to why we'd need kernel

>>> module BTFs, that should be pretty obvious to anyone using BPF at this point,

>>> as it allows all the BTF-powered features to be used with kernel modules:

>>> tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.

>> 

>> Some high level questions. Do we plan to use split BTF for in-tree modules

>> (those built together with the kernel) or out-of-tree modules (those built

>> separately)? If it is for in-tree modules, is it possible to build split BTF

>> into vmlinux BTF?

> 

> It will be possible to use for both in-tree and out-of-tree. For

> in-tree, this will be integrated into the kernel build process. For

> out-of-tree, whoever builds their kernel module will need to invoke

> pahole -J with an extra flag pointing to the right vmlinux image (I

> haven't looked into the exact details of this integration, maybe there

> are already scripts in Linux repo that out-of-tree modules have to

> use, in such case we can add this integration there).


Thanks for the explanation. 

> 

> Merging all in-tree modules' BTFs into vmlinux's BTF defeats the

> purpose of the split BTF and will just increase the size of vmlinux

> BTF unnecessarily.


Is the purpose of split BTF to save memory used by module BTF? In the 
example above, I guess part of those 5.2MB will be loaded at run time, 
so the actual saving is less than 5.2MB. 5.2MB is really small for a 
decent system, e.g. ~0.03% of my laptop's main memory. 

Did I miss anything here? 

Song
Alan Maguire Oct. 30, 2020, 12:04 p.m. UTC | #5
On Thu, 29 Oct 2020, Andrii Nakryiko wrote:

> On Thu, Oct 29, 2020 at 5:33 PM Song Liu <songliubraving@fb.com> wrote:
> >
> >
> >
> > > On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > This patch set adds support for generating and deduplicating split BTF. This
> > > is an enhancement to the BTF, which allows to designate one BTF as the "base
> > > BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,
> > > kernel module BTF), which are building upon and extending base BTF with extra
> > > types and strings.
> > >
> > > Once loaded, split BTF appears as a single unified BTF superset of base BTF,
> > > with continuous and transparent numbering scheme. This allows all the existing
> > > users of BTF to work correctly and stay agnostic to the base/split BTFs
> > > composition.  The only difference is in how to instantiate split BTF: it
> > > requires base BTF to be alread instantiated and passed to btf__new_xxx_split()
> > > or btf__parse_xxx_split() "constructors" explicitly.
> > >
> > > This split approach is necessary if we are to have a reasonably-sized kernel
> > > module BTFs. By deduping each kernel module's BTF individually, resulting
> > > module BTFs contain copies of a lot of kernel types that are already present
> > > in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my
> > > kernel configuration with 700 modules built, non-split BTF approach results in
> > > 115MBs of BTFs across all modules. With split BTF deduplication approach,
> > > total size is down to 5.2MBs total, which is on part with vmlinux BTF (at
> > > around 4MBs). This seems reasonable and practical. As to why we'd need kernel
> > > module BTFs, that should be pretty obvious to anyone using BPF at this point,
> > > as it allows all the BTF-powered features to be used with kernel modules:
> > > tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.
> >
> > Some high level questions. Do we plan to use split BTF for in-tree modules
> > (those built together with the kernel) or out-of-tree modules (those built
> > separately)? If it is for in-tree modules, is it possible to build split BTF
> > into vmlinux BTF?
> 
> It will be possible to use for both in-tree and out-of-tree. For
> in-tree, this will be integrated into the kernel build process. For
> out-of-tree, whoever builds their kernel module will need to invoke
> pahole -J with an extra flag pointing to the right vmlinux image (I
> haven't looked into the exact details of this integration, maybe there
> are already scripts in Linux repo that out-of-tree modules have to
> use, in such case we can add this integration there).
> 
> Merging all in-tree modules' BTFs into vmlinux's BTF defeats the
> purpose of the split BTF and will just increase the size of vmlinux
> BTF unnecessarily.
>

Again more of a question about how module BTF will be exposed, but
I'm wondering if there will be a way for a consumer to ask for
type info across kernel and module BTF, i.e. something like
libbpf_find_kernel_btf_id() ? Similarly will __builtin_btf_type_id()
work across both vmlinux and modules? I'm thinking of the case where we 
potentially don't know which module a type is defined in.

I realize in some cases type names may refer to different types in 
different modules (not sure how frequent this is in practice?) but
I'm curious how the split model for modules will interact with existing 
APIs and helpers.

In some cases it's likely that modules may share types with
each other that they do not share with vmlinux; in such cases 
will those types get deduplicated also, or is deduplication just
between kernel/module, and not module/module? 

Sorry I know these questions aren't about this patchset in
particular, but I'm just trying to get a sense of the bigger
picture. Thanks!

Alan
Andrii Nakryiko Oct. 30, 2020, 6:30 p.m. UTC | #6
On Fri, Oct 30, 2020 at 5:06 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Thu, 29 Oct 2020, Andrii Nakryiko wrote:
>
> > On Thu, Oct 29, 2020 at 5:33 PM Song Liu <songliubraving@fb.com> wrote:
> > >
> > >
> > >
> > > > On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > This patch set adds support for generating and deduplicating split BTF. This
> > > > is an enhancement to the BTF, which allows to designate one BTF as the "base
> > > > BTF" (e.g., vmlinux BTF), and one or more other BTFs as "split BTF" (e.g.,
> > > > kernel module BTF), which are building upon and extending base BTF with extra
> > > > types and strings.
> > > >
> > > > Once loaded, split BTF appears as a single unified BTF superset of base BTF,
> > > > with continuous and transparent numbering scheme. This allows all the existing
> > > > users of BTF to work correctly and stay agnostic to the base/split BTFs
> > > > composition.  The only difference is in how to instantiate split BTF: it
> > > > requires base BTF to be alread instantiated and passed to btf__new_xxx_split()
> > > > or btf__parse_xxx_split() "constructors" explicitly.
> > > >
> > > > This split approach is necessary if we are to have a reasonably-sized kernel
> > > > module BTFs. By deduping each kernel module's BTF individually, resulting
> > > > module BTFs contain copies of a lot of kernel types that are already present
> > > > in vmlinux BTF. Even those single copies result in a big BTF size bloat. On my
> > > > kernel configuration with 700 modules built, non-split BTF approach results in
> > > > 115MBs of BTFs across all modules. With split BTF deduplication approach,
> > > > total size is down to 5.2MBs total, which is on part with vmlinux BTF (at
> > > > around 4MBs). This seems reasonable and practical. As to why we'd need kernel
> > > > module BTFs, that should be pretty obvious to anyone using BPF at this point,
> > > > as it allows all the BTF-powered features to be used with kernel modules:
> > > > tp_btf, fentry/fexit/fmod_ret, lsm, bpf_iter, etc.
> > >
> > > Some high level questions. Do we plan to use split BTF for in-tree modules
> > > (those built together with the kernel) or out-of-tree modules (those built
> > > separately)? If it is for in-tree modules, is it possible to build split BTF
> > > into vmlinux BTF?
> >
> > It will be possible to use for both in-tree and out-of-tree. For
> > in-tree, this will be integrated into the kernel build process. For
> > out-of-tree, whoever builds their kernel module will need to invoke
> > pahole -J with an extra flag pointing to the right vmlinux image (I
> > haven't looked into the exact details of this integration, maybe there
> > are already scripts in Linux repo that out-of-tree modules have to
> > use, in such case we can add this integration there).
> >
> > Merging all in-tree modules' BTFs into vmlinux's BTF defeats the
> > purpose of the split BTF and will just increase the size of vmlinux
> > BTF unnecessarily.
> >
>
> Again more of a question about how module BTF will be exposed, but
> I'm wondering if there will be a way for a consumer to ask for
> type info across kernel and module BTF, i.e. something like
> libbpf_find_kernel_btf_id() ?

I'm still playing with the options, but I think libbpf will do all the
search across vmlinux and modules. I'm considering allowing users to
specify module name as an optional hint. Just in case if there are
conflicting types/functions in two different modules with the same
name.

> Similarly will __builtin_btf_type_id()
> work across both vmlinux and modules? I'm thinking of the case where we
> potentially don't know which module a type is defined in.

I think we'll need another built-in/relocation to specify
module/vmlinux ID. Type ID itself is not unique enough to identify the
module.

Alternatively, we can extend its return type to u64 and have BTF
object ID in upper 4 bytes, and BTF type ID in lower 4 bytes. Need to
think about this and discuss it with Yonghong.

>
> I realize in some cases type names may refer to different types in
> different modules (not sure how frequent this is in practice?) but
> I'm curious how the split model for modules will interact with existing
> APIs and helpers.
>
> In some cases it's likely that modules may share types with
> each other that they do not share with vmlinux; in such cases
> will those types get deduplicated also, or is deduplication just
> between kernel/module, and not module/module?

Yes, they will be duplicated in two modules. It's a start schema,
where vmlinux BTF is the base for all kernel modules. It's technically
possible to have a longer chain of BTFs, but we'd need to deal with
dependencies between modules, making sure that dependent BTF is loaded
and available first, etc. That can be added later without breaking
anything, if there is a need.

>
> Sorry I know these questions aren't about this patchset in
> particular, but I'm just trying to get a sense of the bigger
> picture. Thanks!

These are fair questions, I just didn't want to go into too many
details in this particular patch set, because it's pretty agnostic to
all of those concerns. The next patch set will be dealing with all the
details of kernel/user space interface.

>
> Alan
Song Liu Nov. 2, 2020, 11:23 p.m. UTC | #7
> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 

[...]

> 
> BTF deduplication is not yet supported for split BTF and support for it will
> be added in separate patch.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>

With a couple nits:

> ---
> tools/lib/bpf/btf.c      | 205 ++++++++++++++++++++++++++++++---------
> tools/lib/bpf/btf.h      |   8 ++
> tools/lib/bpf/libbpf.map |   9 ++
> 3 files changed, 175 insertions(+), 47 deletions(-)
> 
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index db9331fea672..20c64a8441a8 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -78,10 +78,32 @@ struct btf {
> 	void *types_data;
> 	size_t types_data_cap; /* used size stored in hdr->type_len */
> 
> -	/* type ID to `struct btf_type *` lookup index */
> +	/* type ID to `struct btf_type *` lookup index
> +	 * type_offs[0] corresponds to the first non-VOID type:
> +	 *   - for base BTF it's type [1];
> +	 *   - for split BTF it's the first non-base BTF type.
> +	 */
> 	__u32 *type_offs;
> 	size_t type_offs_cap;
> +	/* number of types in this BTF instance:
> +	 *   - doesn't include special [0] void type;
> +	 *   - for split BTF counts number of types added on top of base BTF.
> +	 */
> 	__u32 nr_types;

This is a little confusing. Maybe add a void type for every split BTF? 

> +	/* if not NULL, points to the base BTF on top of which the current
> +	 * split BTF is based
> +	 */

[...]

> 
> @@ -252,12 +274,20 @@ static int btf_parse_str_sec(struct btf *btf)
> 	const char *start = btf->strs_data;
> 	const char *end = start + btf->hdr->str_len;
> 
> -	if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||
> -	    start[0] || end[-1]) {
> -		pr_debug("Invalid BTF string section\n");
> -		return -EINVAL;
> +	if (btf->base_btf) {
> +		if (hdr->str_len == 0)
> +			return 0;
> +		if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1]) {
> +			pr_debug("Invalid BTF string section\n");
> +			return -EINVAL;
> +		}
> +	} else {
> +		if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||
> +		    start[0] || end[-1]) {
> +			pr_debug("Invalid BTF string section\n");
> +			return -EINVAL;
> +		}
> 	}
> -
> 	return 0;

I found this function a little difficult to follow. Maybe rearrange it as 

	/* too long, or not \0 terminated */
	if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])
		goto err_out;

	/* for base btf, .... */
	if (!btf->base_btf && (!hdr->str_len || start[0]))
		goto err_out;

	return 0;
err_out:
	pr_debug("Invalid BTF string section\n");
	return -EINVAL;
}
> }
> 
> @@ -372,19 +402,9 @@ static int btf_parse_type_sec(struct btf *btf)
> 	struct btf_header *hdr = btf->hdr;
> 	void *next_type = btf->types_data;
> 	void *end_type = next_type + hdr->type_len;
> -	int err, i = 0, type_size;

[...]
Song Liu Nov. 3, 2020, 12:51 a.m. UTC | #8
> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Make data section layout checks stricter, disallowing overlap of types and
> strings data.
> 
> Additionally, allow BTFs with no type data. There is nothing inherently wrong
> with having BTF with no types (put potentially with some strings). This could
> be a situation with kernel module BTFs, if module doesn't introduce any new
> type information.
> 
> Also fix invalid offset alignment check for btf->hdr->type_off.
> 
> Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
> tools/lib/bpf/btf.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 20c64a8441a8..9b0ef71a03d0 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -245,22 +245,18 @@ static int btf_parse_hdr(struct btf *btf)
> 		return -EINVAL;
> 	}
> 
> -	if (meta_left < hdr->type_off) {
> -		pr_debug("Invalid BTF type section offset:%u\n", hdr->type_off);
> +	if (meta_left < hdr->str_off + hdr->str_len) {
> +		pr_debug("Invalid BTF total size:%u\n", btf->raw_size);
> 		return -EINVAL;
> 	}

Can we make this one as 
	if (meta_left != hdr->str_off + hdr->str_len) {

> 
> -	if (meta_left < hdr->str_off) {
> -		pr_debug("Invalid BTF string section offset:%u\n", hdr->str_off);
> +	if (hdr->type_off + hdr->type_len > hdr->str_off) {
> +		pr_debug("Invalid BTF data sections layout: type data at %u + %u, strings data at %u + %u\n",
> +			 hdr->type_off, hdr->type_len, hdr->str_off, hdr->str_len);
> 		return -EINVAL;
> 	}

And this one 
	if (hdr->type_off + hdr->type_len != hdr->str_off) {

?

[...]
Andrii Nakryiko Nov. 3, 2020, 5:02 a.m. UTC | #9
On Mon, Nov 2, 2020 at 3:24 PM Song Liu <songliubraving@fb.com> wrote:
>

>

>

> > On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

> >

>

> [...]

>

> >

> > BTF deduplication is not yet supported for split BTF and support for it will

> > be added in separate patch.

> >

> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

>

> Acked-by: Song Liu <songliubraving@fb.com>

>

> With a couple nits:

>

> > ---

> > tools/lib/bpf/btf.c      | 205 ++++++++++++++++++++++++++++++---------

> > tools/lib/bpf/btf.h      |   8 ++

> > tools/lib/bpf/libbpf.map |   9 ++

> > 3 files changed, 175 insertions(+), 47 deletions(-)

> >

> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c

> > index db9331fea672..20c64a8441a8 100644

> > --- a/tools/lib/bpf/btf.c

> > +++ b/tools/lib/bpf/btf.c

> > @@ -78,10 +78,32 @@ struct btf {

> >       void *types_data;

> >       size_t types_data_cap; /* used size stored in hdr->type_len */

> >

> > -     /* type ID to `struct btf_type *` lookup index */

> > +     /* type ID to `struct btf_type *` lookup index

> > +      * type_offs[0] corresponds to the first non-VOID type:

> > +      *   - for base BTF it's type [1];

> > +      *   - for split BTF it's the first non-base BTF type.

> > +      */

> >       __u32 *type_offs;

> >       size_t type_offs_cap;

> > +     /* number of types in this BTF instance:

> > +      *   - doesn't include special [0] void type;

> > +      *   - for split BTF counts number of types added on top of base BTF.

> > +      */

> >       __u32 nr_types;

>

> This is a little confusing. Maybe add a void type for every split BTF?


Agree about being a bit confusing. But I don't want VOID in every BTF,
that seems sloppy (there's no continuity). I'm currently doing similar
changes on kernel side, and so far everything also works cleanly with
start_id == 0 && nr_types including VOID (for base BTF), and start_id
== base_btf->nr_type && nr_types has all the added types (for split
BTF). That seems a bit more straightforward, so I'll probably do that
here as well (unless I'm missing something, I'll double check).

>

> > +     /* if not NULL, points to the base BTF on top of which the current

> > +      * split BTF is based

> > +      */

>

> [...]

>

> >

> > @@ -252,12 +274,20 @@ static int btf_parse_str_sec(struct btf *btf)

> >       const char *start = btf->strs_data;

> >       const char *end = start + btf->hdr->str_len;

> >

> > -     if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||

> > -         start[0] || end[-1]) {

> > -             pr_debug("Invalid BTF string section\n");

> > -             return -EINVAL;

> > +     if (btf->base_btf) {

> > +             if (hdr->str_len == 0)

> > +                     return 0;

> > +             if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1]) {

> > +                     pr_debug("Invalid BTF string section\n");

> > +                     return -EINVAL;

> > +             }

> > +     } else {

> > +             if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||

> > +                 start[0] || end[-1]) {

> > +                     pr_debug("Invalid BTF string section\n");

> > +                     return -EINVAL;

> > +             }

> >       }

> > -

> >       return 0;

>

> I found this function a little difficult to follow. Maybe rearrange it as

>

>         /* too long, or not \0 terminated */

>         if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])

>                 goto err_out;


this won't work, if str_len == 0. Both str_len - 1 will underflow, and
end[-1] will be reading garbage

How about this:

if (btf->base_btf && hdr->str_len == 0)
    return 0;

if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])
    return -EINVAL;

if (!btf->base_btf && start[0])
    return -EINVAL;

return 0;

This seems more straightforward, right?


>

>         /* for base btf, .... */

>         if (!btf->base_btf && (!hdr->str_len || start[0]))

>                 goto err_out;

>

>         return 0;

> err_out:

>         pr_debug("Invalid BTF string section\n");

>         return -EINVAL;

> }

> > }

> >

> > @@ -372,19 +402,9 @@ static int btf_parse_type_sec(struct btf *btf)

> >       struct btf_header *hdr = btf->hdr;

> >       void *next_type = btf->types_data;

> >       void *end_type = next_type + hdr->type_len;

> > -     int err, i = 0, type_size;

>

> [...]

>
Andrii Nakryiko Nov. 3, 2020, 5:10 a.m. UTC | #10
On Mon, Nov 2, 2020 at 3:36 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add selftest validating ability to programmatically generate and then dump
> > split BTF.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Acked-by: Song Liu <songliubraving@fb.com>
>
> With a nit:
>
> [...]
> >
> > +
> > +static void btf_dump_printf(void *ctx, const char *fmt, va_list args)
> > +{
> > +     vfprintf(ctx, fmt, args);
> > +}
> > +
> > +void test_btf_split() {
> > +     struct btf_dump_opts opts;
> > +     struct btf_dump *d = NULL;
> > +     const struct btf_type *t;
> > +     struct btf *btf1, *btf2 = NULL;
>
> No need to initialize btf2 to NULL.

yep, must be a leftover from earlier version, I'll remove initialization.

>
> > +     int str_off, i, err;
> > +
> > +     btf1 = btf__new_empty();
> > +     if (!ASSERT_OK_PTR(btf1, "empty_main_btf"))
> > +             return;
> > +
> >
>
> [...]
>
Andrii Nakryiko Nov. 3, 2020, 5:18 a.m. UTC | #11
On Mon, Nov 2, 2020 at 4:51 PM Song Liu <songliubraving@fb.com> wrote:
>

>

>

> > On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

> >

> > Make data section layout checks stricter, disallowing overlap of types and

> > strings data.

> >

> > Additionally, allow BTFs with no type data. There is nothing inherently wrong

> > with having BTF with no types (put potentially with some strings). This could

> > be a situation with kernel module BTFs, if module doesn't introduce any new

> > type information.

> >

> > Also fix invalid offset alignment check for btf->hdr->type_off.

> >

> > Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf")

> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

> > ---

> > tools/lib/bpf/btf.c | 16 ++++++----------

> > 1 file changed, 6 insertions(+), 10 deletions(-)

> >

> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c

> > index 20c64a8441a8..9b0ef71a03d0 100644

> > --- a/tools/lib/bpf/btf.c

> > +++ b/tools/lib/bpf/btf.c

> > @@ -245,22 +245,18 @@ static int btf_parse_hdr(struct btf *btf)

> >               return -EINVAL;

> >       }

> >

> > -     if (meta_left < hdr->type_off) {

> > -             pr_debug("Invalid BTF type section offset:%u\n", hdr->type_off);

> > +     if (meta_left < hdr->str_off + hdr->str_len) {

> > +             pr_debug("Invalid BTF total size:%u\n", btf->raw_size);

> >               return -EINVAL;

> >       }

>

> Can we make this one as

>         if (meta_left != hdr->str_off + hdr->str_len) {


That would be not forward-compatible. I.e., old libbpf loading new BTF
with extra stuff after the string section. Kernel is necessarily more
strict, but I'd like to keep libbpf more permissive with this.

>

> >

> > -     if (meta_left < hdr->str_off) {

> > -             pr_debug("Invalid BTF string section offset:%u\n", hdr->str_off);

> > +     if (hdr->type_off + hdr->type_len > hdr->str_off) {

> > +             pr_debug("Invalid BTF data sections layout: type data at %u + %u, strings data at %u + %u\n",

> > +                      hdr->type_off, hdr->type_len, hdr->str_off, hdr->str_len);

> >               return -EINVAL;

> >       }

>

> And this one

>         if (hdr->type_off + hdr->type_len != hdr->str_off) {

>

> ?


Similarly, libbpf could be a bit more permissive here without
sacrificing correctness (at least for read-only BTF, when rewriting
BTF extra data will be discarded, of course).

>

> [...]
Song Liu Nov. 3, 2020, 5:41 a.m. UTC | #12
> On Nov 2, 2020, at 9:02 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> 

> On Mon, Nov 2, 2020 at 3:24 PM Song Liu <songliubraving@fb.com> wrote:

>> 

>> 

>> 

>>> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

>>> 

>> 

>> [...]

>> 

>>> 

>>> BTF deduplication is not yet supported for split BTF and support for it will

>>> be added in separate patch.

>>> 

>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

>> 

>> Acked-by: Song Liu <songliubraving@fb.com>

>> 

>> With a couple nits:

>> 

>>> ---

>>> tools/lib/bpf/btf.c      | 205 ++++++++++++++++++++++++++++++---------

>>> tools/lib/bpf/btf.h      |   8 ++

>>> tools/lib/bpf/libbpf.map |   9 ++

>>> 3 files changed, 175 insertions(+), 47 deletions(-)

>>> 

>>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c

>>> index db9331fea672..20c64a8441a8 100644

>>> --- a/tools/lib/bpf/btf.c

>>> +++ b/tools/lib/bpf/btf.c

>>> @@ -78,10 +78,32 @@ struct btf {

>>>      void *types_data;

>>>      size_t types_data_cap; /* used size stored in hdr->type_len */

>>> 

>>> -     /* type ID to `struct btf_type *` lookup index */

>>> +     /* type ID to `struct btf_type *` lookup index

>>> +      * type_offs[0] corresponds to the first non-VOID type:

>>> +      *   - for base BTF it's type [1];

>>> +      *   - for split BTF it's the first non-base BTF type.

>>> +      */

>>>      __u32 *type_offs;

>>>      size_t type_offs_cap;

>>> +     /* number of types in this BTF instance:

>>> +      *   - doesn't include special [0] void type;

>>> +      *   - for split BTF counts number of types added on top of base BTF.

>>> +      */

>>>      __u32 nr_types;

>> 

>> This is a little confusing. Maybe add a void type for every split BTF?

> 

> Agree about being a bit confusing. But I don't want VOID in every BTF,

> that seems sloppy (there's no continuity). I'm currently doing similar

> changes on kernel side, and so far everything also works cleanly with

> start_id == 0 && nr_types including VOID (for base BTF), and start_id

> == base_btf->nr_type && nr_types has all the added types (for split

> BTF). That seems a bit more straightforward, so I'll probably do that

> here as well (unless I'm missing something, I'll double check).


That sounds good. 

> 

>> 

>>> +     /* if not NULL, points to the base BTF on top of which the current

>>> +      * split BTF is based

>>> +      */

>> 

>> [...]

>> 

>>> 

>>> @@ -252,12 +274,20 @@ static int btf_parse_str_sec(struct btf *btf)

>>>      const char *start = btf->strs_data;

>>>      const char *end = start + btf->hdr->str_len;

>>> 

>>> -     if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||

>>> -         start[0] || end[-1]) {

>>> -             pr_debug("Invalid BTF string section\n");

>>> -             return -EINVAL;

>>> +     if (btf->base_btf) {

>>> +             if (hdr->str_len == 0)

>>> +                     return 0;

>>> +             if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1]) {

>>> +                     pr_debug("Invalid BTF string section\n");

>>> +                     return -EINVAL;

>>> +             }

>>> +     } else {

>>> +             if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||

>>> +                 start[0] || end[-1]) {

>>> +                     pr_debug("Invalid BTF string section\n");

>>> +                     return -EINVAL;

>>> +             }

>>>      }

>>> -

>>>      return 0;

>> 

>> I found this function a little difficult to follow. Maybe rearrange it as

>> 

>>        /* too long, or not \0 terminated */

>>        if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])

>>                goto err_out;

> 

> this won't work, if str_len == 0. Both str_len - 1 will underflow, and

> end[-1] will be reading garbage

> 

> How about this:

> 

> if (btf->base_btf && hdr->str_len == 0)

>    return 0;

> 

> if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])

>    return -EINVAL;

> 

> if (!btf->base_btf && start[0])

>    return -EINVAL;

> 

> return 0;

> 

> This seems more straightforward, right?


Yeah, I like this version. BTW, short comment for each condition will be
helpful.
Song Liu Nov. 3, 2020, 5:44 a.m. UTC | #13
> On Nov 2, 2020, at 9:18 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> 

> On Mon, Nov 2, 2020 at 4:51 PM Song Liu <songliubraving@fb.com> wrote:

>> 

>> 

>> 

>>> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

>>> 

>>> Make data section layout checks stricter, disallowing overlap of types and

>>> strings data.

>>> 

>>> Additionally, allow BTFs with no type data. There is nothing inherently wrong

>>> with having BTF with no types (put potentially with some strings). This could

>>> be a situation with kernel module BTFs, if module doesn't introduce any new

>>> type information.

>>> 

>>> Also fix invalid offset alignment check for btf->hdr->type_off.

>>> 

>>> Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf")

>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

>>> ---

>>> tools/lib/bpf/btf.c | 16 ++++++----------

>>> 1 file changed, 6 insertions(+), 10 deletions(-)

>>> 

>>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c

>>> index 20c64a8441a8..9b0ef71a03d0 100644

>>> --- a/tools/lib/bpf/btf.c

>>> +++ b/tools/lib/bpf/btf.c

>>> @@ -245,22 +245,18 @@ static int btf_parse_hdr(struct btf *btf)

>>>              return -EINVAL;

>>>      }

>>> 

>>> -     if (meta_left < hdr->type_off) {

>>> -             pr_debug("Invalid BTF type section offset:%u\n", hdr->type_off);

>>> +     if (meta_left < hdr->str_off + hdr->str_len) {

>>> +             pr_debug("Invalid BTF total size:%u\n", btf->raw_size);

>>>              return -EINVAL;

>>>      }

>> 

>> Can we make this one as

>>        if (meta_left != hdr->str_off + hdr->str_len) {

> 

> That would be not forward-compatible. I.e., old libbpf loading new BTF

> with extra stuff after the string section. Kernel is necessarily more

> strict, but I'd like to keep libbpf more permissive with this.


Yeah, this makes sense. Let's keep both checks AS-IS. 

Thanks,
Song
Song Liu Nov. 3, 2020, 6:03 a.m. UTC | #14
> On Oct 28, 2020, at 5:59 PM, Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Add ability to work with split BTF by providing extra -B flag, which allows to
> specify the path to the base BTF file.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
> tools/bpf/bpftool/btf.c  |  9 ++++++---
> tools/bpf/bpftool/main.c | 15 ++++++++++++++-
> tools/bpf/bpftool/main.h |  1 +
> 3 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index 8ab142ff5eac..c96b56e8e3a4 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -358,8 +358,12 @@ static int dump_btf_raw(const struct btf *btf,
> 		}
> 	} else {
> 		int cnt = btf__get_nr_types(btf);
> +		int start_id = 1;
> 
> -		for (i = 1; i <= cnt; i++) {
> +		if (base_btf)
> +			start_id = btf__get_nr_types(base_btf) + 1;
> +
> +		for (i = start_id; i <= cnt; i++) {
> 			t = btf__type_by_id(btf, i);
> 			dump_btf_type(btf, i, t);
> 		}
> @@ -438,7 +442,6 @@ static int do_dump(int argc, char **argv)
> 		return -1;
> 	}
> 	src = GET_ARG();
> -
> 	if (is_prefix(src, "map")) {
> 		struct bpf_map_info info = {};
> 		__u32 len = sizeof(info);
> @@ -499,7 +502,7 @@ static int do_dump(int argc, char **argv)
> 		}
> 		NEXT_ARG();
> 	} else if (is_prefix(src, "file")) {
> -		btf = btf__parse(*argv, NULL);
> +		btf = btf__parse_split(*argv, base_btf);
> 		if (IS_ERR(btf)) {
> 			err = -PTR_ERR(btf);
> 			btf = NULL;
> diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
> index 682daaa49e6a..b86f450e6fce 100644
> --- a/tools/bpf/bpftool/main.c
> +++ b/tools/bpf/bpftool/main.c
> @@ -11,6 +11,7 @@
> 
> #include <bpf/bpf.h>
> #include <bpf/libbpf.h>
> +#include <bpf/btf.h>
> 
> #include "main.h"
> 
> @@ -28,6 +29,7 @@ bool show_pinned;
> bool block_mount;
> bool verifier_logs;
> bool relaxed_maps;
> +struct btf *base_btf;
> struct pinned_obj_table prog_table;
> struct pinned_obj_table map_table;
> struct pinned_obj_table link_table;
> @@ -391,6 +393,7 @@ int main(int argc, char **argv)
> 		{ "mapcompat",	no_argument,	NULL,	'm' },
> 		{ "nomount",	no_argument,	NULL,	'n' },
> 		{ "debug",	no_argument,	NULL,	'd' },
> +		{ "base-btf",	required_argument, NULL, 'B' },
> 		{ 0 }
> 	};
> 	int opt, ret;
> @@ -407,7 +410,7 @@ int main(int argc, char **argv)
> 	hash_init(link_table.table);
> 
> 	opterr = 0;
> -	while ((opt = getopt_long(argc, argv, "Vhpjfmnd",
> +	while ((opt = getopt_long(argc, argv, "VhpjfmndB:",
> 				  options, NULL)) >= 0) {
> 		switch (opt) {
> 		case 'V':
> @@ -441,6 +444,15 @@ int main(int argc, char **argv)
> 			libbpf_set_print(print_all_levels);
> 			verifier_logs = true;
> 			break;
> +		case 'B':
> +			base_btf = btf__parse(optarg, NULL);
> +			if (libbpf_get_error(base_btf)) {
> +				p_err("failed to parse base BTF at '%s': %ld\n",
> +				      optarg, libbpf_get_error(base_btf));
> +				base_btf = NULL;
> +				return -1;
> +			}
> +			break;
> 		default:
> 			p_err("unrecognized option '%s'", argv[optind - 1]);
> 			if (json_output)
> @@ -465,6 +477,7 @@ int main(int argc, char **argv)
> 		delete_pinned_obj_table(&map_table);
> 		delete_pinned_obj_table(&link_table);
> 	}
> +	btf__free(base_btf);
> 
> 	return ret;
> }
> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index c46e52137b87..76e91641262b 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -90,6 +90,7 @@ extern bool show_pids;
> extern bool block_mount;
> extern bool verifier_logs;
> extern bool relaxed_maps;
> +extern struct btf *base_btf;
> extern struct pinned_obj_table prog_table;
> extern struct pinned_obj_table map_table;
> extern struct pinned_obj_table link_table;
> -- 
> 2.24.1
>
Andrii Nakryiko Nov. 4, 2020, 11:51 p.m. UTC | #15
On Mon, Nov 2, 2020 at 9:41 PM Song Liu <songliubraving@fb.com> wrote:
>

>

>

> > On Nov 2, 2020, at 9:02 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> >

> > On Mon, Nov 2, 2020 at 3:24 PM Song Liu <songliubraving@fb.com> wrote:

> >>

> >>

> >>

> >>> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@kernel.org> wrote:

> >>>

> >>

> >> [...]

> >>

> >>>

> >>> BTF deduplication is not yet supported for split BTF and support for it will

> >>> be added in separate patch.

> >>>

> >>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

> >>

> >> Acked-by: Song Liu <songliubraving@fb.com>

> >>

> >> With a couple nits:

> >>

> >>> ---

> >>> tools/lib/bpf/btf.c      | 205 ++++++++++++++++++++++++++++++---------

> >>> tools/lib/bpf/btf.h      |   8 ++

> >>> tools/lib/bpf/libbpf.map |   9 ++

> >>> 3 files changed, 175 insertions(+), 47 deletions(-)

> >>>

> >>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c

> >>> index db9331fea672..20c64a8441a8 100644

> >>> --- a/tools/lib/bpf/btf.c

> >>> +++ b/tools/lib/bpf/btf.c

> >>> @@ -78,10 +78,32 @@ struct btf {

> >>>      void *types_data;

> >>>      size_t types_data_cap; /* used size stored in hdr->type_len */

> >>>

> >>> -     /* type ID to `struct btf_type *` lookup index */

> >>> +     /* type ID to `struct btf_type *` lookup index

> >>> +      * type_offs[0] corresponds to the first non-VOID type:

> >>> +      *   - for base BTF it's type [1];

> >>> +      *   - for split BTF it's the first non-base BTF type.

> >>> +      */

> >>>      __u32 *type_offs;

> >>>      size_t type_offs_cap;

> >>> +     /* number of types in this BTF instance:

> >>> +      *   - doesn't include special [0] void type;

> >>> +      *   - for split BTF counts number of types added on top of base BTF.

> >>> +      */

> >>>      __u32 nr_types;

> >>

> >> This is a little confusing. Maybe add a void type for every split BTF?

> >

> > Agree about being a bit confusing. But I don't want VOID in every BTF,

> > that seems sloppy (there's no continuity). I'm currently doing similar

> > changes on kernel side, and so far everything also works cleanly with

> > start_id == 0 && nr_types including VOID (for base BTF), and start_id

> > == base_btf->nr_type && nr_types has all the added types (for split

> > BTF). That seems a bit more straightforward, so I'll probably do that

> > here as well (unless I'm missing something, I'll double check).

>

> That sounds good.


So I don't think I can do that in libbpf representation,
unfortunately. I did miss something, turns out. The difference is that
in kernel BTF is always immutable, so we can store stable pointers for
id -> btf_type lookups. For libbpf, BTF can be modified, so pointers
could be invalidated. So I instead store offsets relative to the
beginning of the type data array. With such representation having VOID
as element #0 is more tricky (I actually tried, but it's too
cumbersome). So this representation will have to be slightly different
between kernel and libbpf. But that's ok, because it's just an
internal implementation. API abstracts all of that.

>

> >

> >>

> >>> +     /* if not NULL, points to the base BTF on top of which the current

> >>> +      * split BTF is based

> >>> +      */

> >>

> >> [...]

> >>

> >>>

> >>> @@ -252,12 +274,20 @@ static int btf_parse_str_sec(struct btf *btf)

> >>>      const char *start = btf->strs_data;

> >>>      const char *end = start + btf->hdr->str_len;

> >>>

> >>> -     if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||

> >>> -         start[0] || end[-1]) {

> >>> -             pr_debug("Invalid BTF string section\n");

> >>> -             return -EINVAL;

> >>> +     if (btf->base_btf) {

> >>> +             if (hdr->str_len == 0)

> >>> +                     return 0;

> >>> +             if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1]) {

> >>> +                     pr_debug("Invalid BTF string section\n");

> >>> +                     return -EINVAL;

> >>> +             }

> >>> +     } else {

> >>> +             if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||

> >>> +                 start[0] || end[-1]) {

> >>> +                     pr_debug("Invalid BTF string section\n");

> >>> +                     return -EINVAL;

> >>> +             }

> >>>      }

> >>> -

> >>>      return 0;

> >>

> >> I found this function a little difficult to follow. Maybe rearrange it as

> >>

> >>        /* too long, or not \0 terminated */

> >>        if (hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])

> >>                goto err_out;

> >

> > this won't work, if str_len == 0. Both str_len - 1 will underflow, and

> > end[-1] will be reading garbage

> >

> > How about this:

> >

> > if (btf->base_btf && hdr->str_len == 0)

> >    return 0;

> >

> > if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET || end[-1])

> >    return -EINVAL;

> >

> > if (!btf->base_btf && start[0])

> >    return -EINVAL;

> >

> > return 0;

> >

> > This seems more straightforward, right?

>

> Yeah, I like this version. BTW, short comment for each condition will be

> helpful.

>

>