Message ID | 20200923155436.2117661-1-andriin@fb.com |
---|---|
Headers | show |
Series | libbpf: BTF writer APIs | expand |
Andrii Nakryiko wrote: > This patch set introduces a new set of BTF APIs to libbpf that allow to > conveniently produce BTF types and strings. Internals of struct btf were > changed such that it can transparently and automatically switch to writable > mode, which allows appending BTF types and strings. This will allow for libbpf > itself to do more intrusive modifications of program's BTF (by rewriting it, > at least as of right now), which is necessary for the upcoming libbpf static > linking. But they are complete and generic, so can be adopted by anyone who > has a need to produce BTF type information. I had this floating around on my todo list thanks a lot for doing it. I can remove a couple more silly hacks I have floating around now! > > One such example outside of libbpf is pahole, which was actually converted to > these APIs (locally, pending landing of these changes in libbpf) completely > and shows reduction in amount of custom pahole code necessary and brings nice > savings in memory usage (about 370MB reduction at peak for my kernel > configuration) and even BTF deduplication times (one second reduction, > 23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of > "uncompressed" raw BTF data. Time reduction comes from faster string > search and deduplication by relying on hashmap instead of BST used by pahole's > own code. Consequently, these APIs are already tested on real-world > complicated kernel BTF, but there is also pretty extensive selftest doing > extra validations. > > Selftests in patch #9 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros > that are useful for writing shorter and less repretitive selftests. I decided > to keep them local to that selftest for now, but if they prove to be useful in > more contexts we should move them to test_progs.h. And few more (e.g., > inequality tests) macros are probably necessary to have a more complete set. > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com> For the series, I have a few nits I'll put in the patches, mostly spelling errors and a couple questions, otherwise this is awesome thanks. Acked-by: John Fastabend <john.fastabend@gmail.com> > > Andrii Nakryiko (9): > libbpf: refactor internals of BTF type index > libbpf: remove assumption of single contiguous memory for BTF data > libbpf: generalize common logic for managing dynamically-sized arrays > libbpf: extract generic string hashing function for reuse > libbpf: allow modification of BTF and add btf__add_str API > libbpf: add btf__new_empty() to create an empty BTF object > libbpf: add BTF writing APIs > libbpf: add btf__str_by_offset() as a more generic variant of > name_by_offset > selftests/bpf: test BTF writing APIs > > tools/lib/bpf/bpf.c | 2 +- > tools/lib/bpf/bpf.h | 2 +- > tools/lib/bpf/btf.c | 1311 +++++++++++++++-- > tools/lib/bpf/btf.h | 41 + > tools/lib/bpf/btf_dump.c | 9 +- > tools/lib/bpf/hashmap.h | 12 + > tools/lib/bpf/libbpf.map | 22 + > tools/lib/bpf/libbpf_internal.h | 3 + > .../selftests/bpf/prog_tests/btf_write.c | 271 ++++ > 9 files changed, 1553 insertions(+), 120 deletions(-) > create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_write.c > > -- > 2.24.1 >
Andrii Nakryiko wrote: > Allow internal BTF representation to switch from default read-only mode, in > which raw BTF data is a single non-modifiable block of memory with BTF header, > types, and strings layed out sequentially and contiguously in memory, into > a writable representation with types and strings data split out into separate > memory regions, that can be dynamically expanded. > > Such writable internal representation is transparent to users of libbpf APIs, > but allows to append new types and strings at the end of BTF, which is > a typical use case when generating BTF programmatically. All the basic > guarantees of BTF types and strings layout is preserved, i.e., user can get > `struct btf_type *` pointer and read it directly. Such btf_type pointers might > be invalidated if BTF is modified, so some care is required in such mixed > read/write scenarios. > > Switch from read-only to writable configuration happens automatically the > first time when user attempts to modify BTF by either adding a new type or new > string. It is still possible to get raw BTF data, which is a single piece of > memory that can be persisted in ELF section or into a file as raw BTF. Such > raw data memory is also still owned by BTF and will be freed either when BTF > object is freed or if another modification to BTF happens, as any modification > invalidates BTF raw representation. > > This patch adds the first BTF writing API: btf__add_str(), which allows to > add arbitrary strings to BTF string section. All the added strings are > automatically deduplicated. This is achieved by maintaining an additional > string lookup index for all unique strings. Such index is built when BTF is > switched to modifiable mode. If at that time BTF strings section contained > duplicate strings, they are not de-duplicated. This is done specifically to > not modify the existing content of BTF (types, their string offsets, etc), > which can cause confusion and is especially important property if there is > struct btf_ext associated with struct btf. By following this "imperfect > deduplication" process, btf_ext is kept consitent and correct. If > deduplication of strings is necessary, it can be forced by doing BTF > deduplication, at which point all the strings will be eagerly deduplicated and > all string offsets both in struct btf and struct btf_ext will be updated. > > Signed-off-by: Andrii Nakryiko <andriin@fb.com> > --- [...] > +/* Ensure BTF is ready to be modified (by splitting into a three memory > + * regions for header, types, and strings). Also invalidate cached > + * raw_data, if any. > + */ > +static int btf_ensure_modifiable(struct btf *btf) > +{ > + void *hdr, *types, *strs, *strs_end, *s; > + struct hashmap *hash = NULL; > + long off; > + int err; > + > + if (btf_is_modifiable(btf)) { > + /* any BTF modification invalidates raw_data */ > + if (btf->raw_data) { I missed why this case is needed? Just being cautious? It looks like we get btf->hdr != btf->raw_data (aka btf_is_modifiable) below, but by the tiime we do this set it looks like we will always null btf->raw_data as well. Again doesn't appear harmful just seeing if I missed a path. > + free(btf->raw_data); > + btf->raw_data = NULL; > + } > + return 0; > + } > + > + /* split raw data into three memory regions */ > + hdr = malloc(btf->hdr->hdr_len); > + types = malloc(btf->hdr->type_len); > + strs = malloc(btf->hdr->str_len); > + if (!hdr || !types || !strs) > + goto err_out; > + > + memcpy(hdr, btf->hdr, btf->hdr->hdr_len); > + memcpy(types, btf->types_data, btf->hdr->type_len); > + memcpy(strs, btf->strs_data, btf->hdr->str_len); > + > + /* build lookup index for all strings */ > + hash = hashmap__new(strs_hash_fn, strs_hash_equal_fn, btf); > + if (IS_ERR(hash)) { > + err = PTR_ERR(hash); > + hash = NULL; > + goto err_out; > + } > + [...] Thanks, John
On Thu, Sep 24, 2020 at 8:56 AM John Fastabend <john.fastabend@gmail.com> wrote: > > Andrii Nakryiko wrote: > > Allow internal BTF representation to switch from default read-only mode, in > > which raw BTF data is a single non-modifiable block of memory with BTF header, > > types, and strings layed out sequentially and contiguously in memory, into > > a writable representation with types and strings data split out into separate > > memory regions, that can be dynamically expanded. > > > > Such writable internal representation is transparent to users of libbpf APIs, > > but allows to append new types and strings at the end of BTF, which is > > a typical use case when generating BTF programmatically. All the basic > > guarantees of BTF types and strings layout is preserved, i.e., user can get > > `struct btf_type *` pointer and read it directly. Such btf_type pointers might > > be invalidated if BTF is modified, so some care is required in such mixed > > read/write scenarios. > > > > Switch from read-only to writable configuration happens automatically the > > first time when user attempts to modify BTF by either adding a new type or new > > string. It is still possible to get raw BTF data, which is a single piece of > > memory that can be persisted in ELF section or into a file as raw BTF. Such > > raw data memory is also still owned by BTF and will be freed either when BTF > > object is freed or if another modification to BTF happens, as any modification > > invalidates BTF raw representation. > > > > This patch adds the first BTF writing API: btf__add_str(), which allows to > > add arbitrary strings to BTF string section. All the added strings are > > automatically deduplicated. This is achieved by maintaining an additional > > string lookup index for all unique strings. Such index is built when BTF is > > switched to modifiable mode. If at that time BTF strings section contained > > duplicate strings, they are not de-duplicated. This is done specifically to > > not modify the existing content of BTF (types, their string offsets, etc), > > which can cause confusion and is especially important property if there is > > struct btf_ext associated with struct btf. By following this "imperfect > > deduplication" process, btf_ext is kept consitent and correct. If > > deduplication of strings is necessary, it can be forced by doing BTF > > deduplication, at which point all the strings will be eagerly deduplicated and > > all string offsets both in struct btf and struct btf_ext will be updated. > > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com> > > --- > > [...] > > > +/* Ensure BTF is ready to be modified (by splitting into a three memory > > + * regions for header, types, and strings). Also invalidate cached > > + * raw_data, if any. > > + */ > > +static int btf_ensure_modifiable(struct btf *btf) > > +{ > > + void *hdr, *types, *strs, *strs_end, *s; > > + struct hashmap *hash = NULL; > > + long off; > > + int err; > > + > > + if (btf_is_modifiable(btf)) { > > + /* any BTF modification invalidates raw_data */ > > + if (btf->raw_data) { > > I missed why this case is needed? Just being cautious? It looks like > we get btf->hdr != btf->raw_data (aka btf_is_modifiable) below, but > by the tiime we do this set it looks like we will always null btf->raw_data > as well. Again doesn't appear harmful just seeing if I missed a path. It's because of btf__get_raw_data() (it's currently used by pahole for BTF dedup). raw_data is cached in struct btf and is owned by it, so when we attempt modification, we have to invalidate a single-blob representation, as it is immediately invalid. This is mostly to preserve existing semantics, but also not to keep allocating new memory if caller created BTF and then accesses raw_data few times. > > > + free(btf->raw_data); > > + btf->raw_data = NULL; > > + } > > + return 0; > > + } > > + > > + /* split raw data into three memory regions */ > > + hdr = malloc(btf->hdr->hdr_len); > > + types = malloc(btf->hdr->type_len); > > + strs = malloc(btf->hdr->str_len); > > + if (!hdr || !types || !strs) > > + goto err_out; > > + > > + memcpy(hdr, btf->hdr, btf->hdr->hdr_len); > > + memcpy(types, btf->types_data, btf->hdr->type_len); > > + memcpy(strs, btf->strs_data, btf->hdr->str_len); > > + > > + /* build lookup index for all strings */ > > + hash = hashmap__new(strs_hash_fn, strs_hash_equal_fn, btf); > > + if (IS_ERR(hash)) { > > + err = PTR_ERR(hash); > > + hash = NULL; > > + goto err_out; > > + } > > + > > [...] > > Thanks, > John