Message ID: 20201022082138.2322434-9-jolsa@kernel.org
State:      New
Series:     bpf: Speed up trampoline attach
On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Moving bpf_link_free call into delayed processing so we don't
> need to wait for it when releasing the link.
>
> For example bpf_tracing_link_release could take considerable
> amount of time in bpf_trampoline_put function due to
> synchronize_rcu_tasks call.
>
> It speeds up bpftrace release time in following example:
>
> Before:
>
>  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
>     3,290,457,628      cycles:k    ( +- 0.27% )
>       933,581,973      cycles:u    ( +- 0.20% )
>
>     50.25 +- 4.79 seconds time elapsed  ( +- 9.53% )
>
> After:
>
>  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
>     2,535,458,767      cycles:k    ( +- 0.55% )
>       940,046,382      cycles:u    ( +- 0.27% )
>
>     33.60 +- 3.27 seconds time elapsed  ( +- 9.73% )
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/bpf/syscall.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1110ecd7d1f3..61ef29f9177d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2346,12 +2346,8 @@ void bpf_link_put(struct bpf_link *link)
>         if (!atomic64_dec_and_test(&link->refcnt))
>                 return;
>
> -       if (in_atomic()) {
> -               INIT_WORK(&link->work, bpf_link_put_deferred);
> -               schedule_work(&link->work);
> -       } else {
> -               bpf_link_free(link);
> -       }
> +       INIT_WORK(&link->work, bpf_link_put_deferred);
> +       schedule_work(&link->work);

We just recently reverted this exact change. Doing this makes it
non-deterministic from user-space POV when the BPF program is
**actually** detached. This makes user-space programming much more
complicated and unpredictable. So please don't do this. Let's find
some other way to speed this up.

> }
>
> static int bpf_link_release(struct inode *inode, struct file *filp)
> --
> 2.26.2
>
On Fri, Oct 23, 2020 at 12:46:15PM -0700, Andrii Nakryiko wrote:
> On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Moving bpf_link_free call into delayed processing so we don't
> > need to wait for it when releasing the link.
> >
> > For example bpf_tracing_link_release could take considerable
> > amount of time in bpf_trampoline_put function due to
> > synchronize_rcu_tasks call.
> >
> > It speeds up bpftrace release time in following example:
> >
> > Before:
> >
> >  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> > { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
> >
> >     3,290,457,628      cycles:k    ( +- 0.27% )
> >       933,581,973      cycles:u    ( +- 0.20% )
> >
> >     50.25 +- 4.79 seconds time elapsed  ( +- 9.53% )
> >
> > After:
> >
> >  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> > { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
> >
> >     2,535,458,767      cycles:k    ( +- 0.55% )
> >       940,046,382      cycles:u    ( +- 0.27% )
> >
> >     33.60 +- 3.27 seconds time elapsed  ( +- 9.73% )
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 1110ecd7d1f3..61ef29f9177d 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2346,12 +2346,8 @@ void bpf_link_put(struct bpf_link *link)
> >         if (!atomic64_dec_and_test(&link->refcnt))
> >                 return;
> >
> > -       if (in_atomic()) {
> > -               INIT_WORK(&link->work, bpf_link_put_deferred);
> > -               schedule_work(&link->work);
> > -       } else {
> > -               bpf_link_free(link);
> > -       }
> > +       INIT_WORK(&link->work, bpf_link_put_deferred);
> > +       schedule_work(&link->work);
>
> We just recently reverted this exact change. Doing this makes it
> non-deterministic from user-space POV when the BPF program is
> **actually** detached. This makes user-space programming much more
> complicated and unpredictable. So please don't do this. Let's find
> some other way to speed this up.

ok, makes sense

jirka

>
> > }
> >
> > static int bpf_link_release(struct inode *inode, struct file *filp)
> > --
> > 2.26.2
> >
>
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1110ecd7d1f3..61ef29f9177d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2346,12 +2346,8 @@ void bpf_link_put(struct bpf_link *link)
 	if (!atomic64_dec_and_test(&link->refcnt))
 		return;
 
-	if (in_atomic()) {
-		INIT_WORK(&link->work, bpf_link_put_deferred);
-		schedule_work(&link->work);
-	} else {
-		bpf_link_free(link);
-	}
+	INIT_WORK(&link->work, bpf_link_put_deferred);
+	schedule_work(&link->work);
 }
 
 static int bpf_link_release(struct inode *inode, struct file *filp)
Moving bpf_link_free call into delayed processing so we don't
need to wait for it when releasing the link.

For example bpf_tracing_link_release could take considerable
amount of time in bpf_trampoline_put function due to
synchronize_rcu_tasks call.

It speeds up bpftrace release time in following example:

Before:

 Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
{ printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):

    3,290,457,628      cycles:k    ( +- 0.27% )
      933,581,973      cycles:u    ( +- 0.20% )

    50.25 +- 4.79 seconds time elapsed  ( +- 9.53% )

After:

 Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
{ printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):

    2,535,458,767      cycles:k    ( +- 0.55% )
      940,046,382      cycles:u    ( +- 0.27% )

    33.60 +- 3.27 seconds time elapsed  ( +- 9.73% )

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/syscall.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)