Message ID | 20250320-bpf-next-net-mptcp-bpf_iter-subflows-v3-2-9abd22c2a7fd@kernel.org
---|---
State | New
Series | bpf: Add mptcp_subflow bpf_iter support
On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call a kfunc to modify the fields of each subflow. In
> kernel space, the mptcp_for_each_subflow() helper is used for this:
>
> 	mptcp_for_each_subflow(msk, subflow)
> 		kfunc(subflow);
>
> But in MPTCP BPF programs, this has not yet been implemented. As
> Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
>
> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> _destroy(), and registers them in the MPTCP common kfunc set. Then
> bpf_for_each() can be used for mptcp_subflow in a BPF program like
> this:
>
> 	bpf_for_each(mptcp_subflow, subflow, msk)
> 		kfunc(subflow);
>
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
> - v2:
>   - Add BUILD_BUG_ON() checks, similar to the ones done with other
>     bpf_iter_(...) helpers.
>   - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
>     !spin_is_locked() (Martin).
> - v3:
>   - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
>   - Remove unneeded !msk check (Martin)
>   - Remove locks checks, add msk_owned_by_me() for lockdep (Martin)
>   - The following note and 2 questions have been added below.
>
> This new bpf_iter will be used by our future BPF packet schedulers and
> path managers. To see how we are going to use them, please check our
> export branch [1], especially these two commits:
>
> - "bpf: Add mptcp packet scheduler struct_ops": introduce a new
>   struct_ops.
> - "selftests/bpf: Add bpf_burst scheduler & test": new test showing
>   how the new struct_ops and bpf_iter are being used.
>
> [1] https://github.com/multipath-tcp/mptcp_net-next/commits/export
>
> @BPF maintainers: we would like to allow this new mptcp_subflow
> bpf_iter to be used with struct_ops, but only with the two new ones we
> are going to introduce that are specific to MPTCP, and not with other
> struct_ops (TCP CC, sched_ext, etc.). We are not sure how to do that.
> By chance, do you have examples or docs you could point us to to have
> this restriction in place, please?

The bpf_qdisc.c has done that. Take a look at "bpf_qdisc_kfunc_filter()".

It is in net-next and bpf-next/net.

> Also, for one of the two future MPTCP struct_ops, not all callbacks
> should be allowed to use this new bpf_iter, because they are called
> from different contexts. How can we ensure such callbacks from a
> struct_ops cannot call the mptcp_subflow bpf_iter without adding new
> dedicated checks looking at whether some locks are held for all
> callbacks? We understood they wanted to have something similar with
> sched_ext, but we are not sure if this code is ready nor if it is
> going to be accepted.

Same. Take a look at "bpf_qdisc_kfunc_filter()".
Hi Martin,

Thank you for your reply!

On 17/05/2025 00:34, Martin KaFai Lau wrote:
> On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:

(...)

>> @BPF maintainers: we would like to allow this new mptcp_subflow
>> bpf_iter to be used with struct_ops, but only with the two new ones
>> we are going to introduce that are specific to MPTCP, and not with
>> other struct_ops (TCP CC, sched_ext, etc.). We are not sure how to do
>> that. By chance, do you have examples or docs you could point us to
>> to have this restriction in place, please?
>
> The bpf_qdisc.c has done that. Take a look at
> "bpf_qdisc_kfunc_filter()".
>
> It is in net-next and bpf-next/net.

Many thanks for the pointer! I see, some operations have specific
kfuncs, similar to our needs!

>> Also, for one of the two future MPTCP struct_ops, not all callbacks
>> should be allowed to use this new bpf_iter, because they are called
>> from different contexts. How can we ensure such callbacks from a
>> struct_ops cannot call the mptcp_subflow bpf_iter without adding new
>> dedicated checks looking at whether some locks are held for all
>> callbacks? We understood they wanted to have something similar with
>> sched_ext, but we are not sure if this code is ready nor if it is
>> going to be accepted.
>
> Same. Take a look at "bpf_qdisc_kfunc_filter()".

Excellent, thank you, we will look at that!

Cheers,
Matt
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 2e4b8ddf81ab0bb9dc547ea8783b73767d553a18..d3b5597eddb915a19eca87d87c31a27dfbdda619 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
 	.set = &bpf_mptcp_fmodret_ids,
 };
 
+struct bpf_iter_mptcp_subflow {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_mptcp_subflow_kern {
+	struct mptcp_sock *msk;
+	struct list_head *pos;
+} __aligned(8);
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc static struct mptcp_subflow_context *
@@ -41,10 +50,57 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
 	return NULL;
 }
 
+__bpf_kfunc static int
+bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+			   struct sock *sk)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+	struct mptcp_sock *msk;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
+		     sizeof(struct bpf_iter_mptcp_subflow));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
+		     __alignof__(struct bpf_iter_mptcp_subflow));
+
+	if (unlikely(!sk || !sk_fullsock(sk)))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_MPTCP)
+		return -EINVAL;
+
+	msk = mptcp_sk(sk);
+
+	msk_owned_by_me(msk);
+
+	kit->msk = msk;
+	kit->pos = &msk->conn_list;
+	return 0;
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
+		return NULL;
+
+	kit->pos = kit->pos->next;
+	return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+
+__bpf_kfunc static void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {