Message ID: 20210121012241.2109147-1-sdf@google.com
State: New
Series: [bpf-next,1/2] bpf: allow rewriting to ports under ip_unprivileged_port_start
Stanislav Fomichev <sdf@google.com> [Wed, 2021-01-20 18:09 -0800]:
> At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
> to the privileged ones (< ip_unprivileged_port_start), but it will
> be rejected later on in the __inet_bind or __inet6_bind.
>
> Let's export 'port_changed' event from the BPF program and bypass
> ip_unprivileged_port_start range check when we've seen that
> the program explicitly overrode the port. This is accomplished
> by generating instructions to set ctx->port_changed along with
> updating ctx->user_port.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
...
> @@ -244,17 +245,27 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> 	if (cgroup_bpf_enabled(type)) {					       \
> 		lock_sock(sk);						       \
> 		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
> -							  t_ctx);	       \
> +							  t_ctx, NULL);	       \
> 		release_sock(sk);					       \
> 	}								       \
> 	__ret;								       \
> })
>
> -#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr)			       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL)
> -
> -#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr)			       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL)
> +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags)	       \
> +({									       \
> +	bool port_changed = false;					       \

I see the discussion with Martin in [0] on the program overriding the
port but setting exactly the same value it already contains. Commenting
on this patch since the code is here.

From what I understand, there is no use case for overriding the port
without changing its value just to bypass the capability check. In that
case the code can be simplified.

Here, instead of introducing port_changed, you can just remember the
original ((struct sockaddr_in *)uaddr)->sin_port or
((struct sockaddr_in6 *)uaddr)->sin6_port (they have the same
offset/size, so it can be simplified the same way as in
sock_addr_convert_ctx_access() for user_port) ...

> +	int __ret = 0;							       \
> +	if (cgroup_bpf_enabled(type)) {					       \
> +		lock_sock(sk);						       \
> +		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
> +							  NULL,		       \
> +							  &port_changed);      \
> +		release_sock(sk);					       \
> +		if (port_changed)					       \

... and then just compare the original and the new ports here.

The benefits will be:
* no need to introduce a port_changed field in struct bpf_sock_addr_kern;
* no need to change the program's instructions;
* no need to worry about the compiler optimizing out those instructions;
* no need to think about coordinating multiple programs: the flag will
  be set only if the port has actually changed, which is easy to reason
  about from the user's perspective.

wdyt?

> +			*flags |= BIND_NO_CAP_NET_BIND_SERVICE;		       \
> +	}								       \
> +	__ret;								       \
> +})
>
> #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
> 	((cgroup_bpf_enabled(BPF_CGROUP_INET4_CONNECT) ||		       \
> @@ -453,8 +464,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
> #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
> #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
> #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
> -#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr) ({ 0; })
> -#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr) ({ 0; })
> +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags) ({ 0; })
> #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
> #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
> #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr) ({ 0; })
...

[0] https://lore.kernel.org/bpf/20210121223330.pyk4ljtjirm2zlay@kafai-mbp/

-- 
Andrey Ignatov
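Andrey's compare-the-port suggestion can be illustrated with a small userspace C sketch. The helper names here are hypothetical stand-ins (the real logic would live inside the BPF_CGROUP_RUN_PROG_INET_BIND_LOCK macro, and the real BPF program runs in the kernel), so treat this only as a model of the control flow:

```c
#include <arpa/inet.h>   /* htons */
#include <netinet/in.h>  /* struct sockaddr_in, in_port_t */

/* Hypothetical stand-in for BIND_NO_CAP_NET_BIND_SERVICE. */
#define NO_CAP_CHECK (1u << 3)

/* Stand-in for the attached cgroup BPF program: rewrites the port to 80. */
static void run_bpf_prog(struct sockaddr_in *addr)
{
	addr->sin_port = htons(80);
}

/* Snapshot the port before the program runs and compare afterwards;
 * no port_changed plumbing through bpf_sock_addr_kern is needed. */
static unsigned int bind_lock_flags(struct sockaddr_in *addr)
{
	unsigned int flags = 0;
	in_port_t old_port = addr->sin_port;

	run_bpf_prog(addr);

	if (addr->sin_port != old_port)
		flags |= NO_CAP_CHECK;
	return flags;
}
```

Note that writing the same value back leaves the flag unset here, which is exactly the semantic difference from the patch's instruction-generation approach discussed with Martin.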
On Fri, Jan 22, 2021 at 11:37 AM Andrey Ignatov <rdna@fb.com> wrote:
>
> Stanislav Fomichev <sdf@google.com> [Wed, 2021-01-20 18:09 -0800]:
> > At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
> > to the privileged ones (< ip_unprivileged_port_start), but it will
> > be rejected later on in the __inet_bind or __inet6_bind.
> ...
> I see the discussion with Martin in [0] on the program overriding the
> port but setting exactly the same value it already contains.
> ...
> ... and then just compare the original and the new ports here.
>
> The benefits will be:
> * no need to introduce a port_changed field in struct bpf_sock_addr_kern;
> * no need to change the program's instructions;
> * no need to worry about the compiler optimizing out those instructions;
> * no need to think about coordinating multiple programs: the flag will
>   be set only if the port has actually changed, which is easy to reason
>   about from the user's perspective.
>
> wdyt?

Martin mentioned in another email that we might want to do that when
we rewrite only the address portion. I think it makes sense: imagine
rewriting 1.1.1.1:50 -> 2.2.2.2:50; it seems like that should also
work, right? And in that case we would need to store and compare the
addresses as well, and it becomes messy :-/

It also seems like it would be nice to have this 'bypass
cap_net_bind_service' behavior without changing the address while we
are at it.
Stanislav Fomichev <sdf@google.com> [Fri, 2021-01-22 11:54 -0800]:
> On Fri, Jan 22, 2021 at 11:37 AM Andrey Ignatov <rdna@fb.com> wrote:
> ...
> > From what I understand, there is no use case for overriding the port
> > without changing its value just to bypass the capability check. In
> > that case the code can be simplified.
> ...
> > wdyt?
> Martin mentioned in another email that we might want to do that when
> we rewrite only the address portion.
> I think it makes sense: imagine rewriting 1.1.1.1:50 -> 2.2.2.2:50;
> it seems like that should also work, right?
> And in that case we would need to store and compare the addresses as
> well, and it becomes messy :-/

Why does the address matter? CAP_NET_BIND_SERVICE is only about ports,
not addresses. IMO an address change should not be relevant to
bypassing CAP_NET_BIND_SERVICE here, so there should be no need to
compare addresses; comparing only the port should be enough.

> It also seems like it would be nice to have this 'bypass
> cap_net_bind_service' behavior without changing the address while we
> are at it.

Yeah, this part determines the behaviour. I guess it should be
use-case driven. So far it seems more like a "nice to have" than an
existing real use case, but I could be missing one; please correct me
if that's the case.

-- 
Andrey Ignatov
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 0748fd87969e..874ed865bea1 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -125,7 +125,8 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
 				      enum bpf_attach_type type,
-				      void *t_ctx);
+				      void *t_ctx,
+				      bool *port_changed);
 
 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
@@ -234,7 +235,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled(type))					       \
 		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
-							  NULL);	       \
+							  NULL, NULL);	       \
 	__ret;								       \
 })
 
@@ -244,17 +245,27 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 	if (cgroup_bpf_enabled(type)) {					       \
 		lock_sock(sk);						       \
 		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
-							  t_ctx);	       \
+							  t_ctx, NULL);	       \
 		release_sock(sk);					       \
 	}								       \
 	__ret;								       \
 })
 
-#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr)			       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL)
-
-#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr)			       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL)
+#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags)	       \
+({									       \
+	bool port_changed = false;					       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled(type)) {					       \
+		lock_sock(sk);						       \
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
+							  NULL,		       \
+							  &port_changed);      \
+		release_sock(sk);					       \
+		if (port_changed)					       \
+			*flags |= BIND_NO_CAP_NET_BIND_SERVICE;		       \
+	}								       \
+	__ret;								       \
+})
 
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
 	((cgroup_bpf_enabled(BPF_CGROUP_INET4_CONNECT) ||		       \
@@ -453,8 +464,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr) ({ 0; })
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5b3137d7b690..9bee8c057dd2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1258,6 +1258,7 @@ struct bpf_sock_addr_kern {
 	 */
 	u64 tmp_reg;
 	void *t_ctx;	/* Attach type specific context. */
+	u32 port_changed;
 };
 
 struct bpf_sock_ops_kern {
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index cb2818862919..9ba935c15869 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -41,6 +41,9 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
 #define BIND_WITH_LOCK			(1 << 1)
 /* Called from BPF program. */
 #define BIND_FROM_BPF			(1 << 2)
+/* Skip CAP_NET_BIND_SERVICE check. */
+#define BIND_NO_CAP_NET_BIND_SERVICE	(1 << 3)
+
 int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 		u32 flags);
 int inet_getname(struct socket *sock, struct sockaddr *uaddr,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index da649f20d6b2..f5d6205f1717 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1055,6 +1055,8 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  * @uaddr: sockaddr struct provided by user
  * @type: The type of program to be exectuted
  * @t_ctx: Pointer to attach type specific context
+ * @port_changed: Pointer to bool which will be set to 'true' when BPF
+ *		  program updates user_port
  *
  * socket is expected to be of type INET or INET6.
  *
@@ -1064,7 +1066,8 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
 				      enum bpf_attach_type type,
-				      void *t_ctx)
+				      void *t_ctx,
+				      bool *port_changed)
 {
 	struct bpf_sock_addr_kern ctx = {
 		.sk = sk,
@@ -1089,6 +1092,9 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
 
+	if (port_changed)
+		*port_changed = ctx.port_changed;
+
 	return ret == 1 ? 0 : -EPERM;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_sock_addr);
diff --git a/net/core/filter.c b/net/core/filter.c
index 9ab94e90d660..b3dd02eb9551 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9028,6 +9028,19 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
 			     offsetof(struct sockaddr_in6, sin6_port));
 		BUILD_BUG_ON(sizeof_field(struct sockaddr_in, sin_port) !=
 			     sizeof_field(struct sockaddr_in6, sin6_port));
+
+		/* Set bpf_sock_addr_kern->port_changed=1 whenever
+		 * the port is updated from the BPF program.
+		 */
+		if (type == BPF_WRITE) {
+			*insn++ = BPF_ST_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_addr_kern,
+							      port_changed),
+					     si->dst_reg,
+					     offsetof(struct bpf_sock_addr_kern,
+						      port_changed),
+					     1);
+		}
+
 		/* Account for sin6_port being smaller than user_port. */
 		port_size = min(port_size, BPF_LDST_BYTES(si));
 		SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF(
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6ba2930ff49b..aaa94bea19c3 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -438,6 +438,7 @@ EXPORT_SYMBOL(inet_release);
 int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct sock *sk = sock->sk;
+	u32 flags = BIND_WITH_LOCK;
 	int err;
 
 	/* If the socket has its own bind function then use it. (RAW) */
@@ -450,11 +451,12 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	/* BPF prog is run before any checks are done so that if the prog
 	 * changes context in a wrong way it will be caught.
 	 */
-	err = BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr);
+	err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr,
+						 BPF_CGROUP_INET4_BIND, &flags);
 	if (err)
 		return err;
 
-	return __inet_bind(sk, uaddr, addr_len, BIND_WITH_LOCK);
+	return __inet_bind(sk, uaddr, addr_len, flags);
 }
 EXPORT_SYMBOL(inet_bind);
 
@@ -499,7 +501,8 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
-	if (snum && inet_port_requires_bind_service(net, snum) &&
+	if (!(flags & BIND_NO_CAP_NET_BIND_SERVICE) &&
+	    snum && inet_port_requires_bind_service(net, snum) &&
 	    !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
 		goto out;
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index b9c654836b72..3e523c4f5226 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -439,6 +439,7 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct sock *sk = sock->sk;
+	u32 flags = BIND_WITH_LOCK;
 	int err = 0;
 
 	/* If the socket has its own bind function then use it. */
@@ -451,11 +452,12 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	/* BPF prog is run before any checks are done so that if the prog
 	 * changes context in a wrong way it will be caught.
 	 */
-	err = BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr);
+	err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr,
+						 BPF_CGROUP_INET6_BIND, &flags);
 	if (err)
 		return err;
 
-	return __inet6_bind(sk, uaddr, addr_len, BIND_WITH_LOCK);
+	return __inet6_bind(sk, uaddr, addr_len, flags);
 }
 EXPORT_SYMBOL(inet6_bind);
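The effect of the new flag on the __inet_bind check can be modeled in plain C. Here inet_port_requires_bind_service() and ns_capable() are stubbed out as assumptions (the real ones consult net->ipv4 sysctls and namespace capabilities), so this only sketches the control flow:

```c
#include <stdbool.h>

#define BIND_NO_CAP_NET_BIND_SERVICE (1u << 3)

/* Stub: true for ports below the default ip_unprivileged_port_start. */
static bool port_requires_bind_service(unsigned short snum)
{
	return snum && snum < 1024;
}

/* Stub for ns_capable(net->user_ns, CAP_NET_BIND_SERVICE). */
static bool has_cap_net_bind_service;

/* Mirrors the patched check in __inet_bind: the privileged-port test
 * is skipped entirely when the BPF program rewrote the port. */
static bool bind_allowed(unsigned short snum, unsigned int flags)
{
	if (!(flags & BIND_NO_CAP_NET_BIND_SERVICE) &&
	    port_requires_bind_service(snum) &&
	    !has_cap_net_bind_service)
		return false; /* __inet_bind would return -EACCES */
	return true;
}
```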
At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
to the privileged ones (< ip_unprivileged_port_start), but it will
be rejected later on in the __inet_bind or __inet6_bind.

Let's export 'port_changed' event from the BPF program and bypass
ip_unprivileged_port_start range check when we've seen that
the program explicitly overrode the port. This is accomplished
by generating instructions to set ctx->port_changed along with
updating ctx->user_port.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup.h | 30 ++++++++++++++++++++----------
 include/linux/filter.h     |  1 +
 include/net/inet_common.h  |  3 +++
 kernel/bpf/cgroup.c        |  8 +++++++-
 net/core/filter.c          | 13 +++++++++++++
 net/ipv4/af_inet.c         |  9 ++++++---
 net/ipv6/af_inet6.c        |  6 ++++--
 7 files changed, 54 insertions(+), 16 deletions(-)
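The commit's mechanism, where every BPF store to user_port also sets ctx->port_changed, can be modeled with a small userspace sketch. The field names mirror struct bpf_sock_addr_kern, but note this is only a model: in the kernel the extra store is emitted as a BPF_ST_MEM instruction by sock_addr_convert_ctx_access(), not as C code.

```c
/* Userspace model of the relevant bpf_sock_addr_kern fields. */
struct sock_addr_ctx {
	unsigned short user_port;  /* network byte order in the kernel */
	unsigned int port_changed; /* set by the generated BPF_ST_MEM */
};

/* Models the extra store emitted for a BPF_WRITE to user_port:
 * port_changed is set on every write, even when the new value equals
 * the old one, which is the point debated earlier in the thread. */
static void write_user_port(struct sock_addr_ctx *ctx, unsigned short port)
{
	ctx->port_changed = 1;
	ctx->user_port = port;
}
```

This makes the contrast with the compare-the-port alternative concrete: a rewrite to the same value still flips the flag here, whereas a before/after comparison would not.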