Message ID | 20210126132311.3061388-9-idosch@idosch.org |
---|---|
State | New |
Series | Add notifications when route hardware flags change |
On 1/26/21 6:23 AM, Ido Schimmel wrote:
> From: Amit Cohen <amcohen@nvidia.com>
>
> After installing a route to the kernel, user space receives an
> acknowledgment, which means the route was installed in the kernel,
> but not necessarily in hardware.
>
> The asynchronous nature of route installation in hardware can lead
> to a routing daemon advertising a route before it was actually installed in
> hardware. This can result in packet loss or mis-routed packets until the
> route is installed in hardware.
>
> It is also possible for a route already installed in hardware to change
> its action and therefore its flags. For example, a host route that is
> trapping packets can be "promoted" to perform decapsulation following
> the installation of an IPinIP/VXLAN tunnel.
>
> Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
> are changed. The aim is to provide an indication to user-space
> (e.g., routing daemons) about the state of the route in hardware.
>
> Introduce a sysctl that controls this behavior.
>
> Keep the default value at 0 (i.e., do not emit notifications) for several
> reasons:
> - Multiple RTM_NEWROUTE notification per-route might confuse existing
>   routing daemons.
> - Convergence reasons in routing daemons.
> - The extra notifications will negatively impact the insertion rate.
> - Not all users are interested in these notifications.
>
> Move fib6_info_hw_flags_set() to C file because it is no longer a short
> function.
>
> Signed-off-by: Amit Cohen <amcohen@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  Documentation/networking/ip-sysctl.rst | 20 ++++++++++++
>  include/net/ip6_fib.h                  | 10 ++----
>  include/net/netns/ipv6.h               |  1 +
>  net/ipv6/af_inet6.c                    |  1 +
>  net/ipv6/route.c                       | 44 ++++++++++++++++++++++++++
>  net/ipv6/sysctl_net_ipv6.c             |  9 ++++++
>  6 files changed, 77 insertions(+), 8 deletions(-)
>

Reviewed-by: David Ahern <dsahern@kernel.org>
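For readers on the user-space side, here is a minimal sketch of how a routing daemon might interpret one of these RTM_NEWROUTE notifications. It only covers the classification step: given a received netlink message header, it reads rtm_flags from struct rtmsg and reports the hardware state. The function route_hw_state() and the enum route_hw_state are hypothetical names used for illustration and are not part of the patch; RTM_F_OFFLOAD, RTM_F_TRAP and struct rtmsg come from the UAPI headers. Checking RTM_F_OFFLOAD before RTM_F_TRAP is an arbitrary choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Fallbacks for older UAPI headers; values as in include/uapi/linux/rtnetlink.h. */
#ifndef RTM_F_OFFLOAD
#define RTM_F_OFFLOAD 0x4000
#endif
#ifndef RTM_F_TRAP
#define RTM_F_TRAP 0x8000
#endif

/* Hypothetical classification used only in this sketch. */
enum route_hw_state {
	ROUTE_HW_NONE,		/* neither flag set, or not a usable RTM_NEWROUTE */
	ROUTE_HW_OFFLOAD,	/* RTM_F_OFFLOAD set */
	ROUTE_HW_TRAP,		/* RTM_F_TRAP set */
};

/* Classify a single netlink message by the hardware flags in its rtmsg. */
static enum route_hw_state route_hw_state(const struct nlmsghdr *nlh)
{
	const struct rtmsg *rtm;

	assert(nlh != NULL);

	/* Ignore other message types and messages too short to hold an rtmsg. */
	if (nlh->nlmsg_type != RTM_NEWROUTE ||
	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*rtm)))
		return ROUTE_HW_NONE;

	rtm = NLMSG_DATA(nlh);
	if (rtm->rtm_flags & RTM_F_OFFLOAD)
		return ROUTE_HW_OFFLOAD;
	if (rtm->rtm_flags & RTM_F_TRAP)
		return ROUTE_HW_TRAP;
	return ROUTE_HW_NONE;
}
```

A daemon would presumably call this for each message read from a NETLINK_ROUTE socket subscribed to RTNLGRP_IPV6_ROUTE, and hold back advertisement of a prefix until the state is no longer ROUTE_HW_NONE. With the sysctl left at its default of 0, no such follow-up notification arrives.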
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 01927b36bbee..11f10b8f4a83 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1795,6 +1795,26 @@ nexthop_compat_mode - BOOLEAN
 	and extraneous notifications.
 	Default: true (backward compat mode)
 
+fib_notify_on_flag_change - INTEGER
+        Whether to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/
+        RTM_F_TRAP flags are changed.
+
+        After installing a route to the kernel, user space receives an
+        acknowledgment, which means the route was installed in the kernel,
+        but not necessarily in hardware.
+        It is also possible for a route already installed in hardware to change
+        its action and therefore its flags. For example, a host route that is
+        trapping packets can be "promoted" to perform decapsulation following
+        the installation of an IPinIP/VXLAN tunnel.
+        The notifications will indicate to user-space the state of the route.
+
+        Default: 0 (Do not emit notifications.)
+
+        Possible values:
+
+        - 0 - Do not emit notifications.
+        - 1 - Emit notifications.
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cc189e668adf..1e262b23c68b 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -336,14 +336,6 @@ static inline void fib6_info_release(struct fib6_info *f6i)
 		call_rcu(&f6i->rcu, fib6_info_destroy_rcu);
 }
 
-static inline void
-fib6_info_hw_flags_set(struct net *net, struct fib6_info *f6i, bool offload,
-		       bool trap)
-{
-	f6i->offload = offload;
-	f6i->trap = trap;
-}
-
 enum fib6_walk_state {
 #ifdef CONFIG_IPV6_SUBTREES
 	FWS_S,
@@ -546,6 +538,8 @@ static inline bool fib6_metric_locked(struct fib6_info *f6i, int metric)
 {
 	return !!(f6i->fib6_metrics->metrics[RTAX_LOCK - 1] & (1 << metric));
 }
+void fib6_info_hw_flags_set(struct net *net, struct fib6_info *f6i,
+			    bool offload, bool trap);
 
 #if IS_BUILTIN(CONFIG_IPV6) && defined(CONFIG_BPF_SYSCALL)
 struct bpf_iter__ipv6_route {
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 5ec054473d81..21c0debbd39e 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
 	int max_hbh_opts_len;
 	int seg6_flowlabel;
 	bool skip_notify_on_dev_down;
+	int fib_notify_on_flag_change;
 };
 
 struct netns_ipv6 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8e9c3e9ea36e..0e9994e0ecd7 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -954,6 +954,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
 	net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
 	net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
+	net->ipv6.sysctl.fib_notify_on_flag_change = 0;
 	atomic_set(&net->ipv6.fib6_sernum, 1);
 
 	err = ipv6_init_mibs(net);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 188e114b29b4..dcf28240541d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6063,6 +6063,50 @@ void fib6_rt_update(struct net *net, struct fib6_info *rt,
 		rtnl_set_sk_err(net, RTNLGRP_IPV6_ROUTE, err);
 }
 
+void fib6_info_hw_flags_set(struct net *net, struct fib6_info *f6i,
+			    bool offload, bool trap)
+{
+	struct sk_buff *skb;
+	int err;
+
+	if (f6i->offload == offload && f6i->trap == trap)
+		return;
+
+	f6i->offload = offload;
+	f6i->trap = trap;
+
+	if (!rcu_access_pointer(f6i->fib6_node))
+		/* The route was removed from the tree, do not send
+		 * notification.
+		 */
+		return;
+
+	if (!net->ipv6.sysctl.fib_notify_on_flag_change)
+		return;
+
+	skb = nlmsg_new(rt6_nlmsg_size(f6i), GFP_KERNEL);
+	if (!skb) {
+		err = -ENOBUFS;
+		goto errout;
+	}
+
+	err = rt6_fill_node(net, skb, f6i, NULL, NULL, NULL, 0, RTM_NEWROUTE, 0,
+			    0, 0);
+	if (err < 0) {
+		/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(skb);
+		goto errout;
+	}
+
+	rtnl_notify(skb, net, 0, RTNLGRP_IPV6_ROUTE, NULL, GFP_KERNEL);
+	return;
+
+errout:
+	rtnl_set_sk_err(net, RTNLGRP_IPV6_ROUTE, err);
+}
+EXPORT_SYMBOL(fib6_info_hw_flags_set);
+
 static int ip6_route_dev_notify(struct notifier_block *this,
 				unsigned long event, void *ptr)
 {
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 5b60a4bdd36a..392ef01e3366 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -160,6 +160,15 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "fib_notify_on_flag_change",
+		.data		= &init_net.ipv6.sysctl.fib_notify_on_flag_change,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{ }
 };
 