Message ID | 20250409144133.2833606-6-cratiu@nvidia.com |
---|---|
State | Superseded |
Headers | show |
Series | xfrm & bonding: Correct use of xso.real_dev | expand |
On Wed, Apr 09, 2025 at 05:41:32PM +0300, Cosmin Ratiu wrote: > When the active link is changed for a bond device, the existing xfrm > states need to be migrated over to the new link. This is done with: > - bond_ipsec_del_sa_all() goes through the offloaded states list and > removes all of them from hw. > - bond_ipsec_add_sa_all() re-offloads all states to the new device. > > But because the offload status of xfrm states isn't marked in any way, > there can be bugs. > > When all bond links are down, bond_ipsec_del_sa_all() unoffloads > everything from the previous active link. If the same link then comes > back up, nothing gets reoffloaded by bond_ipsec_add_sa_all(). > This results in a stack trace like this a bit later when user space > removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to > remove a rule that's no longer offloaded: > > [] Call Trace: > [] <TASK> > [] ? __warn+0x7d/0x110 > [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] > [] ? report_bug+0x16d/0x180 > [] ? handle_bug+0x4f/0x90 > [] ? exc_invalid_op+0x14/0x70 > [] ? asm_exc_invalid_op+0x16/0x20 > [] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core] > [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] > [] bond_ipsec_del_sa+0x1ab/0x200 [bonding] > [] xfrm_dev_state_delete+0x1f/0x60 > [] __xfrm_state_delete+0x196/0x200 > [] xfrm_state_delete+0x21/0x40 > [] xfrm_del_sa+0x69/0x110 > [] xfrm_user_rcv_msg+0x11d/0x300 > [] ? release_pages+0xca/0x140 > [] ? copy_to_user_tmpl.part.0+0x110/0x110 > [] netlink_rcv_skb+0x54/0x100 > [] xfrm_netlink_rcv+0x31/0x40 > [] netlink_unicast+0x1fc/0x2d0 > [] netlink_sendmsg+0x1e4/0x410 > [] __sock_sendmsg+0x38/0x60 > [] sock_write_iter+0x94/0xf0 > [] vfs_write+0x338/0x3f0 > [] ksys_write+0xba/0xd0 > [] do_syscall_64+0x4c/0x100 > [] entry_SYSCALL_64_after_hwframe+0x4b/0x53 > > There's also another theoretical bug: > Calling bond_ipsec_del_sa_all() multiple times can result in corruption > in the driver implementation if the double-free isn't tolerated. This > isn't nice. > > Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm > state was unoffloaded from a device, but a race with netdevsim's > .xdo_dev_offload_ok() accessing real_dev was considered a sufficient > reason to not set real_dev to NULL anymore. This unfortunately > introduced the new bugs. > > Since .xdo_dev_offload_ok() was significantly refactored by [1] and > there are no more users in the stack of xso.real_dev, that > race is now gone and xs->xso.real_dev can now once again be used to > represent which device (if any) currently holds the offloaded rule. > > Go one step further and set real_dev after add/before delete calls, to > catch any future driver misuses of real_dev. > > [1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/ > Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference") > Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> > Reviewed-by: Leon Romanovsky <leonro@nvidia.com> > --- > drivers/net/bonding/bond_main.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 4ba525a564c5..14f7c9712ad4 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev, > goto out; > } > > - xs->xso.real_dev = real_dev; > err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack); > if (!err) { > + xs->xso.real_dev = real_dev; > ipsec->xs = xs; > INIT_LIST_HEAD(&ipsec->list); > mutex_lock(&bond->ipsec_lock); > @@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) > if (ipsec->xs->xso.real_dev == real_dev) > continue; > > - ipsec->xs->xso.real_dev = real_dev; > if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, > ipsec->xs, NULL)) { > slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); > - ipsec->xs->xso.real_dev = NULL; > + continue; > } > + ipsec->xs->xso.real_dev = real_dev; > } > out: > mutex_unlock(&bond->ipsec_lock); > @@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) > __func__); > continue; > } > + ipsec->xs->xso.real_dev = NULL; > real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, > ipsec->xs); > if (real_dev->xfrmdev_ops->xdo_dev_state_free) > @@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev, > > WARN_ON(xs->xso.real_dev != real_dev); > > + xs->xso.real_dev = NULL; > if (real_dev && real_dev->xfrmdev_ops && > real_dev->xfrmdev_ops->xdo_dev_state_free) > real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs); > -- > 2.45.0 > Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Thanks Hangbin
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 4ba525a564c5..14f7c9712ad4 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev, goto out; } - xs->xso.real_dev = real_dev; err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack); if (!err) { + xs->xso.real_dev = real_dev; ipsec->xs = xs; INIT_LIST_HEAD(&ipsec->list); mutex_lock(&bond->ipsec_lock); @@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) if (ipsec->xs->xso.real_dev == real_dev) continue; - ipsec->xs->xso.real_dev = real_dev; if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, ipsec->xs, NULL)) { slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__); - ipsec->xs->xso.real_dev = NULL; + continue; } + ipsec->xs->xso.real_dev = real_dev; } out: mutex_unlock(&bond->ipsec_lock); @@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) __func__); continue; } + ipsec->xs->xso.real_dev = NULL; real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs); if (real_dev->xfrmdev_ops->xdo_dev_state_free) @@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev, WARN_ON(xs->xso.real_dev != real_dev); + xs->xso.real_dev = NULL; if (real_dev && real_dev->xfrmdev_ops && real_dev->xfrmdev_ops->xdo_dev_state_free) real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);