| Message ID | 20250506142117.1883598-1-cratiu@nvidia.com |
|---|---|
| State | New |
| Series | [net] net: Lock lower level devices when updating features |
On Wed, 2025-05-07 at 14:35 +0000, Cosmin Ratiu wrote:
> On Tue, 2025-05-06 at 11:13 -0700, Stanislav Fomichev wrote:
> > On 05/06, Cosmin Ratiu wrote:
> > > [...]
> >
> > Right, but netdev_sync_lower_features calls lower's
> > __netdev_update_features only for NETIF_F_UPPER_DISABLES. So it
> > doesn't propagate all features, only LRO AFAICT.
>
> Got it, I didn't look into what netdev_sync_lower_features actually
> does besides noticing it can call __netdev_update_features...
>
> In any case, please hold off with picking this patch up, it seems
> there's a possibility of a real deadlock. Here's the scenario:
>
> ============================================
> WARNING: possible recursive locking detected
> --------------------------------------------
> ethtool/44150 is trying to acquire lock:
> ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> __netdev_update_features+0x31e/0xe20
>
> but task is already holding lock:
> ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> ethnl_set_features+0xbc/0x4b0
> and the lock comparison function returns 0:
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&dev_instance_lock_key#7);
>   lock(&dev_instance_lock_key#7);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by ethtool/44150:
>  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
>  #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> ethnl_set_features+0x88/0x4b0
>  #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> ethnl_set_features+0xbc/0x4b0
>
> stack backtrace:
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x69/0xa0
>  print_deadlock_bug.cold+0xbd/0xca
>  __lock_acquire+0x163c/0x2f00
>  lock_acquire+0xd3/0x2e0
>  __mutex_lock+0x98/0xf10
>  __netdev_update_features+0x31e/0xe20
>  netdev_update_features+0x1f/0x60
>  vlan_device_event+0x57d/0x930 [8021q]
>  notifier_call_chain+0x3d/0x100
>  netdev_features_change+0x32/0x50
>  ethnl_set_features+0x17e/0x4b0
>  genl_family_rcv_msg_doit+0xe0/0x130
>  genl_rcv_msg+0x188/0x290
>  [...]
>
> I'm not sure how to solve this yet...
> Cosmin.

If it's not clear, the problem is that:
1. The lower device is already ops-locked.
2. netdev_features_change gets called.
3. __netdev_update_features gets called for the vlan (upper) dev.
4. It tries to acquire the same lock instance as in step 1 (this patch).
5. Deadlock.

One solution I can think of would be to run device notifiers for
changing features outside the lock; it doesn't seem like the netdev
lock has anything to do with what other devices might do with this
information.

This can be triggered from many scenarios; I have another similar
stack involving bonding.

What do you think?
Cosmin.
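[The failure mode described above is plain same-thread re-acquisition of a
non-recursive mutex. A minimal userspace analogy, using nothing
kernel-specific (the function names mirror the call chain in the splat but
are otherwise illustrative); built with `cc -pthread`, it blocks forever on
the second lock, which is exactly what lockdep reports before the kernel
would hang:]

#include <pthread.h>
#include <stdio.h>

/* Stand-in for one netdev instance lock (non-recursive by default). */
static pthread_mutex_t dev_instance_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the notifier chain: vlan_device_event() ends up in
 * __netdev_update_features() for the upper dev, which tries to lock
 * the lower dev again.
 */
static void feat_change_notifier(void)
{
        pthread_mutex_lock(&dev_instance_lock);  /* second acquisition: hangs */
        pthread_mutex_unlock(&dev_instance_lock);
}

int main(void)
{
        pthread_mutex_lock(&dev_instance_lock);  /* ethnl_set_features() */
        feat_change_notifier();                  /* never returns */
        pthread_mutex_unlock(&dev_instance_lock);
        puts("unreachable");
        return 0;
}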
On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
> > In any case, please hold off with picking this patch up, it seems
> > there's a possibility of a real deadlock. Here's the scenario:
> >
> > ============================================
> > WARNING: possible recursive locking detected
> > --------------------------------------------
> > ethtool/44150 is trying to acquire lock:
> > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > __netdev_update_features+0x31e/0xe20
> >
> > but task is already holding lock:
> > ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > ethnl_set_features+0xbc/0x4b0
> > and the lock comparison function returns 0:
> >
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(&dev_instance_lock_key#7);
> >   lock(&dev_instance_lock_key#7);
> >
> >  *** DEADLOCK ***
> >
> >  May be due to missing lock nesting notation
> >
> > 3 locks held by ethtool/44150:
> >  #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
> >  #1: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
> > ethnl_set_features+0x88/0x4b0
> >  #2: ffff8881364e8c80 (&dev_instance_lock_key#7){+.+.}-{4:4}, at:
> > ethnl_set_features+0xbc/0x4b0
> >
> > stack backtrace:
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x69/0xa0
> >  print_deadlock_bug.cold+0xbd/0xca
> >  __lock_acquire+0x163c/0x2f00
> >  lock_acquire+0xd3/0x2e0
> >  __mutex_lock+0x98/0xf10
> >  __netdev_update_features+0x31e/0xe20
> >  netdev_update_features+0x1f/0x60
> >  vlan_device_event+0x57d/0x930 [8021q]
> >  notifier_call_chain+0x3d/0x100
> >  netdev_features_change+0x32/0x50
> >  ethnl_set_features+0x17e/0x4b0
> >  genl_family_rcv_msg_doit+0xe0/0x130
> >  genl_rcv_msg+0x188/0x290
> >  [...]
> >
> > I'm not sure how to solve this yet...
> > Cosmin.
>
> If it's not clear, the problem is that:
> 1. The lower device is already ops-locked.
> 2. netdev_features_change gets called.
> 3. __netdev_update_features gets called for the vlan (upper) dev.
> 4. It tries to acquire the same lock instance as in step 1 (this patch).
> 5. Deadlock.
>
> One solution I can think of would be to run device notifiers for
> changing features outside the lock; it doesn't seem like the netdev
> lock has anything to do with what other devices might do with this
> information.
>
> This can be triggered from many scenarios; I have another similar
> stack involving bonding.
>
> What do you think?

All I could think of was to drop the lock during the
netdev_features_change notifier calls, like in the following hunk. I'm
running this through regressions; let's see whether it's a good idea
or not.

diff --git a/net/core/dev.c b/net/core/dev.c
index 1be7cb73a602..817fd5bc21b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1514,7 +1514,12 @@ int dev_get_alias(const struct net_device *dev, char *name, size_t len)
  */
 void netdev_features_change(struct net_device *dev)
 {
+	/* Drop the lock to avoid potential deadlocks from e.g. upper dev
+	 * notifiers altering features of 'dev' and acquiring dev lock again.
+	 */
+	netdev_unlock_ops(dev);
 	call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
+	netdev_lock_ops(dev);
 }
 EXPORT_SYMBOL(netdev_features_change);
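[A note on the splat's "May be due to missing lock nesting notation" line:
that hint covers the case where two *different* lock instances of the same
lockdep class are nested deliberately, and the inner acquisition is
annotated, e.g. with mutex_lock_nested(). A fragment for illustration only;
the helper is hypothetical and is not part of any patch in this thread:]

/* Illustration only: annotating a deliberate upper -> lower nesting of
 * two different mutex instances of the same class, so lockdep can tell
 * them apart. This does NOT apply to the report above: there, one and
 * the same instance (ffff8881364e8c80) is acquired twice, which is a
 * real deadlock rather than a missing annotation.
 */
static void nested_lock_example(struct net_device *upper,
                                struct net_device *lower)
{
        mutex_lock(&upper->lock);
        mutex_lock_nested(&lower->lock, SINGLE_DEPTH_NESTING);
        /* ... work on both devices ... */
        mutex_unlock(&lower->lock);
        mutex_unlock(&upper->lock);
}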
On Wed, 2025-05-07 at 14:20 -0700, Stanislav Fomichev wrote:
> On 05/07, Cosmin Ratiu wrote:
> > On Wed, 2025-05-07 at 15:13 +0000, Cosmin Ratiu wrote:
> > > > In any case, please hold off with picking this patch up, it
> > > > seems there's a possibility of a real deadlock. Here's the
> > > > scenario:
>
> Hmm, are you sure you're calling __netdev_update_features on the
> upper? I don't see how the lower would be locked in that case. From
> my POV, this is what happens:
>
> 1. your dev (lower) has a vlan on it (upper)
> 2. you call lro=off on the _lower_
> 3. this triggers the FEAT_CHANGE notifier and vlan_device_event
>    catches it
> 4. since the lower has a vlan device (dev->vlan_info != NULL), it
>    goes over every other vlan in the group and triggers
>    netdev_update_features for the upper (netdev_update_features(vlandev))
> 5. the upper tries to sync the features into the lower (including the
>    one that triggered FEAT_CHANGE) and that's where the deadlock
>    happens
>
> But I think (5) should be largely a no-op for the device triggering
> the notification, because the features have been already applied in
> ethnl_set_features.
> I'd move the lock into netdev_sync_lower_features, and only after
> checking the features (and making sure that we are going to change
> them). The feature check might be racy, but I think it should still
> work?

You are right: if I restrict the lower dev critical section to only
the call to __netdev_update_features for the lower dev, there's no
deadlock any more, because the device with the lock held already had
its features updated.

I will send a new version of this patch soon, after the full
regression suite finishes and I convince myself there are no more
issues related to this that we can encounter.

> Can you also share the bonding stacktrace as well to confirm it's
> the same issue?

Sure, here it is; it's the same scenario. bond_netdev_event gets
called on a slave dev, it recomputes features and updates all slaves
(bond_compute_features), and then the same lock is reacquired. But
this is also fixed with your suggestion above.
============================================
WARNING: possible recursive locking detected
--------------------------------------------
devlink/14341 is trying to acquire lock:
ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at:
__netdev_update_features+0x31e/0xe20

but task is already holding lock:
ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at:
mlx5e_attach_netdev+0x31f/0x360 [mlx5_core]
and the lock comparison function returns 0:

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev_instance_lock_key#9);
  lock(&dev_instance_lock_key#9);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by devlink/14341:
 #0: ffffffff830e5a50 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
 #1: ffff888164a5c250 (&devlink->lock_key){+.+.}-{4:4}, at:
devlink_get_from_attrs_lock+0xbc/0x180
 #2: ffffffff830cf708 (rtnl_mutex){+.+.}-{4:4}, at:
mlx5e_attach_netdev+0x30d/0x360 [mlx5_core]
 #3: ffff88810ebd8c80 (&dev_instance_lock_key#9){+.+.}-{4:4}, at:
mlx5e_attach_netdev+0x31f/0x360 [mlx5_core]

Call Trace:
 <TASK>
 dump_stack_lvl+0x69/0xa0
 print_deadlock_bug.cold+0xbd/0xca
 __lock_acquire+0x163c/0x2f00
 lock_acquire+0xd3/0x2e0
 __mutex_lock+0x98/0xf10
 __netdev_update_features+0x31e/0xe20
 netdev_change_features+0x1f/0x60
 bond_compute_features+0x24e/0x300 [bonding]
 bond_netdev_event+0x2e0/0x400 [bonding]
 notifier_call_chain+0x3d/0x100
 netdev_update_features+0x52/0x60
 mlx5e_attach_netdev+0x32f/0x360 [mlx5_core]
 mlx5e_netdev_attach_profile+0x48/0x90 [mlx5_core]
 mlx5e_netdev_change_profile+0x90/0xf0 [mlx5_core]
 mlx5e_vport_rep_load+0x414/0x490 [mlx5_core]
 __esw_offloads_load_rep+0x87/0xd0 [mlx5_core]
 mlx5_esw_offloads_rep_load+0x45/0xe0 [mlx5_core]
 esw_offloads_enable+0xb7b/0xca0 [mlx5_core]
 mlx5_eswitch_enable_locked+0x293/0x430 [mlx5_core]
 mlx5_devlink_eswitch_mode_set+0x229/0x620 [mlx5_core]
 devlink_nl_eswitch_set_doit+0x60/0xd0
 genl_family_rcv_msg_doit+0xe0/0x130
 genl_rcv_msg+0x188/0x290
 netlink_rcv_skb+0x4b/0xf0
 genl_rcv+0x24/0x40
 netlink_unicast+0x1e1/0x2c0
 netlink_sendmsg+0x210/0x450

Cosmin.
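[To make the agreed-upon direction concrete: a sketch of where the narrower
critical section would sit, loosely modeled on netdev_sync_lower_features()
in net/core/dev.c. Only the lock placement is the point here; the rest is
reconstructed from context and may differ from the final patch:]

/* Sketch only; details may differ from the revised patch. The lower's
 * ops lock is taken only once we know a feature bit actually has to be
 * cleared on it.
 */
static void netdev_sync_lower_features(struct net_device *upper,
                                       struct net_device *lower,
                                       netdev_features_t features)
{
        netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
        netdev_features_t feature;
        int feature_bit;

        for_each_netdev_feature(upper_disables, feature_bit) {
                feature = __NETIF_F_BIT(feature_bit);
                if (!(features & feature) && (lower->features & feature)) {
                        netdev_dbg(upper, "Disabling feature %pNF on lower dev %s.\n",
                                   &feature, lower->name);
                        lower->wanted_features &= ~feature;
                        /* Lock only when a change is really needed. The
                         * device that triggered the notifier already had
                         * its features applied, fails the check above,
                         * and so never has its (held) lock re-acquired.
                         */
                        netdev_lock_ops(lower);
                        __netdev_update_features(lower);
                        netdev_unlock_ops(lower);
                }
        }
}

[Because the feature check runs before the lock is taken, a lower device
whose features were already applied, and whose lock is held further up the
call chain, no longer gets re-locked; as noted above, the unlocked check may
be racy in principle but is expected to still work.]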
diff --git a/net/core/dev.c b/net/core/dev.c
index 1be7cb73a602..77472364225c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10620,8 +10620,11 @@ int __netdev_update_features(struct net_device *dev)
 	/* some features must be disabled on lower devices when disabled
 	 * on an upper device (think: bonding master or bridge)
 	 */
-	netdev_for_each_lower_dev(dev, lower, iter)
+	netdev_for_each_lower_dev(dev, lower, iter) {
+		netdev_lock_ops(lower);
 		netdev_sync_lower_features(dev, lower, features);
+		netdev_unlock_ops(lower);
+	}
 
 	if (!err) {
 		netdev_features_t diff = features ^ dev->features;
__netdev_update_features() expects the netdevice to be ops-locked, but
it gets called recursively on the lower-level netdevices to sync their
features, and nothing locks those.

This commit fixes that, with the assumption that it shouldn't be
possible for both higher-level and lower-level netdevices to require
the instance lock, because that would lead to lock dependency warnings.

Without this, playing with higher-level (e.g. vxlan) netdevices on top
of netdevices with instance locking enabled can run into issues:

WARNING: CPU: 59 PID: 206496 at ./include/net/netdev_lock.h:17 netif_napi_add_weight_locked+0x753/0xa60
[...]
Call Trace:
 <TASK>
 mlx5e_open_channel+0xc09/0x3740 [mlx5_core]
 mlx5e_open_channels+0x1f0/0x770 [mlx5_core]
 mlx5e_safe_switch_params+0x1b5/0x2e0 [mlx5_core]
 set_feature_lro+0x1c2/0x330 [mlx5_core]
 mlx5e_handle_feature+0xc8/0x140 [mlx5_core]
 mlx5e_set_features+0x233/0x2e0 [mlx5_core]
 __netdev_update_features+0x5be/0x1670
 __netdev_update_features+0x71f/0x1670
 dev_ethtool+0x21c5/0x4aa0
 dev_ioctl+0x438/0xae0
 sock_ioctl+0x2ba/0x690
 __x64_sys_ioctl+0xa78/0x1700
 do_syscall_64+0x6d/0x140
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>

Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
---
 net/core/dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
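[For context on the WARNING above: include/net/netdev_lock.h is where the
instance-lock assertions live, and netif_napi_add_weight_locked() fires one
of them when called without the instance lock held. The assert is presumably
along these lines; this is a sketch, not the verbatim upstream definition:]

/* Sketch of the instance-lock assert in include/net/netdev_lock.h;
 * the exact upstream definition and line number may differ.
 */
static inline void netdev_assert_locked(struct net_device *dev)
{
        lockdep_assert_held(&dev->lock);
}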