[net,0/2] bond: fix xfrm offload feature during init

Message ID 20241211071127.38452-1-liuhangbin@gmail.com
Series bond: fix xfrm offload feature during init

Message

Hangbin Liu Dec. 11, 2024, 7:11 a.m. UTC
The first patch fixes xfrm offload feature setup in active-backup mode.
The second patch adds an IPsec offload test.

Hangbin Liu (2):
  bonding: fix xfrm offload feature setup on active-backup mode
  selftests: bonding: add ipsec offload test

 drivers/net/bonding/bond_main.c               |   2 +-
 drivers/net/bonding/bond_netlink.c            |  17 +-
 include/net/bonding.h                         |   1 +
 .../selftests/drivers/net/bonding/Makefile    |   3 +-
 .../drivers/net/bonding/bond_ipsec_offload.sh | 155 ++++++++++++++++++
 .../selftests/drivers/net/bonding/config      |   4 +
 6 files changed, 173 insertions(+), 9 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh

Comments

Jakub Kicinski Dec. 12, 2024, 2:27 p.m. UTC | #1
On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> The first patch fixes the xfrm offload feature during setup active-backup
> mode. The second patch add a ipsec offload testing.

Looks like the test is too good, is there a fix pending somewhere for
the BUG below? We can't merge the test before that:

https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/900082/11-bond-ipsec-offload-sh/stderr

[  859.672652][    C3] bond_xfrm_update_stats: eth0 doesn't support xdo_dev_state_update_stats
[  860.467189][ T8677] bond0: (slave eth0): link status definitely down, disabling slave
[  860.467664][ T8677] bond0: (slave eth1): making interface the new active one
[  860.831042][ T9677] bond_xfrm_update_stats: eth1 doesn't support xdo_dev_state_update_stats
[  862.195271][ T9683] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
[  862.195880][ T9683] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9683, name: ip
[  862.196189][ T9683] preempt_count: 201, expected: 0
[  862.196396][ T9683] RCU nest depth: 0, expected: 0
[  862.196591][ T9683] 2 locks held by ip/9683:
[  862.196818][ T9683]  #0: ffff88800a829558 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x65/0x90 [xfrm_user]
[  862.197264][ T9683]  #1: ffff88800f460548 (&x->lock){+.-.}-{3:3}, at: xfrm_state_flush+0x1b3/0x3a0
[  862.197629][ T9683] CPU: 3 UID: 0 PID: 9683 Comm: ip Not tainted 6.13.0-rc1-virtme #1
[  862.197967][ T9683] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  862.198204][ T9683] Call Trace:
[  862.198352][ T9683]  <TASK>
[  862.198458][ T9683]  dump_stack_lvl+0xb0/0xd0
[  862.198659][ T9683]  __might_resched+0x2f8/0x530
[  862.198852][ T9683]  ? kfree+0x2d/0x330
[  862.199005][ T9683]  __mutex_lock+0xd9/0xbc0
[  862.199202][ T9683]  ? ref_tracker_free+0x35e/0x910
[  862.199401][ T9683]  ? bond_ipsec_del_sa+0x2c1/0x790
[  862.199937][ T9683]  ? find_held_lock+0x2c/0x110
[  862.200133][ T9683]  ? __pfx___mutex_lock+0x10/0x10
[  862.200329][ T9683]  ? bond_ipsec_del_sa+0x280/0x790
[  862.200519][ T9683]  ? xfrm_dev_state_delete+0x97/0x170
[  862.200711][ T9683]  ? __xfrm_state_delete+0x681/0x8e0
[  862.200907][ T9683]  ? xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
[  862.201151][ T9683]  ? netlink_rcv_skb+0x130/0x360
[  862.201347][ T9683]  ? xfrm_netlink_rcv+0x74/0x90 [xfrm_user]
[  862.201587][ T9683]  ? netlink_unicast+0x44b/0x710
[  862.201780][ T9683]  ? netlink_sendmsg+0x723/0xbe0
[  862.201973][ T9683]  ? ____sys_sendmsg+0x7ac/0xa10
[  862.202164][ T9683]  ? ___sys_sendmsg+0xee/0x170
[  862.202355][ T9683]  ? __sys_sendmsg+0x109/0x1a0
[  862.202546][ T9683]  ? do_syscall_64+0xc1/0x1d0
[  862.202738][ T9683]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  862.202986][ T9683]  ? __pfx_nsim_ipsec_del_sa+0x10/0x10 [netdevsim]
[  862.203251][ T9683]  ? bond_ipsec_del_sa+0x2c1/0x790
[  862.203457][ T9683]  bond_ipsec_del_sa+0x2c1/0x790
[  862.203648][ T9683]  ? __pfx_lock_acquire.part.0+0x10/0x10
[  862.203845][ T9683]  ? __pfx_bond_ipsec_del_sa+0x10/0x10
[  862.204034][ T9683]  ? do_raw_spin_lock+0x131/0x270
[  862.204225][ T9683]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  862.204468][ T9683]  xfrm_dev_state_delete+0x97/0x170
[  862.204665][ T9683]  __xfrm_state_delete+0x681/0x8e0
[  862.204858][ T9683]  xfrm_state_flush+0x1bb/0x3a0
[  862.205057][ T9683]  xfrm_flush_sa+0xf0/0x270 [xfrm_user]
[  862.205290][ T9683]  ? __pfx_xfrm_flush_sa+0x10/0x10 [xfrm_user]
[  862.205537][ T9683]  ? __nla_validate_parse+0x48/0x3d0
[  862.205744][ T9683]  xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
[  862.205985][ T9683]  ? __pfx___lock_release+0x10/0x10
[  862.206174][ T9683]  ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
[  862.206412][ T9683]  ? __pfx_validate_chain+0x10/0x10
[  862.206614][ T9683]  ? hlock_class+0x4e/0x130
[  862.206807][ T9683]  ? mark_lock+0x38/0x3e0
[  862.206986][ T9683]  ? __mutex_trylock_common+0xfa/0x260
[  862.207181][ T9683]  ? __pfx___mutex_trylock_common+0x10/0x10
[  862.207425][ T9683]  netlink_rcv_skb+0x130/0x360
Hangbin Liu Dec. 13, 2024, 7:18 a.m. UTC | #2
On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> > The first patch fixes the xfrm offload feature during setup active-backup
> > mode. The second patch add a ipsec offload testing.
> 
> Looks like the test is too good, is there a fix pending somewhere for
> the BUG below? We can't merge the test before that:

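This looks like a regression from 2aeeef906d5a ("bonding: change ipsec_lock from
spin lock to mutex"). xfrm_state_delete() takes spin_lock_bh(&x->lock) before
deleting the xfrm state (the reported trace shows the same lock taken in
xfrm_state_flush()), so bond_ipsec_del_sa() now calls mutex_lock() in atomic
context.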
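
To make the failure mode concrete, here is a minimal userspace sketch of the call chain in the trace. It is only a model: every model_* name and both counters are hypothetical stand-ins, not kernel API, and the "lock" functions only track whether the caller is in atomic context.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's preempt count. */
static int atomic_depth;
/* Counts calls to a sleeping primitive made while atomic. */
static int sleep_violations;

/* Models spin_lock_bh(): the caller may not sleep until unlock. */
static void model_spin_lock_bh(void)
{
	atomic_depth++;
}

static void model_spin_unlock_bh(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Models mutex_lock(): records a violation if called in atomic context. */
static void model_mutex_lock(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

static void model_mutex_unlock(void)
{
}

/* Models bond_ipsec_del_sa() after ipsec_lock became a mutex. */
static void model_bond_ipsec_del_sa(void)
{
	model_mutex_lock();
	/* the list walk over the bond's ipsec entries would go here */
	model_mutex_unlock();
}

/* Models the xfrm core calling into the driver with x->lock held. */
void model_xfrm_state_delete(void)
{
	model_spin_lock_bh();
	model_bond_ipsec_del_sa();
	model_spin_unlock_bh();
}
```

Calling model_xfrm_state_delete() once leaves sleep_violations at 1, which corresponds to the "sleeping function called from invalid context" report: the mutex is taken while the state's spin lock is still held.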

But I'm not sure if it's proper to release the spin lock in bond code.
This seems too specific.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7daeab67e7b5..69563bc958ca 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
 out:
 	netdev_put(real_dev, &tracker);
+	spin_unlock_bh(&xs->lock);
 	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (ipsec->xs == xs) {
@@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 		}
 	}
 	mutex_unlock(&bond->ipsec_lock);
+	spin_lock_bh(&xs->lock);
 }
 

What do you think?

Thanks
Hangbin
Jakub Kicinski Dec. 14, 2024, 3:31 a.m. UTC | #3
On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
> On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> > On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:  
> > > The first patch fixes the xfrm offload feature during setup active-backup
> > > mode. The second patch add a ipsec offload testing.  
> > 
> > Looks like the test is too good, is there a fix pending somewhere for
> > the BUG below? We can't merge the test before that:  
> 
> This should be a regression of 2aeeef906d5a ("bonding: change ipsec_lock from
> spin lock to mutex"). As in xfrm_state_delete we called spin_lock_bh(&x->lock)
> for the xfrm state delete.
> 
> But I'm not sure if it's proper to release the spin lock in bond code.
> This seems too specific.
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 7daeab67e7b5..69563bc958ca 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>  out:
>  	netdev_put(real_dev, &tracker);
> +	spin_unlock_bh(&xs->lock);
>  	mutex_lock(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>  		if (ipsec->xs == xs) {
> @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  		}
>  	}
>  	mutex_unlock(&bond->ipsec_lock);
> +	spin_lock_bh(&xs->lock);
>  }
>  
> 
> What do you think?

Re-locking doesn't look great, and glancing at the code I don't see any
obviously better workarounds. The easiest fix would be to not let the
drivers sleep in the callbacks, and then we can go back to a spin lock.
Maybe the nvidia people have better ideas, I'm not familiar with this
offload.
Hangbin Liu Jan. 2, 2025, 2:44 a.m. UTC | #4
On Fri, Dec 13, 2024 at 07:31:27PM -0800, Jakub Kicinski wrote:
> On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
> > On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> > > On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:  
> > > > The first patch fixes the xfrm offload feature during setup active-backup
> > > > mode. The second patch add a ipsec offload testing.  
> > > 
> > > Looks like the test is too good, is there a fix pending somewhere for
> > > the BUG below? We can't merge the test before that:  
> > 
> > This should be a regression of 2aeeef906d5a ("bonding: change ipsec_lock from
> > spin lock to mutex"). As in xfrm_state_delete we called spin_lock_bh(&x->lock)
> > for the xfrm state delete.
> > 
> > But I'm not sure if it's proper to release the spin lock in bond code.
> > This seems too specific.
> > 
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index 7daeab67e7b5..69563bc958ca 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> >  	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> >  out:
> >  	netdev_put(real_dev, &tracker);
> > +	spin_unlock_bh(&xs->lock);
> >  	mutex_lock(&bond->ipsec_lock);
> >  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> >  		if (ipsec->xs == xs) {
> > @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> >  		}
> >  	}
> >  	mutex_unlock(&bond->ipsec_lock);
> > +	spin_lock_bh(&xs->lock);
> >  }
> >  
> > 
> > What do you think?
> 
> Re-locking doesn't look great, glancing at the code I don't see any
> obvious better workarounds. Easiest fix would be to don't let the
> drivers sleep in the callbacks and then we can go back to a spin lock.
> Maybe nvidia people have better ideas, I'm not familiar with this
> offload.

I don't know how to keep bonding from sleeping now that we use mutex_lock().
Hi Jianbo, do you have any ideas?

Thanks
Hangbin
Jianbo Liu Jan. 2, 2025, 3:33 a.m. UTC | #5
On 1/2/2025 10:44 AM, Hangbin Liu wrote:
> On Fri, Dec 13, 2024 at 07:31:27PM -0800, Jakub Kicinski wrote:
>> On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
>>> On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
>>>> On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
>>>>> The first patch fixes the xfrm offload feature during setup active-backup
>>>>> mode. The second patch add a ipsec offload testing.
>>>>
>>>> Looks like the test is too good, is there a fix pending somewhere for
>>>> the BUG below? We can't merge the test before that:
>>>
>>> This should be a regression of 2aeeef906d5a ("bonding: change ipsec_lock from
>>> spin lock to mutex"). As in xfrm_state_delete we called spin_lock_bh(&x->lock)
>>> for the xfrm state delete.
>>>
>>> But I'm not sure if it's proper to release the spin lock in bond code.
>>> This seems too specific.
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 7daeab67e7b5..69563bc958ca 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>>>   	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>>>   out:
>>>   	netdev_put(real_dev, &tracker);
>>> +	spin_unlock_bh(&xs->lock);
>>>   	mutex_lock(&bond->ipsec_lock);
>>>   	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>>>   		if (ipsec->xs == xs) {
>>> @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>>>   		}
>>>   	}
>>>   	mutex_unlock(&bond->ipsec_lock);
>>> +	spin_lock_bh(&xs->lock);
>>>   }
>>>   
>>>
>>> What do you think?
>>
>> Re-locking doesn't look great, glancing at the code I don't see any
>> obvious better workarounds. Easiest fix would be to don't let the
>> drivers sleep in the callbacks and then we can go back to a spin lock.
>> Maybe nvidia people have better ideas, I'm not familiar with this
>> offload.
> 
> I don't know how to disable bonding sleeping since we use mutex_lock now.
> Hi Jianbo, do you have any idea?
> 

I think we should allow drivers to sleep in the callbacks. So maybe it's
better to move the driver's xdo_dev_state_delete call out from under the
state's spin lock.

Thanks!
Jianbo
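
The idea of moving the driver callback out from under the state's spin lock can be sketched in userspace as follows. This is only an illustration of the general pattern (decide under the lock, call the sleeping callback after unlocking), not the actual xfrm change; struct model_state, the model_* functions and both counters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's preempt count. */
static int atomic_depth;
/* Counts sleeping callbacks invoked while atomic. */
static int sleep_violations;

/* Hypothetical, heavily reduced model of an xfrm state. */
struct model_state {
	bool dead;          /* state already deleted */
	bool offloaded;     /* a driver holds offload resources for it */
	int driver_deletes; /* how often the driver callback ran */
};

static void model_lock(void)
{
	atomic_depth++;
}

static void model_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Models a driver delete callback that is allowed to sleep. */
static void model_driver_state_delete(struct model_state *x)
{
	if (atomic_depth > 0)
		sleep_violations++;
	x->driver_deletes++;
}

/*
 * Runs with the lock held. Only flips flags and reports whether the
 * driver callback still has to run once the lock is dropped.
 */
static bool model_state_delete_locked(struct model_state *x)
{
	if (x->dead)
		return false;
	x->dead = true;
	return x->offloaded;
}

void model_state_delete(struct model_state *x)
{
	bool need_driver;

	model_lock();
	need_driver = model_state_delete_locked(x);
	model_unlock();

	/* The callback runs outside the lock, so it may sleep. */
	if (need_driver)
		model_driver_state_delete(x);
}
```

With this split, a first model_state_delete() on an offloaded state runs the driver callback exactly once with no violation recorded, and a second call is a no-op because the state is already marked dead. A real change would also have to consider what other contexts can observe between the unlock and the callback, which this model does not cover.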
Hangbin Liu Jan. 3, 2025, 11:05 a.m. UTC | #6
On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > Re-locking doesn't look great, glancing at the code I don't see any
> > > obvious better workarounds. Easiest fix would be to don't let the
> > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > offload.
> > 
> > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > Hi Jianbo, do you have any idea?
> > 
> 
> I think we should allow drivers to sleep in the callbacks. So, maybe it's
> better to move driver's xdo_dev_state_delete out of state's spin lock.

Thanks for the suggestion, let me try that first.

Hangbin