diff mbox series

[v3,net-next,10/12] net: bonding: ensure .ndo_get_stats64 can sleep

Message ID 20210107094951.1772183-11-olteanv@gmail.com
State Superseded
Headers show
Series None | expand

Commit Message

Vladimir Oltean Jan. 7, 2021, 9:49 a.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

There is an effort to convert .ndo_get_stats64 to sleepable context, and
for that to work, we need to prevent callers of dev_get_stats from using
atomic locking.

The bonding driver retrieves its statistics recursively from its lower
interfaces, with additional care to only count packets sent/received
while those lowers were actually enslaved to the bond - see commit
5f0c5f73e5ef ("bonding: make global bonding stats more reliable").

Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock and
bond->lock itself"), the bonding driver uses the following protection
for its array of slaves: RCU for readers and rtnl_mutex for updaters.
This is not great because there is another movement [ somehow
simultaneous with the one to make .ndo_get_stats64 sleepable ] to reduce
driver usage of rtnl_mutex. This makes sense, because the rtnl_mutex has
become a very contended resource.

The aforementioned commit removed an interesting comment:

	/* [...] we can't hold bond->lock [...] because we'll
	 * deadlock. The only solution is to rely on the fact
	 * that we're under rtnl_lock here, and the slaves
	 * list won't change. This doesn't solve the problem
	 * of setting the slave's MTU while it is
	 * transmitting, but the assumption is that the base
	 * driver can handle that.
	 *
	 * TODO: figure out a way to safely iterate the slaves
	 * list, but without holding a lock around the actual
	 * call to the base driver.
	 */

The above summarizes pretty well the challenges we have with nested
bonding interfaces (bond over bond over bond over...), which need to be
addressed by a better locking scheme that also not relies on the bloated
rtnl_mutex.

Instead of using something as broad as the rtnl_mutex to ensure
serialization of updates to the slave array, we can reintroduce a
private mutex in the bonding driver, called slaves_lock.
This mutex circles the only updater, bond_update_slave_arr, and ensures
that whatever other readers want to see a consistent slave array, they
don't need to hold the rtnl_mutex for that.

Now _of_course_ I did not convert the entire driver to use
bond_for_each_slave protected by the bond->slaves_lock, and
rtnl_dereference to bond_dereference. I just started that process by
converting the one reader I needed: ndo_get_stats64. Not only is it nice
to not hold rtnl_mutex in .ndo_get_stats64, but it is also in fact
forbidden to do so (since top-level callers may hold netif_lists_lock,
which is a sub-lock of the rtnl_mutex, and therefore this would cause a
lock inversion and a deadlock).

To solve the nesting problem, the simple way is to not hold any locks
when recursing into the slave netdev operation, which is exactly the
approach that we take. We can "cheat" and use dev_hold to take a
reference on the slave net_device, which is enough to ensure that
netdev_wait_allrefs() waits until we finish, and the kernel won't fault.
However, the slave structure might no longer be valid, just its
associated net_device. That isn't a biggie. We just need to do some more
work to ensure that the slave exists after we took the statistics, and
if it still does, reapply the logic from Andy's commit 5f0c5f73e5ef.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Changes in v3:
- Added kfree(dev_stats);
- Removed the now useless stats_lock (bond->bond_stats and
  slave->slave_stats are now protected by bond->slaves_lock)

Changes in v2:
Switched to the new scheme of holding just a refcnt to the slave
interfaces while recursing with dev_get_stats.

 drivers/net/bonding/bond_main.c | 118 +++++++++++++++-----------------
 include/net/bonding.h           |  49 ++++++++++++-
 2 files changed, 104 insertions(+), 63 deletions(-)

Comments

Eric Dumazet Jan. 7, 2021, 11:18 a.m. UTC | #1
On Thu, Jan 7, 2021 at 10:51 AM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
>

-
>  static void bond_get_stats(struct net_device *bond_dev,
>                            struct rtnl_link_stats64 *stats)
>  {
>         struct bonding *bond = netdev_priv(bond_dev);
> -       struct rtnl_link_stats64 temp;
> -       struct list_head *iter;
> -       struct slave *slave;
> -       int nest_level = 0;
> +       struct rtnl_link_stats64 *dev_stats;
> +       struct net_device **slaves;
> +       int i, res, num_slaves;
>
> +       res = bond_get_slave_arr(bond, &slaves, &num_slaves);
> +       if (res) {
> +               netdev_err(bond->dev,
> +                          "failed to allocate memory for slave array\n");
> +               return;
> +       }
>

What a mess really.

You chose to keep the assumption that ndo_get_stats() would not fail,
since we were providing the needed storage from callers.

If ndo_get_stats() are now allowed to sleep, and presumably allocate
memory, we need to make sure
we report potential errors back to the user.

I think your patch series is mostly good, but I would prefer not
hiding errors and always report them to user space.
And no, netdev_err() is not appropriate, we do not want tools to look
at syslog to guess something went wrong.

Last point about drivers having to go to slow path, talking to
firmware : Make sure that malicious/innocent users
reading /proc/net/dev from many threads in parallel wont brick these devices.

Maybe they implicitly _relied_ on the fact that firmware was gently
read every second and results were cached from a work queue or
something.

Thanks.
Eric Dumazet Jan. 7, 2021, 12:58 p.m. UTC | #2
On Thu, Jan 7, 2021 at 12:33 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> On Thu, Jan 07, 2021 at 12:18:28PM +0100, Eric Dumazet wrote:
> > What a mess really.
>
> Thanks, that's at least _some_ feedback :)

Yeah, I was on PTO for the last two weeks.

>
> > You chose to keep the assumption that ndo_get_stats() would not fail,
> > since we were providing the needed storage from callers.
> >
> > If ndo_get_stats() are now allowed to sleep, and presumably allocate
> > memory, we need to make sure
> > we report potential errors back to the user.
> >
> > I think your patch series is mostly good, but I would prefer not
> > hiding errors and always report them to user space.
> > And no, netdev_err() is not appropriate, we do not want tools to look
> > at syslog to guess something went wrong.
>
> Well, there are only 22 dev_get_stats callers in the kernel, so I assume
> that after the conversion to return void, I can do another conversion to
> return int, and then I can convert the ndo_get_stats64 method to return
> int too. I will keep the plain ndo_get_stats still void (no reason not
> to).
>
> > Last point about drivers having to go to slow path, talking to
> > firmware : Make sure that malicious/innocent users
> > reading /proc/net/dev from many threads in parallel wont brick these devices.
> >
> > Maybe they implicitly _relied_ on the fact that firmware was gently
> > read every second and results were cached from a work queue or
> > something.
>
> How? I don't understand how I can make sure of that.

Your patches do not attempt to change these drivers, but I guess your
cover letter might send to driver maintainers incentive to get rid of their
logic, that is all.

We might simply warn maintainers and ask them to test their future changes
with tests using 1000 concurrent theads reading /proc/net/dev

>
> There is an effort initiated by Jakub to standardize the ethtool
> statistics. My objection was that you can't expect that to happen unless
> dev_get_stats is sleepable just like ethtool -S is. So I think the same
> reasoning should apply to ethtool -S too, really.

I think we all agree on the principles, once we make sure to not
add more pressure on RTNL. It seems you addressed our feedback, all is fine.

Thanks.
Saeed Mahameed Jan. 8, 2021, 3:59 a.m. UTC | #3
On Thu, 2021-01-07 at 13:58 +0100, Eric Dumazet wrote:
> On Thu, Jan 7, 2021 at 12:33 PM Vladimir Oltean <olteanv@gmail.com>

> wrote:

> > On Thu, Jan 07, 2021 at 12:18:28PM +0100, Eric Dumazet wrote:

> > > What a mess really.

> > 

> > Thanks, that's at least _some_ feedback :)

> 

> Yeah, I was on PTO for the last two weeks.

> 

> > > You chose to keep the assumption that ndo_get_stats() would not

> > > fail,

> > > since we were providing the needed storage from callers.

> > > 

> > > If ndo_get_stats() are now allowed to sleep, and presumably

> > > allocate

> > > memory, we need to make sure

> > > we report potential errors back to the user.

> > > 

> > > I think your patch series is mostly good, but I would prefer not

> > > hiding errors and always report them to user space.

> > > And no, netdev_err() is not appropriate, we do not want tools to

> > > look

> > > at syslog to guess something went wrong.

> > 

> > Well, there are only 22 dev_get_stats callers in the kernel, so I

> > assume

> > that after the conversion to return void, I can do another

> > conversion to

> > return int, and then I can convert the ndo_get_stats64 method to

> > return

> > int too. I will keep the plain ndo_get_stats still void (no reason

> > not

> > to).

> > 

> > > Last point about drivers having to go to slow path, talking to

> > > firmware : Make sure that malicious/innocent users

> > > reading /proc/net/dev from many threads in parallel wont brick

> > > these devices.

> > > 

> > > Maybe they implicitly _relied_ on the fact that firmware was

> > > gently

> > > read every second and results were cached from a work queue or

> > > something.

> > 

> > How? I don't understand how I can make sure of that.

> 

> Your patches do not attempt to change these drivers, but I guess your

> cover letter might send to driver maintainers incentive to get rid of

> their

> logic, that is all.

> 

> We might simply warn maintainers and ask them to test their future

> changes

> with tests using 1000 concurrent theads reading /proc/net/dev

> 

> > There is an effort initiated by Jakub to standardize the ethtool

> > statistics. My objection was that you can't expect that to happen

> > unless

> > dev_get_stats is sleepable just like ethtool -S is. So I think the

> > same

> > reasoning should apply to ethtool -S too, really.

> 

> I think we all agree on the principles, once we make sure to not

> add more pressure on RTNL. It seems you addressed our feedback, all

> is fine.

> 


Eric, about two years ago you were totally against sleeping in
ndo_get_stats, what happened ? :) 
https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe9ac@gmail.com/

My approach to solve this was much simpler and didn't require  a new
mutex nor RTNL lock, all i did is to reduce the rcu critical section to
not include the call to the driver by simply holding the netdev via
dev_hold()
Vladimir Oltean Jan. 8, 2021, 8:57 a.m. UTC | #4
On Thu, Jan 07, 2021 at 07:59:37PM -0800, Saeed Mahameed wrote:
> On Thu, 2021-01-07 at 13:58 +0100, Eric Dumazet wrote:

> > On Thu, Jan 7, 2021 at 12:33 PM Vladimir Oltean <olteanv@gmail.com>

> > wrote:

> > > On Thu, Jan 07, 2021 at 12:18:28PM +0100, Eric Dumazet wrote:

> > > > What a mess really.

> > >

> > > Thanks, that's at least _some_ feedback :)

> >

> > Yeah, I was on PTO for the last two weeks.

> >

> > > > You chose to keep the assumption that ndo_get_stats() would not

> > > > fail,

> > > > since we were providing the needed storage from callers.

> > > >

> > > > If ndo_get_stats() are now allowed to sleep, and presumably

> > > > allocate

> > > > memory, we need to make sure

> > > > we report potential errors back to the user.

> > > >

> > > > I think your patch series is mostly good, but I would prefer not

> > > > hiding errors and always report them to user space.

> > > > And no, netdev_err() is not appropriate, we do not want tools to

> > > > look

> > > > at syslog to guess something went wrong.

> > >

> > > Well, there are only 22 dev_get_stats callers in the kernel, so I

> > > assume

> > > that after the conversion to return void, I can do another

> > > conversion to

> > > return int, and then I can convert the ndo_get_stats64 method to

> > > return

> > > int too. I will keep the plain ndo_get_stats still void (no reason

> > > not

> > > to).

> > >

> > > > Last point about drivers having to go to slow path, talking to

> > > > firmware : Make sure that malicious/innocent users

> > > > reading /proc/net/dev from many threads in parallel wont brick

> > > > these devices.

> > > >

> > > > Maybe they implicitly _relied_ on the fact that firmware was

> > > > gently

> > > > read every second and results were cached from a work queue or

> > > > something.

> > >

> > > How? I don't understand how I can make sure of that.

> >

> > Your patches do not attempt to change these drivers, but I guess your

> > cover letter might send to driver maintainers incentive to get rid of

> > their

> > logic, that is all.

> >

> > We might simply warn maintainers and ask them to test their future

> > changes

> > with tests using 1000 concurrent theads reading /proc/net/dev

> >

> > > There is an effort initiated by Jakub to standardize the ethtool

> > > statistics. My objection was that you can't expect that to happen

> > > unless

> > > dev_get_stats is sleepable just like ethtool -S is. So I think the

> > > same

> > > reasoning should apply to ethtool -S too, really.

> >

> > I think we all agree on the principles, once we make sure to not

> > add more pressure on RTNL. It seems you addressed our feedback, all

> > is fine.

> >

>

> Eric, about two years ago you were totally against sleeping in

> ndo_get_stats, what happened ? :)

> https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe9ac@gmail.com/


I believe that what is different this time is that DSA switches are
typically connected over a slow and bottlenecked bus (so periodic
driver-level readouts would only make things worse for phc2sys and
such other latency-sensitive programs), plus they are offloading
interfaces for forwarding (so software-based counters could never be
accurate). Support those, and supporting firmware-based high-speed
devices will come as a nice side-effect.
FWIW that discussion took place here:
https://patchwork.ozlabs.org/project/netdev/patch/20201125193740.36825-3-george.mccollister@gmail.com/

> My approach to solve this was much simpler and didn't require  a new

> mutex nor RTNL lock, all i did is to reduce the rcu critical section to

> not include the call to the driver by simply holding the netdev via

> dev_hold()


I feel this is a call for the bonding maintainers to make. If they're
willing to replace rtnl_dereference with bond_dereference throughout the
whole driver, and reduce other guys' amount of work when other NDOs
start losing the rtnl_mutex too, then I can't see what's wrong with my
approach (despite not being "as simple"). If they think that update-side
protection of the slaves array is just fine the way it is, then I
suppose that RCU protection + dev_hold is indeed all that I can do.
Eric Dumazet Jan. 8, 2021, 9:14 a.m. UTC | #5
On Fri, Jan 8, 2021 at 4:59 AM Saeed Mahameed <saeed@kernel.org> wrote:
>

> Eric, about two years ago you were totally against sleeping in

> ndo_get_stats, what happened ? :)

> https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe9ac@gmail.com/

>

> My approach to solve this was much simpler and didn't require  a new

> mutex nor RTNL lock, all i did is to reduce the rcu critical section to

> not include the call to the driver by simply holding the netdev via

> dev_hold()

>


Yeah, and how have you dealt with bonding at that time ?

Look, it seems to me Vladimir's work is more polished.

If you disagree, repost a rebased patch series so that we can
test/compare and choose the best solution.

And make sure to test it with LOCKDEP enabled ;)

Thanks.
Vladimir Oltean Jan. 8, 2021, 9:21 a.m. UTC | #6
On Fri, Jan 08, 2021 at 10:14:01AM +0100, Eric Dumazet wrote:
> If you disagree, repost a rebased patch series so that we can

> test/compare and choose the best solution.


I would rather use Saeed's time as a reviewer to my existing and current
patch set.
Eric Dumazet Jan. 8, 2021, 9:27 a.m. UTC | #7
On Fri, Jan 8, 2021 at 10:21 AM Vladimir Oltean <olteanv@gmail.com> wrote:
>

> On Fri, Jan 08, 2021 at 10:14:01AM +0100, Eric Dumazet wrote:

> > If you disagree, repost a rebased patch series so that we can

> > test/compare and choose the best solution.

>

> I would rather use Saeed's time as a reviewer to my existing and current

> patch set.


Yes, same feeling here, but Saeed brought back his own old
implementation, so maybe he does not feel the same way ?
Saeed Mahameed Jan. 8, 2021, 7:25 p.m. UTC | #8
On Fri, 2021-01-08 at 10:57 +0200, Vladimir Oltean wrote:
> On Thu, Jan 07, 2021 at 07:59:37PM -0800, Saeed Mahameed wrote:

> > On Thu, 2021-01-07 at 13:58 +0100, Eric Dumazet wrote:

> > > On Thu, Jan 7, 2021 at 12:33 PM Vladimir Oltean <

> > > olteanv@gmail.com>

> > > wrote:


...

> > > 

> > > > There is an effort initiated by Jakub to standardize the

> > > > ethtool

> > > > statistics. My objection was that you can't expect that to

> > > > happen

> > > > unless

> > > > dev_get_stats is sleepable just like ethtool -S is. So I think

> > > > the

> > > > same

> > > > reasoning should apply to ethtool -S too, really.

> > > 

> > > I think we all agree on the principles, once we make sure to not

> > > add more pressure on RTNL. It seems you addressed our feedback,

> > > all

> > > is fine.

> > > 

> > 

> > Eric, about two years ago you were totally against sleeping in

> > ndo_get_stats, what happened ? :)

> > https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe9ac@gmail.com/

> 

> I believe that what is different this time is that DSA switches are

> typically connected over a slow and bottlenecked bus (so periodic

> driver-level readouts would only make things worse for phc2sys and

> such other latency-sensitive programs), plus they are offloading

> interfaces for forwarding (so software-based counters could never be

> accurate). Support those, and supporting firmware-based high-speed

> devices will come as a nice side-effect.

> FWIW that discussion took place here:

> https://patchwork.ozlabs.org/project/netdev/patch/20201125193740.36825-3-george.mccollister@gmail.com/

> 


I understand the motivation and I agree with the concept, 
hence my patchset :)

> > My approach to solve this was much simpler and didn't require  a

> > new

> > mutex nor RTNL lock, all i did is to reduce the rcu critical

> > section to

> > not include the call to the driver by simply holding the netdev via

> > dev_hold()

> 

> I feel this is a call for the bonding maintainers to make. If they're

> willing to replace rtnl_dereference with bond_dereference throughout

> the

> whole driver, and reduce other guys' amount of work when other NDOs

> start losing the rtnl_mutex too, then I can't see what's wrong with

> my

> approach (despite not being "as simple"). If they think that update-

> side

> protection of the slaves array is just fine the way it is, then I

> suppose that RCU protection + dev_hold is indeed all that I can do.


To be honest i haven't really looked at your patches, I just quickly
went through them to get an idea of what you did, but let me take a
more careful look and will give my ultimate feedback.
Saeed Mahameed Jan. 8, 2021, 7:33 p.m. UTC | #9
On Fri, 2021-01-08 at 10:14 +0100, Eric Dumazet wrote:
> On Fri, Jan 8, 2021 at 4:59 AM Saeed Mahameed <saeed@kernel.org>

> wrote:

> > Eric, about two years ago you were totally against sleeping in

> > ndo_get_stats, what happened ? :)

> > https://lore.kernel.org/netdev/4cc44e85-cb5e-502c-30f3-c6ea564fe9ac@gmail.com/

> > 

> > My approach to solve this was much simpler and didn't require  a

> > new

> > mutex nor RTNL lock, all i did is to reduce the rcu critical

> > section to

> > not include the call to the driver by simply holding the netdev via

> > dev_hold()

> > 

> 

> Yeah, and how have you dealt with bonding at that time ?

> 


I needed to get the ack on the RFC first, imagine if I'd changed the
whole stack and then got a nack :)

> Look, it seems to me Vladimir's work is more polished.

> 

> If you disagree, repost a rebased patch series so that we can

> test/compare and choose the best solution.

> 


I will need to carefully look at Vladimir's series first.

> And make sure to test it with LOCKDEP enabled ;)

> 


Indeed :)
Saeed Mahameed Jan. 8, 2021, 7:38 p.m. UTC | #10
On Fri, 2021-01-08 at 10:27 +0100, Eric Dumazet wrote:
> On Fri, Jan 8, 2021 at 10:21 AM Vladimir Oltean <olteanv@gmail.com>

> wrote:

> > On Fri, Jan 08, 2021 at 10:14:01AM +0100, Eric Dumazet wrote:

> > > If you disagree, repost a rebased patch series so that we can

> > > test/compare and choose the best solution.

> > 

> > I would rather use Saeed's time as a reviewer to my existing and

> > current

> > patch set.

> 

> Yes, same feeling here, but Saeed brought back his own old

> implementation, so maybe he does not feel the same way ?


Agreed, Vladimir's work is more complete than mine, my work was just a
RFC to show the concept, it was far from being ready or testable with
some use cases such as bonding/bridge/ovs/etc...

Let me take a look at the current series, and if I see that the
rcu/dev_hold approach is more lightweight then i will suggest it to
Vladimir and he can make the final decision.
Vladimir Oltean Jan. 8, 2021, 8:20 p.m. UTC | #11
On Fri, Jan 08, 2021 at 11:38:57AM -0800, Saeed Mahameed wrote:
> Let me take a look at the current series, and if I see that the

> rcu/dev_hold approach is more lightweight then i will suggest it to

> Vladimir and he can make the final decision.


The last version does use temporary RCU protection. I decided to not
change anything about the locking architecture of the drivers.
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 714aa0e5d041..6e38079ea66d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3693,77 +3693,64 @@  static void bond_fold_stats(struct rtnl_link_stats64 *_res,
 	}
 }
 
-#ifdef CONFIG_LOCKDEP
-static int bond_get_lowest_level_rcu(struct net_device *dev)
-{
-	struct net_device *ldev, *next, *now, *dev_stack[MAX_NEST_DEV + 1];
-	struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1];
-	int cur = 0, max = 0;
-
-	now = dev;
-	iter = &dev->adj_list.lower;
-
-	while (1) {
-		next = NULL;
-		while (1) {
-			ldev = netdev_next_lower_dev_rcu(now, &iter);
-			if (!ldev)
-				break;
-
-			next = ldev;
-			niter = &ldev->adj_list.lower;
-			dev_stack[cur] = now;
-			iter_stack[cur++] = iter;
-			if (max <= cur)
-				max = cur;
-			break;
-		}
-
-		if (!next) {
-			if (!cur)
-				return max;
-			next = dev_stack[--cur];
-			niter = iter_stack[cur];
-		}
-
-		now = next;
-		iter = niter;
-	}
-
-	return max;
-}
-#endif
-
 static void bond_get_stats(struct net_device *bond_dev,
 			   struct rtnl_link_stats64 *stats)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
-	struct rtnl_link_stats64 temp;
-	struct list_head *iter;
-	struct slave *slave;
-	int nest_level = 0;
+	struct rtnl_link_stats64 *dev_stats;
+	struct net_device **slaves;
+	int i, res, num_slaves;
 
+	res = bond_get_slave_arr(bond, &slaves, &num_slaves);
+	if (res) {
+		netdev_err(bond->dev,
+			   "failed to allocate memory for slave array\n");
+		return;
+	}
 
-	rcu_read_lock();
-#ifdef CONFIG_LOCKDEP
-	nest_level = bond_get_lowest_level_rcu(bond_dev);
-#endif
+	dev_stats = kcalloc(num_slaves, sizeof(*dev_stats), GFP_KERNEL);
+	if (!dev_stats) {
+		netdev_err(bond->dev,
+			   "failed to allocate memory for slave stats\n");
+		bond_put_slave_arr(slaves, num_slaves);
+		return;
+	}
+
+	/* Recurse with no locks taken */
+	for (i = 0; i < num_slaves; i++)
+		dev_get_stats(slaves[i], &dev_stats[i]);
+
+	/* When taking the slaves lock again, the new slave array might be
+	 * different from the original one.
+	 */
+	mutex_lock(&bond->slaves_lock);
 
-	spin_lock_nested(&bond->stats_lock, nest_level);
 	memcpy(stats, &bond->bond_stats, sizeof(*stats));
 
-	bond_for_each_slave_rcu(bond, slave, iter) {
-		dev_get_stats(slave->dev, &temp);
+	for (i = 0; i < num_slaves; i++) {
+		struct list_head *iter;
+		struct slave *slave;
 
-		bond_fold_stats(stats, &temp, &slave->slave_stats);
+		bond_for_each_slave(bond, slave, iter) {
+			if (slave->dev != slaves[i])
+				continue;
 
-		/* save off the slave stats for the next run */
-		memcpy(&slave->slave_stats, &temp, sizeof(temp));
+			bond_fold_stats(stats, &dev_stats[i],
+					&slave->slave_stats);
+
+			/* save off the slave stats for the next run */
+			memcpy(&slave->slave_stats, &dev_stats[i],
+			       sizeof(dev_stats[i]));
+			break;
+		}
 	}
 
 	memcpy(&bond->bond_stats, stats, sizeof(*stats));
-	spin_unlock(&bond->stats_lock);
-	rcu_read_unlock();
+
+	mutex_unlock(&bond->slaves_lock);
+
+	kfree(dev_stats);
+	bond_put_slave_arr(slaves, num_slaves);
 }
 
 static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd)
@@ -4287,11 +4274,11 @@  static void bond_set_slave_arr(struct bonding *bond,
 {
 	struct bond_up_slave *usable, *all;
 
-	usable = rtnl_dereference(bond->usable_slaves);
+	usable = bond_dereference(bond, bond->usable_slaves);
 	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
 	kfree_rcu(usable, rcu);
 
-	all = rtnl_dereference(bond->all_slaves);
+	all = bond_dereference(bond, bond->all_slaves);
 	rcu_assign_pointer(bond->all_slaves, all_slaves);
 	kfree_rcu(all, rcu);
 }
@@ -4333,6 +4320,8 @@  int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 	WARN_ON(lockdep_is_held(&bond->mode_lock));
 #endif
 
+	mutex_lock(&bond->slaves_lock);
+
 	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
 					    bond->slave_cnt), GFP_KERNEL);
 	all_slaves = kzalloc(struct_size(all_slaves, arr,
@@ -4376,17 +4365,22 @@  int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 	}
 
 	bond_set_slave_arr(bond, usable_slaves, all_slaves);
+
+	mutex_unlock(&bond->slaves_lock);
+
 	return ret;
 out:
 	if (ret != 0 && skipslave) {
-		bond_skip_slave(rtnl_dereference(bond->all_slaves),
+		bond_skip_slave(bond_dereference(bond, bond->all_slaves),
 				skipslave);
-		bond_skip_slave(rtnl_dereference(bond->usable_slaves),
+		bond_skip_slave(bond_dereference(bond, bond->usable_slaves),
 				skipslave);
 	}
 	kfree_rcu(all_slaves, rcu);
 	kfree_rcu(usable_slaves, rcu);
 
+	mutex_unlock(&bond->slaves_lock);
+
 	return ret;
 }
 
@@ -4699,6 +4693,7 @@  void bond_setup(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 
+	mutex_init(&bond->slaves_lock);
 	spin_lock_init(&bond->mode_lock);
 	bond->params = bonding_defaults;
 
@@ -5189,7 +5184,6 @@  static int bond_init(struct net_device *bond_dev)
 	if (!bond->wq)
 		return -ENOMEM;
 
-	spin_lock_init(&bond->stats_lock);
 	netdev_lockdep_set_classes(bond_dev);
 
 	list_add_tail(&bond->bond_list, &bn->dev_list);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index adc3da776970..146b46d276c0 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -222,7 +222,6 @@  struct bonding {
 	 * ALB mode (6) - to sync the use and modifications of its hash table
 	 */
 	spinlock_t mode_lock;
-	spinlock_t stats_lock;
 	u8	 send_peer_notif;
 	u8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
@@ -249,6 +248,11 @@  struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct xfrm_state *xs;
 #endif /* CONFIG_XFRM_OFFLOAD */
+
+	/* Protects the slave array. TODO: convert all instances of
+	 * rtnl_dereference to bond_dereference
+	 */
+	struct mutex slaves_lock;
 };
 
 #define bond_slave_get_rcu(dev) \
@@ -257,6 +261,9 @@  struct bonding {
 #define bond_slave_get_rtnl(dev) \
 	((struct slave *) rtnl_dereference(dev->rx_handler_data))
 
+#define bond_dereference(bond, p) \
+	rcu_dereference_protected(p, lockdep_is_held(&(bond)->slaves_lock))
+
 void bond_queue_slave_event(struct slave *slave);
 void bond_lower_state_changed(struct slave *slave);
 
@@ -449,6 +456,46 @@  static inline void bond_hw_addr_copy(u8 *dst, const u8 *src, unsigned int len)
 	memcpy(dst, src, len);
 }
 
+static inline int bond_get_slave_arr(struct bonding *bond,
+				     struct net_device ***slaves,
+				     int *num_slaves)
+{
+	struct net *net = dev_net(bond->dev);
+	struct list_head *iter;
+	struct slave *slave;
+	int i = 0;
+
+	mutex_lock(&bond->slaves_lock);
+
+	*slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
+	if (!(*slaves)) {
+		netif_lists_unlock(net);
+		return -ENOMEM;
+	}
+
+	bond_for_each_slave(bond, slave, iter) {
+		dev_hold(slave->dev);
+		*slaves[i++] = slave->dev;
+	}
+
+	*num_slaves = bond->slave_cnt;
+
+	mutex_unlock(&bond->slaves_lock);
+
+	return 0;
+}
+
+static inline void bond_put_slave_arr(struct net_device **slaves,
+				      int num_slaves)
+{
+	int i;
+
+	for (i = 0; i < num_slaves; i++)
+		dev_put(slaves[i]);
+
+	kfree(slaves);
+}
+
 #define BOND_PRI_RESELECT_ALWAYS	0
 #define BOND_PRI_RESELECT_BETTER	1
 #define BOND_PRI_RESELECT_FAILURE	2