
[rfc/patch] netpoll: Make it RT friendly

Message ID 773fd8246a1ec4ef79142d9e31b8ba4163a0d496.camel@gmx.de
State New
Series [rfc/patch] netpoll: Make it RT friendly

Commit Message

Mike Galbraith Nov. 19, 2021, 2:41 p.m. UTC
On Thu, 2021-11-18 at 17:34 +0100, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v5.16-rc1-rt2 patch set.
>
> Changes since v5.16-rc1-rt1:
>
>   - Redo the delayed deallocation of the task-stack.
>
> Known issues
>      - netconsole triggers WARN.

The below seems to do the trick for netconsole in master.today-rt.

netpoll: Make it RT friendly

PREEMPT_RT cannot alloc/free memory when not preemptible, so
zap_completion_queue() disabling preemption across kfree() is a
problem, as is disabling IRQs across __netpoll_send_skb(), which
leads straight to zap_completion_queue() and that same pesky
kfree().

We can let rcu_read_lock_bh() provide local exclusion for RT across
__netpoll_send_skb() (via softirq_ctrl.lock) instead of disabling
IRQs, and since zap_completion_queue() replaces sd->completion_queue
with IRQs disabled and makes a private copy, there's no need to
keep preemption disabled during traversal/freeing, so call
put_cpu_var() before doing so.  Disable a couple of warnings for RT,
and we're done.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 net/core/netpoll.c |   23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)
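
For readers wondering where softirq_ctrl.lock enters the picture: on
PREEMPT_RT, local_bh_disable() -- and therefore rcu_read_lock_bh() --
takes a per-CPU local lock, so BH-disabled sections on a given CPU
serialize against one another while remaining preemptible. A grossly
simplified sketch of the RT side of kernel/softirq.c (the real code
tracks nesting via current->softirq_disable_cnt and handles pending
softirqs; details vary by kernel version):

#include <linux/local_lock.h>
#include <linux/percpu.h>

struct softirq_ctrl {
	local_lock_t	lock;
	int		cnt;
};

static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
	.lock	= INIT_LOCAL_LOCK(softirq_ctrl.lock),
};

/* Outermost BH-disable takes the per-CPU lock; on RT it is a
 * sleeping lock, so the section stays preemptible yet excludes
 * every other BH-disabled section on this CPU -- the "local
 * exclusion" the patch relies on.
 */
static void rt_local_bh_disable_sketch(void)
{
	local_lock(&softirq_ctrl.lock);
	this_cpu_inc(softirq_ctrl.cnt);
}

static void rt_local_bh_enable_sketch(void)
{
	this_cpu_dec(softirq_ctrl.cnt);
	local_unlock(&softirq_ctrl.lock);
}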

Comments

Sebastian Andrzej Siewior Dec. 15, 2021, 4:06 p.m. UTC | #1
On 2021-11-19 15:41:25 [+0100], Mike Galbraith wrote:
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -252,6 +252,7 @@ static void zap_completion_queue(void)
>  		clist = sd->completion_queue;
>  		sd->completion_queue = NULL;
>  		local_irq_restore(flags);
> +		put_cpu_var(softnet_data);
> 
>  		while (clist != NULL) {
>  			struct sk_buff *skb = clist;
> @@ -263,9 +264,8 @@ static void zap_completion_queue(void)
>  				__kfree_skb(skb);
>  			}
>  		}
> -	}
> -
> -	put_cpu_var(softnet_data);
> +	} else
> +		put_cpu_var(softnet_data);
>  }

Looking at the callers of zap_completion_queue(), it seems that
get_cpu_var() could be replaced by this_cpu_ptr() since the pointer is
stable at this point.
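
For reference, the semantic difference being pointed at, as a minimal
sketch (not from the patch; the function name is made up):

#include <linux/netdevice.h>
#include <linux/percpu.h>

static void pcpu_access_sketch(void)
{
	struct softnet_data *sd;

	/* get_cpu_var() disables preemption and must be paired with
	 * put_cpu_var(); the section in between must not sleep, which
	 * is what bites PREEMPT_RT around kfree().
	 */
	sd = &get_cpu_var(softnet_data);
	put_cpu_var(softnet_data);

	/* this_cpu_ptr() merely computes the per-CPU address and
	 * leaves preemption untouched -- fine when, as for
	 * zap_completion_queue()'s callers, the pointer is already
	 * stable.
	 */
	sd = this_cpu_ptr(&softnet_data);
	(void)sd;
}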

>  static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
> @@ -365,16 +366,22 @@ static netdev_tx_t __netpoll_send_skb(st
> 
>  netdev_tx_t netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
>  {
> -	unsigned long flags;
> +	unsigned long __maybe_unused flags;
>  	netdev_tx_t ret;
> 
>  	if (unlikely(!np)) {
>  		dev_kfree_skb_irq(skb);
>  		ret = NET_XMIT_DROP;
>  	} else {
> -		local_irq_save(flags);
> +		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> +			local_irq_save(flags);
> +		else
> +			rcu_read_lock_bh();
>  		ret = __netpoll_send_skb(np, skb);
> -		local_irq_restore(flags);
> +		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> +			local_irq_restore(flags);
> +		else
> +			rcu_read_unlock_bh();
>  	}
>  	return ret;
>  }

What is the context for netpoll_send_skb()? Why do we need to disable BH
+ RCU on RT?
If interrupts are never disabled, doesn't this break the assumption made
in netpoll_tx_running()?

queue_process() is also busted.

Sebastian
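
For context on those last two points: the netpoll_tx_running() under
discussion (include/linux/netpoll.h at the time) infers "netpoll is
transmitting on this CPU" purely from the IRQ state the !RT send path
establishes:

static inline bool netpoll_tx_running(struct net_device *dev)
{
	return irqs_disabled();
}

With IRQs left enabled on RT, that test always reports false, which is
what the per-CPU _netpoll_tx_running flag in the v1 patch below
addresses.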
Mike Galbraith Dec. 15, 2021, 7:54 p.m. UTC | #2
On Wed, 2021-12-15 at 17:06 +0100, Sebastian Andrzej Siewior wrote:
>
> What is the context for netpoll_send_skb()? Why do we need to disable BH
> + RCU on RT?

Because rcu_dereference_bh() says it better be, but..

> If interrupts are never disabled, doesn't this break the assumption made
> in netpoll_tx_running()?
>
> queue_process() is also busted.

..that makes it kinda moot, and wasn't unexpected. It works, or at
least netconsole does, and lockdep was fine with it, but I figured it
unlikely to fly.  In fact, I had already taken the silence as a "Surely
you jest" or such.

	-Mike
Sebastian Andrzej Siewior Dec. 17, 2021, 1:59 p.m. UTC | #3
On 2021-12-15 20:54:40 [+0100], Mike Galbraith wrote:
>                   In fact, I had already taken the silence as a "Surely
> you jest" or such.

No, not really. netpoll is currently the least of my worries and I
wanted to finish my other network patches before I start wrapping my
mind around this in case something changes. But I currently have no idea
how to do the netpoll run/detect and so on.

> 	-Mike

Sebastian
Mike Galbraith Dec. 17, 2021, 4:41 p.m. UTC | #4
On Fri, 2021-12-17 at 14:59 +0100, Sebastian Andrzej Siewior wrote:
> On 2021-12-15 20:54:40 [+0100], Mike Galbraith wrote:
> >                   In fact, I had already taken the silence as a "Surely
> > you jest" or such.
>
> No, not really. netpoll is currently the least of my worries and I
> wanted to finish my other network patches before I start wrapping my
> mind around this in case something changes. But I currently have no idea
> how to do the netpoll run/detect and so on.

I was gonna toss a rock or two at those "and so on" bits since they're
wired into stuff my box uses regularly.. but I've yet to find a means
to make them be more than compiler fodder, so they're safe.

	-Mike
Mike Galbraith Dec. 21, 2021, 9:44 a.m. UTC | #5
On Fri, 2021-12-17 at 17:41 +0100, Mike Galbraith wrote:
> On Fri, 2021-12-17 at 14:59 +0100, Sebastian Andrzej Siewior wrote:
> > On 2021-12-15 20:54:40 [+0100], Mike Galbraith wrote:
> > >                   In fact, I had already taken the silence as a "Surely
> > > you jest" or such.
> >
> > No, not really. netpoll is currently the least of my worries and I
> > wanted to finish my other network patches before I start wrapping my
> > mind around this in case something changes. But I currently have no idea
> > how to do the netpoll run/detect and so on.
>
> I was gonna toss a rock or two at those "and so on" bits since they're
> wired into stuff my box uses regularly.. but I've yet to find a means
> to make them be more than compiler fodder, so they're safe.

Nah, I was just insufficiently bored.

It's surprising that the patchlet worked at all with netpoll_tx_running()
being busted, but whatever, bridge now does its forwarding thing for RT
and !RT, transmitting twice per kmsg, though apparently only one actually
hits copper to fly over to the console logger. See the attached trace.
Note that for that trace I sabotaged it to alternately xmit via fallback.

This will likely end up being done differently (way prettier), but what
the heck, it beats a lump of coal in your xmas stocking.

	Ho Ho Hum :)

	-Mike

netpoll: Make it RT friendly

PREEMPT_RT cannot alloc/free memory when not preemptible, which
makes disabling IRQs across transmission an issue for RT.

Use rcu_read_lock_bh() to provide local exclusion for RT (via
softirq_ctrl.lock) for the normal and fallback transmission paths
instead of disabling IRQs. Since zap_completion_queue() callers
ensure pointer stability, replace get_cpu_var() therein with
this_cpu_ptr() to keep it preemptible across kfree().

Disable a couple warnings for RT, and we're done.

v0.kinda-works -> v1 feedback fixes:
    remove get_cpu_var() from zap_completion_queue().
    fix/test netpoll_tx_running() to work for RT/!RT.
    fix/test xmit fallback path for RT.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 include/linux/netpoll.h |    4 +++-
 net/core/netpoll.c      |   44 ++++++++++++++++++++++++++++++++------------
 2 files changed, 35 insertions(+), 13 deletions(-)

--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -89,9 +89,11 @@ static inline void netpoll_poll_unlock(v
 		smp_store_release(&napi->poll_owner, -1);
 }

+DECLARE_PER_CPU(int, _netpoll_tx_running);
+
 static inline bool netpoll_tx_running(struct net_device *dev)
 {
-	return irqs_disabled();
+	return this_cpu_read(_netpoll_tx_running);
 }

 #else
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -70,6 +70,27 @@ module_param(carrier_timeout, uint, 0644
 #define np_notice(np, fmt, ...)				\
 	pr_notice("%s: " fmt, np->name, ##__VA_ARGS__)

+DEFINE_PER_CPU(int, _netpoll_tx_running);
+EXPORT_PER_CPU_SYMBOL(_netpoll_tx_running);
+
+#define netpoll_tx_begin(flags)				\
+	do {						\
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))	\
+			local_irq_save(flags);		\
+		else					\
+			rcu_read_lock_bh();		\
+		this_cpu_write(_netpoll_tx_running, 1);	\
+	} while (0)
+
+#define netpoll_tx_end(flags)				\
+	do {						\
+		this_cpu_write(_netpoll_tx_running, 0);	\
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))	\
+			local_irq_restore(flags);	\
+		else					\
+			rcu_read_unlock_bh();		\
+	} while (0)
+
 static netdev_tx_t netpoll_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev,
 				      struct netdev_queue *txq)
@@ -102,7 +123,7 @@ static void queue_process(struct work_st
 	struct netpoll_info *npinfo =
 		container_of(work, struct netpoll_info, tx_work.work);
 	struct sk_buff *skb;
-	unsigned long flags;
+	unsigned long __maybe_unused flags;

 	while ((skb = skb_dequeue(&npinfo->txq))) {
 		struct net_device *dev = skb->dev;
@@ -114,7 +135,7 @@ static void queue_process(struct work_st
 			continue;
 		}

-		local_irq_save(flags);
+		netpoll_tx_begin(flags);
 		/* check if skb->queue_mapping is still valid */
 		q_index = skb_get_queue_mapping(skb);
 		if (unlikely(q_index >= dev->real_num_tx_queues)) {
@@ -127,13 +148,13 @@ static void queue_process(struct work_st
 		    !dev_xmit_complete(netpoll_start_xmit(skb, dev, txq))) {
 			skb_queue_head(&npinfo->txq, skb);
 			HARD_TX_UNLOCK(dev, txq);
-			local_irq_restore(flags);
+			netpoll_tx_end(flags);

 			schedule_delayed_work(&npinfo->tx_work, HZ/10);
 			return;
 		}
 		HARD_TX_UNLOCK(dev, txq);
-		local_irq_restore(flags);
+		netpoll_tx_end(flags);
 	}
 }

@@ -243,7 +264,7 @@ static void refill_skbs(void)
 static void zap_completion_queue(void)
 {
 	unsigned long flags;
-	struct softnet_data *sd = &get_cpu_var(softnet_data);
+	struct softnet_data *sd = this_cpu_ptr(&softnet_data);

 	if (sd->completion_queue) {
 		struct sk_buff *clist;
@@ -264,8 +285,6 @@ static void zap_completion_queue(void)
 			}
 		}
 	}
-
-	put_cpu_var(softnet_data);
 }

 static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
@@ -314,7 +333,8 @@ static netdev_tx_t __netpoll_send_skb(st
 	/* It is up to the caller to keep npinfo alive. */
 	struct netpoll_info *npinfo;

-	lockdep_assert_irqs_disabled();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		lockdep_assert_irqs_disabled();

 	dev = np->dev;
 	npinfo = rcu_dereference_bh(dev->npinfo);
@@ -350,7 +370,7 @@ static netdev_tx_t __netpoll_send_skb(st
 			udelay(USEC_PER_POLL);
 		}

-		WARN_ONCE(!irqs_disabled(),
+		WARN_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RT) && !irqs_disabled(),
 			"netpoll_send_skb_on_dev(): %s enabled interrupts in poll (%pS)\n",
 			dev->name, dev->netdev_ops->ndo_start_xmit);

@@ -365,16 +385,16 @@ static netdev_tx_t __netpoll_send_skb(st

 netdev_tx_t netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 {
-	unsigned long flags;
+	unsigned long __maybe_unused flags;
 	netdev_tx_t ret;

 	if (unlikely(!np)) {
 		dev_kfree_skb_irq(skb);
 		ret = NET_XMIT_DROP;
 	} else {
-		local_irq_save(flags);
+		netpoll_tx_begin(flags);
 		ret = __netpoll_send_skb(np, skb);
-		local_irq_restore(flags);
+		netpoll_tx_end(flags);
 	}
 	return ret;
 }
# tracer: nop
#
   netconsole.sh-4041    [005] .......   189.802694: netpoll_setup+0x5/0x460: struct netpoll *np:ffff8881eea7f128
   netconsole.sh-4041    [005] .......   189.802706: __netpoll_setup+0x5/0x240: struct netpoll *np:ffff8881eea7f128, struct net_device *ndev:ffff888104c0a000
   netconsole.sh-4041    [005] .......   189.802707: __netpoll_setup+0x165/0x240: !ndev->npinfo ==> ops->ndo_netpoll_setup()
   netconsole.sh-4041    [005] .......   189.802708: __netpoll_setup+0x5/0x240: struct netpoll *np:ffff88818c733180, struct net_device *ndev:ffff888107de8000
   netconsole.sh-4041    [005] .......   189.802708: __netpoll_setup+0x1cb/0x240: !ndev->npinfo and !ops->ndo_netpoll_setup
   netconsole.sh-4041    [005] .......   189.802708: br_netpoll_setup+0x4b/0xa0 [bridge]: __br_netpoll_enable()
   netconsole.sh-4041    [005] .......   189.802709: br_netpoll_setup+0x69/0xa0 [bridge]: return:0
   netconsole.sh-4041    [005] .......   189.802709: netpoll_setup+0x1fe/0x460: __netpoll_setup() return:0
      pr/netcon0-4045    [006] .....13   189.803951: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [006] .....13   189.803954: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [006] .....13   189.803954: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [006] .....13   189.803957: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [006] .....13   189.803957: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [006] .....13   189.803957: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [006] .....13   189.803958: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [006] .....13   189.803958: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [006] .....13   189.803959: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/6:1-101     [006] .....13   189.803970: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/6:1-101     [006] .....13   189.803971: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/6:1-101     [006] .....13   189.803971: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [001] .....13   189.803978: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [001] .....13   189.803981: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [001] .....13   189.803981: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/1:0-25      [001] .....13   189.803986: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [006] .....13   189.803989: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [006] .....13   189.803990: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [006] .....13   189.803990: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/6:1-101     [006] .....13   189.803993: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [001] .....13   189.803998: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [001] .....13   189.804000: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [001] .....13   189.804000: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/1:0-25      [001] .....13   189.804003: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [006] .....13   189.804006: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [006] .....13   189.804007: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [006] .....13   189.804007: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/6:1-101     [006] .....13   189.804010: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()
      pr/netcon0-4045    [006] .....13   206.226295: __br_forward+0x176/0x1b0 [bridge]: netpoll_tx_running() ==> br_netpoll_send_skb()
      pr/netcon0-4045    [006] .....13   206.226299: netpoll_send_skb+0x7f/0x250: netpoll_start_xmit() ==> schedule_delayed_work()
      pr/netcon0-4045    [006] .....13   206.226299: netpoll_send_skb+0x1c1/0x250: netpoll_start_xmit() ==> dev_xmit_complete()
     kworker/6:1-101     [006] .....13   206.226304: queue_process+0xc3/0x230: netpoll_start_xmit() ==> dev_xmit_complete()

Patch

--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -252,6 +252,7 @@ static void zap_completion_queue(void)
 		clist = sd->completion_queue;
 		sd->completion_queue = NULL;
 		local_irq_restore(flags);
+		put_cpu_var(softnet_data);

 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
@@ -263,9 +264,8 @@ static void zap_completion_queue(void)
 				__kfree_skb(skb);
 			}
 		}
-	}
-
-	put_cpu_var(softnet_data);
+	} else
+		put_cpu_var(softnet_data);
 }

 static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
@@ -314,7 +314,8 @@ static netdev_tx_t __netpoll_send_skb(st
 	/* It is up to the caller to keep npinfo alive. */
 	struct netpoll_info *npinfo;

-	lockdep_assert_irqs_disabled();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		lockdep_assert_irqs_disabled();

 	dev = np->dev;
 	npinfo = rcu_dereference_bh(dev->npinfo);
@@ -350,7 +351,7 @@ static netdev_tx_t __netpoll_send_skb(st
 			udelay(USEC_PER_POLL);
 		}

-		WARN_ONCE(!irqs_disabled(),
+		WARN_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RT) && !irqs_disabled(),
 			"netpoll_send_skb_on_dev(): %s enabled interrupts in poll (%pS)\n",
 			dev->name, dev->netdev_ops->ndo_start_xmit);

@@ -365,16 +366,22 @@ static netdev_tx_t __netpoll_send_skb(st

 netdev_tx_t netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 {
-	unsigned long flags;
+	unsigned long __maybe_unused flags;
 	netdev_tx_t ret;

 	if (unlikely(!np)) {
 		dev_kfree_skb_irq(skb);
 		ret = NET_XMIT_DROP;
 	} else {
-		local_irq_save(flags);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_save(flags);
+		else
+			rcu_read_lock_bh();
 		ret = __netpoll_send_skb(np, skb);
-		local_irq_restore(flags);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_restore(flags);
+		else
+			rcu_read_unlock_bh();
 	}
 	return ret;
 }