[RFC] net: Avoid spurious rx_dropped increases with tap and rx_handler

Message ID 20200522011420.263574-1-bpoirier@cumulusnetworks.com

Commit Message

Benjamin Poirier May 22, 2020, 1:14 a.m. UTC
Consider an skb which doesn't match a ptype_base/ptype_specific handler. If
this skb is delivered to a ptype_all handler, it does not count as a drop.
However, if the skb is also processed by an rx_handler which returns
RX_HANDLER_PASS, the frame is now counted as a drop because pt_prev was
reset. An example of this situation is an LLDP frame received on a bridge
port while lldpd is listening on a packet socket with ETH_P_ALL (e.g. by
specifying `lldpd -c`).
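
To illustrate, here is a condensed sketch of the relevant
__netif_receive_skb_core() flow before this patch; unrelated branches and
error handling are elided, names are as in net/core/dev.c:

	/* deliver to ptype_all taps, e.g. lldpd's ETH_P_ALL socket */
	list_for_each_entry_rcu(ptype, &ptype_all, list) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}

	rx_handler = rcu_dereference(skb->dev->rx_handler);
	if (rx_handler) {
		if (pt_prev) {
			/* the tap does receive the frame here... */
			ret = deliver_skb(skb, pt_prev, orig_dev);
			/* ...but that delivery is then forgotten */
			pt_prev = NULL;
		}
		switch (rx_handler(&skb)) {
		case RX_HANDLER_PASS:
			/* e.g. bridge letting an LLDP frame through */
			break;
		/* ... other cases ... */
		}
	}

	/* ... no ptype_base/ptype_specific handler matches ... */

	if (pt_prev) {
		/* normal delivery */
	} else {
		/* reached even though the tap consumed a copy */
		atomic_long_inc(&skb->dev->rx_dropped);
		kfree_skb(skb);
	}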

Fix by adding an extra flag to record whether the skb was delivered to a
packet tap before running an rx_handler.

The situation is similar for RX_HANDLER_EXACT frames so their accounting is
also changed. OTOH, the behavior is unchanged for RX_HANDLER_ANOTHER frames
- they are accounted according to what happens with the new skb->dev.
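
For reference, this is the contract between the core and an rx_handler. A
minimal, hypothetical handler follows; netdev_rx_handler_register() and
the rx_handler_result_t codes are the real kernel API, but the my_*()
helpers and my_upper_dev are made up for illustration:

static rx_handler_result_t my_handler(struct sk_buff **pskb)
{
	struct sk_buff *skb = *pskb;

	if (my_consume(skb))			/* hypothetical */
		return RX_HANDLER_CONSUMED;	/* handler owns the skb now */
	if (my_redirect(skb)) {			/* hypothetical */
		skb->dev = my_upper_dev;	/* e.g. bridge/bond master */
		return RX_HANDLER_ANOTHER;	/* reprocess on the new dev */
	}
	if (my_exact_only(skb))			/* hypothetical */
		return RX_HANDLER_EXACT;	/* exact ptype matches only */
	return RX_HANDLER_PASS;			/* as if no handler existed */
}

/* attached with netdev_rx_handler_register(dev, my_handler, NULL) */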

Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
---

The main problem (described above) involves RX_HANDLER_PASS and I think
it's clear that it needs fixing, but I'm wondering if there are different
views on what the behavior should be in related cases.

For RX_HANDLER_ANOTHER, consider an example with bonding: with a use case
similar to the one described for the bridge, an LLDP frame received on a
bond slave is currently counted as rx_dropped on the bond master, even if
it was delivered to a ptype_all tap bound to the slave. That seems a bit
iffy to me, but kind of fair, because it is counted as a drop _on the
master_.

For RX_HANDLER_EXACT, again with bonding, a unicast frame received on a
backup bond slave is currently counted as rx_nohandler on the slave, even
if it was delivered to a ptype_all tap. I'd say that's not correct, so the
patch changes that too.
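
For context, this is the shape of bonding's rx_handler that produces the
two cases above; heavily condensed from bond_handle_frame() in
drivers/net/bonding/bond_main.c, so a sketch rather than the exact code:

static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
{
	struct sk_buff *skb = *pskb;
	/* ... */

	/* e.g. a unicast frame received on an inactive backup slave */
	if (bond_should_deliver_exact_match(skb, slave, bond))
		return RX_HANDLER_EXACT;

	/* ... */
	skb->dev = bond->dev;	/* reprocessed, and accounted, on the master */
	return RX_HANDLER_ANOTHER;
}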


Also, it looks to me like a better fix for these issues would be for
rx_handlers to avoid spuriously unsharing skbs in cases where they are not
trying to enqueue them (cf. commit 7b995651e373 "[BRIDGE]: Unshare skb upon
entry"), and to return something like a pt_prev func in other cases so that
copies can be deferred. That would be quite a bit more work, though.
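
Very roughly, and purely as a hypothetical sketch of that idea (none of
these signatures exist today), the rx_handler entry point could become:

/* Instead of unsharing the skb up front, the handler hands back a
 * delivery function that __netif_receive_skb_core() can defer, exactly
 * like it defers pt_prev deliveries, so a copy is only made when a
 * second consumer actually materializes.
 */
typedef int (*rx_handler_deliver_t)(struct sk_buff *skb,
				    struct net_device *orig_dev);
typedef rx_handler_result_t (*rx_handler_func_t)(struct sk_buff **pskb,
						 rx_handler_deliver_t *deliver);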


 net/core/dev.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index f36bd3b21997..13ff1933e791 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5061,10 +5061,10 @@  static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
 static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 				    struct packet_type **ppt_prev)
 {
+	bool deliver_exact = false, rx_tapped = false;
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
 	struct net_device *orig_dev;
-	bool deliver_exact = false;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -5155,12 +5155,14 @@  static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		if (pt_prev) {
 			ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = NULL;
+			rx_tapped = true;
 		}
 		switch (rx_handler(&skb)) {
 		case RX_HANDLER_CONSUMED:
 			ret = NET_RX_SUCCESS;
 			goto out;
 		case RX_HANDLER_ANOTHER:
+			rx_tapped = false;
 			goto another_round;
 		case RX_HANDLER_EXACT:
 			deliver_exact = true;
@@ -5231,11 +5233,13 @@  static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 			goto drop;
 		*ppt_prev = pt_prev;
 	} else {
+		if (!rx_tapped) {
 drop:
-		if (!deliver_exact)
-			atomic_long_inc(&skb->dev->rx_dropped);
-		else
-			atomic_long_inc(&skb->dev->rx_nohandler);
+			if (!deliver_exact)
+				atomic_long_inc(&skb->dev->rx_dropped);
+			else
+				atomic_long_inc(&skb->dev->rx_nohandler);
+		}
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)