[net-next,1/2] net: socionext: correctly recover txq after being full

Message ID 1544777941-24083-1-git-send-email-ilias.apalodimas@linaro.org
State New
Series
  • [net-next,1/2] net: socionext: correctly recover txq after being full

Commit Message

Ilias Apalodimas Dec. 14, 2018, 8:59 a.m.
Running pktgen with packet sizes > 512b ends up with the interface Txq
getting stuck.
"netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!"
appears in dmesg but the interface never recovers. It requires an
ifconfig down/up to make the interface usable again.

This is triggered by a race condition between .ndo_start_xmit and the
napi completion. The available budget is calculated first and indicates
the queue is full. Due to a costly netif_err() the queue is not stopped
in time. The napi completion can run in that window, clear the irq and
free up descriptors while the queue is not yet marked as stopped, so it
skips the wake-up. The queue is stopped afterwards and never wakes up
again.
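
To make the "available budget" calculation concrete, here is a minimal
user-space sketch of the head/tail arithmetic the transmit path relies on.
RING_SIZE, ring_used() and ring_is_full() are hypothetical names chosen for
illustration; the driver has its own DESC_NUM and ring structure.

```c
#include <assert.h>

/* Hypothetical ring size for illustration only; the driver uses DESC_NUM. */
#define RING_SIZE 8

/* Number of descriptors in use, given a producer index (head) and a
 * consumer index (tail) that both wrap at RING_SIZE.
 */
static int ring_used(int head, int tail)
{
	assert(head >= 0 && head < RING_SIZE);
	assert(tail >= 0 && tail < RING_SIZE);

	if (head >= tail)
		return head - tail;

	/* head has wrapped around the end of the ring, tail has not */
	return head + RING_SIZE - tail;
}

/* The transmit path treats the ring as full when fewer than two
 * descriptors are free.
 */
static int ring_is_full(int head, int tail)
{
	return RING_SIZE - ring_used(head, tail) < 2;
}
```

With head = 6 and tail = 7 the ring holds 7 descriptors and one free slot,
so it counts as full. Once the completion side moves tail forward, the same
check passes again, which is why a stale result is enough to cause the
stall.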

Fix this by moving the print after stopping the queue, making the print
ratelimited, adding barriers and re-checking for cleaned descriptors
once the queue has been stopped.
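
The stop, re-check and wake ordering can be sketched in user space roughly
as follows. This is an illustration under stated assumptions, not the
driver code: struct txq_model and the txq_*() helpers are hypothetical, and
full C11 fences stand in conservatively for the kernel barriers used in the
patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8	/* hypothetical size, illustration only */

/* Hypothetical model of a Tx ring plus its "queue stopped" flag. */
struct txq_model {
	atomic_int head;	/* producer index, advanced by xmit */
	atomic_int tail;	/* consumer index, advanced by completion */
	atomic_bool stopped;	/* models the netdev queue state */
};

static int txq_used(struct txq_model *q)
{
	int head = atomic_load_explicit(&q->head, memory_order_relaxed);
	int tail = atomic_load_explicit(&q->tail, memory_order_relaxed);

	return (head - tail + RING_SIZE) % RING_SIZE;
}

/* Xmit side: returns true if the caller must back off. */
static bool txq_check_stop(struct txq_model *q)
{
	if (RING_SIZE - txq_used(q) >= 2)
		return false;

	/* Stop first, then look at the ring again. */
	atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	if (RING_SIZE - txq_used(q) < 2)
		return true;	/* still full, stay stopped */

	/* Descriptors were freed in the meantime, undo the stop. */
	atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
	return false;
}

/* Completion side: release n descriptors, then wake a stopped queue. */
static void txq_complete(struct txq_model *q, int n)
{
	int tail = atomic_load_explicit(&q->tail, memory_order_relaxed);

	assert(n >= 0 && n <= txq_used(q));

	atomic_store_explicit(&q->tail, (tail + n) % RING_SIZE,
			      memory_order_relaxed);
	/* Publish the new tail before testing the stopped flag. */
	atomic_thread_fence(memory_order_seq_cst);

	if (n && atomic_load_explicit(&q->stopped, memory_order_relaxed))
		atomic_store_explicit(&q->stopped, false,
				      memory_order_relaxed);
}
```

Because the flag is set before the ring is re-read, either the completion
side observes the stopped flag and wakes the queue, or the xmit side
observes the freed descriptors and undoes the stop itself. Any slow logging
would happen only after txq_check_stop() has returned true.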

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

---
 drivers/net/ethernet/socionext/netsec.c | 56 ++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 11 deletions(-)

-- 
2.19.1

Comments

David Miller Dec. 15, 2018, 9:15 p.m. | #1
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Fri, 14 Dec 2018 10:59:00 +0200

> Running pktgen with packet sizes > 512b ends up with the interface Txq
> getting stuck.
> "netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!"
> appears in dmesg but the interface never recovers. It requires an
> ifconfig down/up to make the interface usable again.
> 
> This is triggered by a race condition between .ndo_start_xmit and the
> napi completion. The available budget is calculated first and indicates
> the queue is full. Due to a costly netif_err() the queue is not stopped
> in time. The napi completion can run in that window, clear the irq and
> free up descriptors while the queue is not yet marked as stopped, so it
> skips the wake-up. The queue is stopped afterwards and never wakes up
> again.
> 
> Fix this by moving the print after stopping the queue, making the print
> ratelimited, adding barriers and re-checking for cleaned descriptors
> once the queue has been stopped.
> 
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Applied.

Patch

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index bba9733b5119..584a6b3f6542 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -656,8 +656,13 @@  static int netsec_process_tx(struct netsec_priv *priv, int budget)
 		budget -= new;
 	} while (new);
 
-	if (done && netif_queue_stopped(ndev))
+	if (done && netif_queue_stopped(ndev)) {
+		/* Make sure we update the value, anyone stopping the queue
+		 * after this will read the proper consumer idx
+		 */
+		smp_wmb();
 		netif_wake_queue(ndev);
+	}
 
 	return done;
 }
@@ -877,6 +882,41 @@  static void netsec_set_tx_de(struct netsec_priv *priv,
 	dring->head = (dring->head + 1) % DESC_NUM;
 }
 
+static int netsec_desc_used(struct netsec_desc_ring *dring)
+{
+	int used;
+
+	if (dring->head >= dring->tail)
+		used = dring->head - dring->tail;
+	else
+		used = dring->head + DESC_NUM - dring->tail;
+
+	return used;
+}
+
+static int netsec_check_stop_tx(struct netsec_priv *priv, int used)
+{
+	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_TX];
+
+	/* keep tail from touching the queue */
+	if (DESC_NUM - used < 2) {
+		netif_stop_queue(priv->ndev);
+
+		/* Make sure we read the updated value in case
+		 * descriptors got freed
+		 */
+		smp_rmb();
+
+		used = netsec_desc_used(dring);
+		if (DESC_NUM - used < 2)
+			return NETDEV_TX_BUSY;
+
+		netif_wake_queue(priv->ndev);
+	}
+
+	return 0;
+}
+
 static netdev_tx_t netsec_netdev_start_xmit(struct sk_buff *skb,
 					    struct net_device *ndev)
 {
@@ -887,16 +927,10 @@  static netdev_tx_t netsec_netdev_start_xmit(struct sk_buff *skb,
 	u16 tso_seg_len = 0;
 	int filled;
 
-	/* differentiate between full/emtpy ring */
-	if (dring->head >= dring->tail)
-		filled = dring->head - dring->tail;
-	else
-		filled = dring->head + DESC_NUM - dring->tail;
-
-	if (DESC_NUM - filled < 2) { /* if less than 2 available */
-		netif_err(priv, drv, priv->ndev, "%s: TxQFull!\n", __func__);
-		netif_stop_queue(priv->ndev);
-		dma_wmb();
+	filled = netsec_desc_used(dring);
+	if (netsec_check_stop_tx(priv, filled)) {
+		net_warn_ratelimited("%s %s Tx queue full\n",
+				     dev_name(priv->dev), ndev->name);
 		return NETDEV_TX_BUSY;
 	}