diff mbox series

Revert "macb: support the two tx descriptors on at91rm9200"

Message ID 20201209184740.16473-1-w@1wt.eu
State New
Headers show
Series Revert "macb: support the two tx descriptors on at91rm9200" | expand

Commit Message

Willy Tarreau Dec. 9, 2020, 6:47 p.m. UTC
This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb.

The code was developed and tested on an MSC313E SoC, which seems to be
half-way between the AT91RM9200 and the AT91SAM9260 in that it supports
both the 2-descriptors mode and a Tx ring.

It turns out that after the code was merged I could notice that the
controller would sometimes lock up, and only when dealing with sustained
bidirectional transfers, in which case it would report a Tx overrun
condition right after having reported being ready, and will stop sending
even after the status is cleared (a down/up cycle fixes it though).

After adding lots of traces I couldn't spot a sequence pattern allowing
to predict that this situation would happen. The chip comes with no
documentation and other bits are often reported with no conclusive
pattern either.

It is possible that my change is wrong just like it is possible that
the controller on the chip is bogus or at least unpredictable based on
existing docs from other chips. I do not have an RM9200 at hand to test
at the moment and a few tests run on a more recent 9G20 indicate that
this code path cannot be used there to test the code on a 3rd platform.

Since the MSC313E works fine in the single-descriptor mode, and that
people using the old RM9200 very likely favor stability over performance,
better revert this patch until we can test it on the original platform
this part of the driver was written for. Note that the reverted patch
was actually tested on MSC313E.

Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Daniel Palmer <daniel@0x0f.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/netdev/20201206092041.GA10646@1wt.eu/
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/ethernet/cadence/macb.h      |  2 --
 drivers/net/ethernet/cadence/macb_main.c | 46 ++++++--------------------------
 2 files changed, 8 insertions(+), 40 deletions(-)

Comments

Daniel Palmer April 6, 2021, 10:04 a.m. UTC | #1
Hi Willy,

I've been messing with the SSD202D (sibling of the MSC313E) recently
and the ethernet performance was awful.
I remembered this revert and reverted it and it makes the ethernet
work pretty well.

So I would like to find some way of making this patch work and I did
some digging..

On Thu, 10 Dec 2020 at 03:47, Willy Tarreau <w@1wt.eu> wrote:
>

> This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb.

>

> The code was developed and tested on an MSC313E SoC, which seems to be

> half-way between the AT91RM9200 and the AT91SAM9260 in that it supports

> both the 2-descriptors mode and a Tx ring.


The MSC313E and later seem to have a hidden "TX ring" with 4 descriptors.
It looks like instead of implementing a ring in memory like the RX
side has and all other macb variations they decided to put a 4 entry
FIFO behind the TAR/TCR registers.
You can load TAR/TCR normally so it is backwards compatible to the
at91rm9200 but you can fill it with multiple frames to transmit
without waiting for TX to end.
There are some extra bits in TSR that seem to indicate how many frames
are still waiting to be transmitted and the way BNQ works is a little
different.

> It turns out that after the code was merged I could notice that the

> controller would sometimes lock up, and only when dealing with sustained

> bidirectional transfers, in which case it would report a Tx overrun

> condition right after having reported being ready, and will stop sending

> even after the status is cleared (a down/up cycle fixes it though).


I can reproduce this with iperf3's bidirectional mode without fail.

I hacked up the driver a bit so that the TX path sends each frame 6
times recording the TSR register before and after to see what is
happening:

# udhcpc
udhcpc: started, v1.31.1
[   12.944302] Doing phy power up
[   13.147460] macb 1f2a2000.emac eth0: PHY
[1f2a2000.emac-ffffffff:00] driver [msc313e phy] (irq=POLL)
[   13.156655] macb 1f2a2000.emac eth0: configuring for phy/mii link mode
udhcpc: sending discover
[   15.205691] macb 1f2a2000.emac eth0: Link is Up - 100Mbps/Full -
flow control off
[   15.213358] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   15.245235] pre0 41001fb8, post0 30001fb0 -> IDLETSR clear
[   15.249291] pre1 30001f90, post1 20001d90 -> FIFOIDLE1 clear
[   15.253331] pre2 20001d90, post2 10001990 -> FIFOIDLE2 clear
[   15.257385] pre3 10001990, post3 00001190 -> FIFOIDLE3 clear
[   15.261435] pre4 00001190, post4 f0000191 -> FIFOIDLE4 clear, OVR set
[   15.265485] pre5 f0000190, post5 f0000191 -> OVR set
[   15.269535] pre6 f0000190, post6 e0000181 -> OVR set, BNQ clear

There seems to be a FIFO empty counter in the top of the register but
this is totally undocumented.
There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no
idea what they mean.
Bits 9 through 12 are some new status bits that seem to show if a FIFO
slot is inuse.
I can't find any mention of these bits anywhere except the header of
the vendor driver so I think these are specific to MStar's macb.

The interesting part though is that BNQ does not get cleared until
multiple frames have been pushed in and after OVR is set.
I think this is what breaks your code when it runs on the MSC313E
version of MACB.

Anyhow. I'm working on a version of your patch that should work with
both the at91rm9200 and the MSC313E.

Thanks,

Daniel
Willy Tarreau April 7, 2021, 8:42 a.m. UTC | #2
Hi Daniel,

On Tue, Apr 06, 2021 at 07:04:58PM +0900, Daniel Palmer wrote:
> Hi Willy,

> 

> I've been messing with the SSD202D (sibling of the MSC313E) recently

> and the ethernet performance was awful.

> I remembered this revert and reverted it and it makes the ethernet

> work pretty well.


OK, that's reassuring, at least they use the same IP blocks.

> [   15.213358] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

> [   15.245235] pre0 41001fb8, post0 30001fb0 -> IDLETSR clear

> [   15.249291] pre1 30001f90, post1 20001d90 -> FIFOIDLE1 clear

> [   15.253331] pre2 20001d90, post2 10001990 -> FIFOIDLE2 clear

> [   15.257385] pre3 10001990, post3 00001190 -> FIFOIDLE3 clear

> [   15.261435] pre4 00001190, post4 f0000191 -> FIFOIDLE4 clear, OVR set

> [   15.265485] pre5 f0000190, post5 f0000191 -> OVR set

> [   15.269535] pre6 f0000190, post6 e0000181 -> OVR set, BNQ clear

> 

> There seems to be a FIFO empty counter in the top of the register but

> this is totally undocumented.


Yes that's extremely visible. I did notice some apparently random
values at the top of the register as well without being able to
determine what they were for, because I was pretty much convinced I
was facing a 2-deep queue only. You had a good idea to force it to
duplicate packets :-)

> There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no

> idea what they mean.


Maybe they're related to tx queue empty / tx queue full. Just guessing.
Since all these bits tend not to be reset until written to, I got confused
a few times trying to understand what they indicated.

> Bits 9 through 12 are some new status bits that seem to show if a FIFO

> slot is inuse.


Maybe indeed.

> I can't find any mention of these bits anywhere except the header of

> the vendor driver so I think these are specific to MStar's macb.

> The interesting part though is that BNQ does not get cleared until

> multiple frames have been pushed in and after OVR is set.

> I think this is what breaks your code when it runs on the MSC313E

> version of MACB.


Yes that's very possible, because the behavior was inconsistent with
what was documented for older chips. Also BNQ changed its meaning on
more recent chips, so it may very well have yet another one here.

> Anyhow. I'm working on a version of your patch that should work with

> both the at91rm9200 and the MSC313E.


Cool! Thanks for letting me know. If you need me to run some test, let
me know (just don't expect too short latency these days but I definitely
will test).

Cheers,
Willy
Alexandre Belloni April 7, 2021, 11:47 a.m. UTC | #3
Hi,

On 07/04/2021 10:42:07+0200, Willy Tarreau wrote:
> Hi Daniel,

> 

> On Tue, Apr 06, 2021 at 07:04:58PM +0900, Daniel Palmer wrote:

> > Hi Willy,

> > 

> > I've been messing with the SSD202D (sibling of the MSC313E) recently

> > and the ethernet performance was awful.

> > I remembered this revert and reverted it and it makes the ethernet

> > work pretty well.

> 

> OK, that's reassuring, at least they use the same IP blocks.

> 

> > [   15.213358] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

> > [   15.245235] pre0 41001fb8, post0 30001fb0 -> IDLETSR clear

> > [   15.249291] pre1 30001f90, post1 20001d90 -> FIFOIDLE1 clear

> > [   15.253331] pre2 20001d90, post2 10001990 -> FIFOIDLE2 clear

> > [   15.257385] pre3 10001990, post3 00001190 -> FIFOIDLE3 clear

> > [   15.261435] pre4 00001190, post4 f0000191 -> FIFOIDLE4 clear, OVR set

> > [   15.265485] pre5 f0000190, post5 f0000191 -> OVR set

> > [   15.269535] pre6 f0000190, post6 e0000181 -> OVR set, BNQ clear

> > 

> > There seems to be a FIFO empty counter in the top of the register but

> > this is totally undocumented.

> 

> Yes that's extremely visible. I did notice some apparently random

> values at the top of the register as well without being able to

> determine what they were for, because I was pretty much convinced I

> was facing a 2-deep queue only. You had a good idea to force it to

> duplicate packets :-)

> 

> > There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no

> > idea what they mean.

> 

> Maybe they're related to tx queue empty / tx queue full. Just guessing.

> Since all these bits tend not to be reset until written to, I got confused

> a few times trying to understand what they indicated.

> 

> > Bits 9 through 12 are some new status bits that seem to show if a FIFO

> > slot is inuse.

> 

> Maybe indeed.

> 

> > I can't find any mention of these bits anywhere except the header of

> > the vendor driver so I think these are specific to MStar's macb.

> > The interesting part though is that BNQ does not get cleared until

> > multiple frames have been pushed in and after OVR is set.

> > I think this is what breaks your code when it runs on the MSC313E

> > version of MACB.

> 

> Yes that's very possible, because the behavior was inconsistent with

> what was documented for older chips. Also BNQ changed its meaning on

> more recent chips, so it may very well have yet another one here.

> 

> > Anyhow. I'm working on a version of your patch that should work with

> > both the at91rm9200 and the MSC313E.

> 

> Cool! Thanks for letting me know. If you need me to run some test, let

> me know (just don't expect too short latency these days but I definitely

> will test).

> 


Note that I have my rm9200ek home now and I could also run some tests.
(I do hope it still boots a recent kernel).

-- 
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Daniel Palmer April 7, 2021, 11:54 a.m. UTC | #4
Hi Willy

On Wed, 7 Apr 2021 at 17:42, Willy Tarreau <w@1wt.eu> wrote:
> > There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no

> > idea what they mean.

>

> Maybe they're related to tx queue empty / tx queue full. Just guessing.

> Since all these bits tend not to be reset until written to, I got confused

> a few times trying to understand what they indicated.


In the vendor driver they have some weird logic that counts idle, bnq,
and these 6 new bits,
and if the count of set bits is bigger than 5 it's apparently OK to
setup another frame to send.
I guess it's some really optimized form of checking for the right
flags but it's over my head. :D

You have to check it's OK to send right before setting the frame
address/length otherwise you will get an overflow eventually.
So I think some of these bits are related to the state machine that is
handling popping frames out of the FIFO and the bits might change
between getting a TX complete interrupt and trying to queue another
frame. i.e. you can have a situation where there seems to be a free
slot in the irq handler but it's not actually safe to queue it when
xmit is called, and if you do an overflow happens and the TX locks up.

> > Anyhow. I'm working on a version of your patch that should work with

> > both the at91rm9200 and the MSC313E.

>

> Cool! Thanks for letting me know. If you need me to run some test, let

> me know (just don't expect too short latency these days but I definitely

> will test).


I got this working really well last night. 10+ hours with the iperf3
bidir test and it's still going. Before TX locked up within a few
seconds.
Once I've cleaned it up a bit more I'll push it to my mstar tree. I
don't think it can be mainlined yet as it's not usable without a bunch
of other bits.

Cheers,

Daniel
diff mbox series

Patch

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 85be045..9fbdf53 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1216,8 +1216,6 @@  struct macb {
 
 	/* AT91RM9200 transmit queue (1 on wire + 1 queued) */
 	struct macb_tx_skb	rm9200_txq[2];
-	unsigned int		rm9200_tx_tail;
-	unsigned int		rm9200_tx_len;
 	unsigned int		max_tx_length;
 
 	u64			ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES];
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 817c7b0..352ae7f 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -3936,7 +3936,6 @@  static int at91ether_start(struct macb *lp)
 			     MACB_BIT(ISR_TUND)	|
 			     MACB_BIT(ISR_RLE)	|
 			     MACB_BIT(TCOMP)	|
-			     MACB_BIT(RM9200_TBRE)	|
 			     MACB_BIT(ISR_ROVR)	|
 			     MACB_BIT(HRESP));
 
@@ -3953,7 +3952,6 @@  static void at91ether_stop(struct macb *lp)
 			     MACB_BIT(ISR_TUND)	|
 			     MACB_BIT(ISR_RLE)	|
 			     MACB_BIT(TCOMP)	|
-			     MACB_BIT(RM9200_TBRE)	|
 			     MACB_BIT(ISR_ROVR) |
 			     MACB_BIT(HRESP));
 
@@ -4023,10 +4021,11 @@  static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
 					struct net_device *dev)
 {
 	struct macb *lp = netdev_priv(dev);
-	unsigned long flags;
 
-	if (lp->rm9200_tx_len < 2) {
-		int desc = lp->rm9200_tx_tail;
+	if (macb_readl(lp, TSR) & MACB_BIT(RM9200_BNQ)) {
+		int desc = 0;
+
+		netif_stop_queue(dev);
 
 		/* Store packet information (to free when Tx completed) */
 		lp->rm9200_txq[desc].skb = skb;
@@ -4040,15 +4039,6 @@  static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb,
 			return NETDEV_TX_OK;
 		}
 
-		spin_lock_irqsave(&lp->lock, flags);
-
-		lp->rm9200_tx_tail = (desc + 1) & 1;
-		lp->rm9200_tx_len++;
-		if (lp->rm9200_tx_len > 1)
-			netif_stop_queue(dev);
-
-		spin_unlock_irqrestore(&lp->lock, flags);
-
 		/* Set address of the data in the Transmit Address register */
 		macb_writel(lp, TAR, lp->rm9200_txq[desc].mapping);
 		/* Set length of the packet in the Transmit Control register */
@@ -4114,8 +4104,6 @@  static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
 	struct macb *lp = netdev_priv(dev);
 	u32 intstatus, ctl;
 	unsigned int desc;
-	unsigned int qlen;
-	u32 tsr;
 
 	/* MAC Interrupt Status register indicates what interrupts are pending.
 	 * It is automatically cleared once read.
@@ -4127,39 +4115,21 @@  static irqreturn_t at91ether_interrupt(int irq, void *dev_id)
 		at91ether_rx(dev);
 
 	/* Transmit complete */
-	if (intstatus & (MACB_BIT(TCOMP) | MACB_BIT(RM9200_TBRE))) {
+	if (intstatus & MACB_BIT(TCOMP)) {
 		/* The TCOM bit is set even if the transmission failed */
 		if (intstatus & (MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE)))
 			dev->stats.tx_errors++;
 
-		spin_lock(&lp->lock);
-
-		tsr = macb_readl(lp, TSR);
-
-		/* we have three possibilities here:
-		 *   - all pending packets transmitted (TGO, implies BNQ)
-		 *   - only first packet transmitted (!TGO && BNQ)
-		 *   - two frames pending (!TGO && !BNQ)
-		 * Note that TGO ("transmit go") is called "IDLE" on RM9200.
-		 */
-		qlen = (tsr & MACB_BIT(TGO)) ? 0 :
-			(tsr & MACB_BIT(RM9200_BNQ)) ? 1 : 2;
-
-		while (lp->rm9200_tx_len > qlen) {
-			desc = (lp->rm9200_tx_tail - lp->rm9200_tx_len) & 1;
+		desc = 0;
+		if (lp->rm9200_txq[desc].skb) {
 			dev_consume_skb_irq(lp->rm9200_txq[desc].skb);
 			lp->rm9200_txq[desc].skb = NULL;
 			dma_unmap_single(&lp->pdev->dev, lp->rm9200_txq[desc].mapping,
 					 lp->rm9200_txq[desc].size, DMA_TO_DEVICE);
 			dev->stats.tx_packets++;
 			dev->stats.tx_bytes += lp->rm9200_txq[desc].size;
-			lp->rm9200_tx_len--;
 		}
-
-		if (lp->rm9200_tx_len < 2 && netif_queue_stopped(dev))
-			netif_wake_queue(dev);
-
-		spin_unlock(&lp->lock);
+		netif_wake_queue(dev);
 	}
 
 	/* Work-around for EMAC Errata section 41.3.1 */