mbox series

[net-next,0/3] macb: support the 2-deep Tx queue on at91

Message ID 20201011090944.10607-1-w@1wt.eu
Headers show
Series macb: support the 2-deep Tx queue on at91 | expand

Message

Willy Tarreau Oct. 11, 2020, 9:09 a.m. UTC
Hi,

while running some tests on my Breadbee board, I noticed poor network
Tx performance. I had a look at the driver (macb, at91ether variant)
and noticed that at91ether_start_xmit() immediately stops the queue
after sending a frame and waits for the interrupt to restart the queue,
causing a dead time after each packet is sent.

The AT91RM9200 datasheet states that the controller supports two frames,
one being sent and the other one being queued, so I performed minimal
changes to support this. The transmit performance on my board has
increased by 50% on medium-sized packets (HTTP traffic), and with large
packets I can now reach line rate.

Since this driver is shared by various platforms, I tried my best to
isolate and limit the changes as much as possible and I think it's pretty
reasonable as-is. I've run extensive tests and couldn't meet any
unexpected situation (no stall, overflow nor lockup).

There are 3 patches in this series. The first one adds the missing
interrupt flag for RM9200 (TBRE, indicating the tx buffer is willing
to take a new packet). The second one replaces the single skb with a
2-array and uses only index 0. It does no other change, this is just
to prepare the code for the third one. The third one implements the
queue. Packets are added at the tail of the queue, the queue is
stopped at 2 packets and the interrupt releases 0, 1 or 2 depending
on what the transmit status register reports.

Thanks,
Willy

Willy Tarreau (3):
  macb: add RM9200's interrupt flag TBRE
  macb: prepare at91 to use a 2-frame TX queue
  macb: support the two tx descriptors on at91rm9200

 drivers/net/ethernet/cadence/macb.h      | 10 ++--
 drivers/net/ethernet/cadence/macb_main.c | 66 ++++++++++++++++++------
 2 files changed, 56 insertions(+), 20 deletions(-)

Comments

Jakub Kicinski Oct. 14, 2020, 12:03 a.m. UTC | #1
On Sun, 11 Oct 2020 11:09:41 +0200 Willy Tarreau wrote:
> while running some tests on my Breadbee board, I noticed poor network

> Tx performance. I had a look at the driver (macb, at91ether variant)

> and noticed that at91ether_start_xmit() immediately stops the queue

> after sending a frame and waits for the interrupt to restart the queue,

> causing a dead time after each packet is sent.

> 

> The AT91RM9200 datasheet states that the controller supports two frames,

> one being sent and the other one being queued, so I performed minimal

> changes to support this. The transmit performance on my board has

> increased by 50% on medium-sized packets (HTTP traffic), and with large

> packets I can now reach line rate.

> 

> Since this driver is shared by various platforms, I tried my best to

> isolate and limit the changes as much as possible and I think it's pretty

> reasonable as-is. I've run extensive tests and couldn't meet any

> unexpected situation (no stall, overflow nor lockup).

> 

> There are 3 patches in this series. The first one adds the missing

> interrupt flag for RM9200 (TBRE, indicating the tx buffer is willing

> to take a new packet). The second one replaces the single skb with a

> 2-array and uses only index 0. It does no other change, this is just

> to prepare the code for the third one. The third one implements the

> queue. Packets are added at the tail of the queue, the queue is

> stopped at 2 packets and the interrupt releases 0, 1 or 2 depending

> on what the transmit status register reports.


LGTM. There's always a chance that this will make other 
designs explode, but short of someone from Cadence giving 
us a timely review we have only one way to find that out.. :)

Applied, thanks!
Willy Tarreau Oct. 14, 2020, 3:06 a.m. UTC | #2
On Tue, Oct 13, 2020 at 05:03:58PM -0700, Jakub Kicinski wrote:
> On Sun, 11 Oct 2020 11:09:41 +0200 Willy Tarreau wrote:
> > while running some tests on my Breadbee board, I noticed poor network
> > Tx performance. I had a look at the driver (macb, at91ether variant)
> > and noticed that at91ether_start_xmit() immediately stops the queue
> > after sending a frame and waits for the interrupt to restart the queue,
> > causing a dead time after each packet is sent.
> > 
> > The AT91RM9200 datasheet states that the controller supports two frames,
> > one being sent and the other one being queued, so I performed minimal
> > changes to support this. The transmit performance on my board has
> > increased by 50% on medium-sized packets (HTTP traffic), and with large
> > packets I can now reach line rate.
> > 
> > Since this driver is shared by various platforms, I tried my best to
> > isolate and limit the changes as much as possible and I think it's pretty
> > reasonable as-is. I've run extensive tests and couldn't meet any
> > unexpected situation (no stall, overflow nor lockup).
> > 
> > There are 3 patches in this series. The first one adds the missing
> > interrupt flag for RM9200 (TBRE, indicating the tx buffer is willing
> > to take a new packet). The second one replaces the single skb with a
> > 2-array and uses only index 0. It does no other change, this is just
> > to prepare the code for the third one. The third one implements the
> > queue. Packets are added at the tail of the queue, the queue is
> > stopped at 2 packets and the interrupt releases 0, 1 or 2 depending
> > on what the transmit status register reports.
> 
> LGTM. There's always a chance that this will make other 
> designs explode, but short of someone from Cadence giving 
> us a timely review we have only one way to find that out.. :)

Not that much in fact, given that the at91ether_* functions are only
used by AT91RM9200 (whose datasheet I used to do this) and Mstar which
I used for the tests. I initially wanted to get my old SAM9G20 board
to boot until I noticed that it doesn't even use the same set of
functions, so the potential victims are extremely limited :-)

> Applied, thanks!

Thank you!
Willy
Jakub Kicinski Oct. 14, 2020, 3:13 a.m. UTC | #3
On Wed, 14 Oct 2020 05:06:30 +0200 Willy Tarreau wrote:
> > LGTM. There's always a chance that this will make other 

> > designs explode, but short of someone from Cadence giving 

> > us a timely review we have only one way to find that out.. :)  

> 

> Not that much in fact, given that the at91ether_* functions are only

> used by AT91RM9200 (whose datasheet I used to do this) and Mstar which

> I used for the tests. I initially wanted to get my old SAM9G20 board

> to boot until I noticed that it doesn't even use the same set of

> functions, so the potential victims are extremely limited :-)


GTK :)
Alexandre Belloni Nov. 13, 2020, 10:03 a.m. UTC | #4
On 14/10/2020 05:06:30+0200, Willy Tarreau wrote:
> On Tue, Oct 13, 2020 at 05:03:58PM -0700, Jakub Kicinski wrote:

> > On Sun, 11 Oct 2020 11:09:41 +0200 Willy Tarreau wrote:

> > > while running some tests on my Breadbee board, I noticed poor network

> > > Tx performance. I had a look at the driver (macb, at91ether variant)

> > > and noticed that at91ether_start_xmit() immediately stops the queue

> > > after sending a frame and waits for the interrupt to restart the queue,

> > > causing a dead time after each packet is sent.

> > > 

> > > The AT91RM9200 datasheet states that the controller supports two frames,

> > > one being sent and the other one being queued, so I performed minimal

> > > changes to support this. The transmit performance on my board has

> > > increased by 50% on medium-sized packets (HTTP traffic), and with large

> > > packets I can now reach line rate.

> > > 

> > > Since this driver is shared by various platforms, I tried my best to

> > > isolate and limit the changes as much as possible and I think it's pretty

> > > reasonable as-is. I've run extensive tests and couldn't meet any

> > > unexpected situation (no stall, overflow nor lockup).

> > > 

> > > There are 3 patches in this series. The first one adds the missing

> > > interrupt flag for RM9200 (TBRE, indicating the tx buffer is willing

> > > to take a new packet). The second one replaces the single skb with a

> > > 2-array and uses only index 0. It does no other change, this is just

> > > to prepare the code for the third one. The third one implements the

> > > queue. Packets are added at the tail of the queue, the queue is

> > > stopped at 2 packets and the interrupt releases 0, 1 or 2 depending

> > > on what the transmit status register reports.

> > 

> > LGTM. There's always a chance that this will make other 

> > designs explode, but short of someone from Cadence giving 

> > us a timely review we have only one way to find that out.. :)

> 

> Not that much in fact, given that the at91ether_* functions are only

> used by AT91RM9200 (whose datasheet I used to do this) and Mstar which

> I used for the tests. I initially wanted to get my old SAM9G20 board

> to boot until I noticed that it doesn't even use the same set of

> functions, so the potential victims are extremely limited :-)

> 


I think I'm the only one booting recent linux kernels on at91rm9200 and
I'm currently stuck home while the board is at the office. I'll try to
test as soon as possible, which may not be before 2021... At least I'll
know who is the culprit ;)


-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Willy Tarreau Nov. 13, 2020, 7:44 p.m. UTC | #5
Hi Alexandre!

On Fri, Nov 13, 2020 at 11:03:59AM +0100, Alexandre Belloni wrote:
> I think I'm the only one booting recent linux kernels on at91rm9200 and

> I'm currently stuck home while the board is at the office. I'll try to

> test as soon as possible, which may not be before 2021... At least I'll

> know who is the culprit ;)


Oh that's great. I have a SAMG20 based one, which uses slightly different
registers and supports a tx ring, so initially I thought that I couldn't
test it there. But a friend of mine who wrote the drivers for FreeBSD
told me that the original driver still worked for the SAMG20, and I
suspect that despite not being mentioned in the datasheet, the more
recent chip still supports the old behavior, at least to ease the
transistion for their customers. So eventually I'll try it too.

In all transparency, I must tell you that I recently noticed an issue
when facing intense bidirectional traffic, eventually the chip would
report a TX overrun and would stop sending. I finally attributed this
to the unmerged changes needed for the MStar chip, which uses two 16
bit I/O accesses instead of a single 32-bit one, and which apparently
doesn't like being interrupted between the two when writing to the TAR
or TLEN registers. It took me the whole week-end to figure the root
cause so now I need to remove all my debugging code, address the issue
and test again. If I can't manage to fix this and you can't find a
moment to test on your board, I'll propose to revert my patch to stay
on the safe side, as I want at least one implementation to be 100%
reliable with it.

Cheers,
Willy