stmmac/RTL8211F/Meson GXBB: TX throughput problems

Message ID 1478190980.6632.26.camel@baylibre.com
State New

Commit Message

Jerome Brunet Nov. 3, 2016, 4:36 p.m.
On Sat, 2016-10-01 at 17:58 +0200, Martin Blumenstingl wrote:
> Hello Peppe,
>
> On Mon, Sep 26, 2016 at 8:17 AM, Giuseppe CAVALLARO
> <peppe.cavallaro@st.com> wrote:
> >
> > Hello André
> >
> > On 9/17/2016 11:23 PM, André Roth wrote:
> > >
> > > Hi all,
> > >
> > > I have an odroid c2 board which shows this issue. No data is
> > > transmitted or received after a moment of intense tx traffic.
> > > Copying a 1GB file per scp from the board triggers it repeatedly.
> > >
> > > The board has a stmmac - user ID: 0x11, Synopsys ID: 0x37.
> > >
> > > When switching the network to 100Mb/s the copying does
> > > not seem to trigger the issue.
> > >
> > > I've attached the ethtool statistics before and after the
> > > problem.
> >
> > at first glance, it enters in EEE mode often in the ethtool.after.
> > On some platforms we met problems and it was necessary to disable
> > the feature. Maybe, you can start looking at if this is true on yours.
> > We will see to provide a clean subset of patches to switch-on/off
> > it.
> I did some hacking in the stmmac driver to disable the LPI stuff (see
> the attachment)
>
> Unfortunately this did not fix the problem.
>
> I did not issue any ethtool commands not shown in the logs.
> Also I did not have time to change the AXI tuning / PBL value yet -
> so those are also untouched.
>
> I will keep testing, but unfortunately my device is starting to fall
> apart (I sometimes have DDR initialization issues and u-boot fails to
> come up, oh dear...).


Hi all,

I did several tests on this issue with Amlogic's S905 SoC (Synopsys MAC,
user ID: 0x11, Synopsys ID: 0x37).

With the OdroidC2 (PHY Realtek RTL8211F), EEE is on by default.
Just before launching iperf3, here are the ethtool stats regarding LPI:
     irq_tx_path_in_lpi_mode_n: 6
     irq_tx_path_exit_lpi_mode_n: 5
     irq_rx_path_in_lpi_mode_n: 76
     irq_rx_path_exit_lpi_mode_n: 75
     phy_eee_wakeup_error_n: 0

Sending data with iperf3 usually works for a little while (between 0 and
10s):

# iperf3 -c 192.168.1.170 -p12345
Connecting to host 192.168.1.170, port 12345
local 192.168.1.30 port 54450 connected to 192.168.1.170 port 12345
Interval           Transfer     Bandwidth       Retr  Cwnd
0.00-1.00   sec   112 MBytes   938 Mbits/sec    0    409 KBytes       
1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    426 KBytes       
2.00-3.00   sec   112 MBytes   939 Mbits/sec    0    426 KBytes       
3.00-4.00   sec   112 MBytes   940 Mbits/sec    0    426 KBytes       
4.00-5.00   sec   112 MBytes   940 Mbits/sec    0    426 KBytes       
5.00-6.00   sec   112 MBytes   939 Mbits/sec    0    426 KBytes       
6.00-7.00   sec  9.26 MBytes  77.6 Mbits/sec    2   1.41 KBytes       
7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
^C10.00-13.58  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
Interval           Transfer     Bandwidth       Retr
0.00-13.58  sec   681 MBytes   421 Mbits/sec    4             sender
0.00-13.58  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

iperf3 does not exit and the link seems completely broken. We cannot
send or receive until the interface is brought down and then up again.

Here are the LPI related stats after the test:
     irq_tx_path_in_lpi_mode_n: 48
     irq_tx_path_exit_lpi_mode_n: 48
     irq_rx_path_in_lpi_mode_n: 325
     irq_rx_path_exit_lpi_mode_n: 325
     phy_eee_wakeup_error_n: 0

Like Martin, I tried playing around with eee in stmmac, but I could not
improve the situation. Then I tried disabling EEE advertisement on the
PHY (patch attached). With this patch, iperf3 runs nicely for me.

This is what the FreeBSD folks have done for the same MAC/PHY
combination [0].

On the P200 Board (PHY Micrel KSZ9031), EEE is off by default. There is
no problem on this board right now. I tried to force the activation of
EEE on this board and ended up in the same situation as the OdroidC2
(link broken). The stats were a bit different though:
     irq_tx_path_in_lpi_mode_n: 28
     irq_tx_path_exit_lpi_mode_n: 28
     irq_rx_path_in_lpi_mode_n: 408
     irq_rx_path_exit_lpi_mode_n: 408
     phy_eee_wakeup_error_n: 5440

To everybody having a similar issue with their OdroidC2: could you try
the attached patch and let us know if it changes anything for you?

Peppe, Alexandre,
What is your view on this? I'm not sure that removing the EEE
advertisement is the right way to address the problem.
Could it be an issue in stmmac?
If there is any other information or test that would help to understand
the issue, please let me know.

Cheers

Jerome


[0] https://github.com/freebsd/freebsd-base-graphics/commit/1f49e276c3801545dc0a337792a5f07e6ad39c84
 


Comments

Martin Blumenstingl Nov. 5, 2016, 12:20 p.m. | #1
On Thu, Nov 3, 2016 at 5:36 PM, Jerome Brunet <jbrunet@baylibre.com> wrote:
> To everybody having similar issue with their OdroidC2, could you try
> the attached patch and let us know if it changes anything for you ?

I have tried this on my Vega S95 Meta clone - the patch can be found
here if anyone cares: [0].
Unfortunately this does not improve the situation for me:
# iperf3 --client 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  4] local 192.168.1.198 port 50720 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   178 KBytes  1.46 Mbits/sec   13   1.41 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  4]   3.00-4.00   sec  5.66 KBytes  46.3 Kbits/sec    4   1.41 KBytes
[  4]   4.00-5.00   sec  63.6 KBytes   521 Kbits/sec    2   1.41 KBytes
[  4]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    4   1.41 KBytes
[  4]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    2   1.41 KBytes
[  4]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  4]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
[  4]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   247 KBytes   203 Kbits/sec   29             sender
[  4]   0.00-10.00  sec  90.5 KBytes  74.1 Kbits/sec                  receiver

iperf Done.
# iperf3 --client 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  4] local 192.168.1.198 port 50724 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
[  4]   1.00-2.00   sec   111 MBytes   935 Mbits/sec
[  4]   2.00-3.00   sec   111 MBytes   934 Mbits/sec
[  4]   3.00-4.00   sec   111 MBytes   934 Mbits/sec
[  4]   4.00-5.00   sec   111 MBytes   934 Mbits/sec
[  4]   5.00-6.00   sec   111 MBytes   935 Mbits/sec
[  4]   6.00-7.00   sec   111 MBytes   935 Mbits/sec
[  4]   7.00-8.00   sec   111 MBytes   934 Mbits/sec
[  4]   8.00-9.00   sec   111 MBytes   934 Mbits/sec
[  4]   9.00-10.00  sec   111 MBytes   934 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.09 GBytes   936 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.09 GBytes   934 Mbits/sec                  receiver

iperf Done.
#

However, if I remove the realtek,disable-eee-* properties and use
max-speed = <100>; instead, Ethernet works fine (but is obviously
limited to 100Mbit/s).

> Peppe, Alexandre,
> What is your view on this ? I'm not sure that removing EEE
> advertisement is the right way to address the problem ?
> Could it be an issue stmmac ?
> If there is any other information / test which would help understand
> the issue, please let me know.

I CC'ed the two FreeBSD developers who added the corresponding
FreeBSD code. Maybe they can also explain why that change was
needed.
Peppe's and Alexandre's feedback will hopefully also lead to
improvements in the FreeBSD driver.


[0] https://paste.kde.org/p9lwilneh

André Roth Nov. 13, 2016, 7:20 p.m. | #2
Hi all,

> To everybody having similar issue with their OdroidC2, could you try
> the attached patch and let us know if it changes anything for you ?

I can confirm that this patch removes the problem. I now get a constant
~300Mb/s in both directions without any issue! :)

I used the v4.10/integ branch, which shows the problem when this patch
is not applied.

Thanks for your work,

 André

Patch

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
index a45d1013c225..3cbeec63a439 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
@@ -127,3 +127,18 @@ 
 &usb1 {
 	status = "okay";
 };
+
+&ethmac {
+	phy-handle = <&eth_phy0>;
+
+	mdio {
+		compatible = "snps,dwmac-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		eth_phy0: ethernet-phy@0 {
+			reg = <0>;
+			realtek,disable-eee-1000t;
+		};
+	};
+};
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index aadd6e9f54ad..30e20ba10f45 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -15,6 +15,12 @@ 
  */
 #include <linux/phy.h>
 #include <linux/module.h>
+#include <linux/of.h>
+
+struct rtl8211f_phy_priv {
+	bool eee_1000_disable;
+	bool eee_100_disable;
+};
 
 #define RTL821x_PHYSR		0x11
 #define RTL821x_PHYSR_DUPLEX	0x2000
@@ -93,6 +99,25 @@  static int rtl8211f_config_intr(struct phy_device *phydev)
 	return err;
 }
 
+static void rtl8211f_force_eee(struct phy_device *phydev)
+{
+	struct rtl8211f_phy_priv *priv = phydev->priv;
+	u16 val;
+
+	if (priv->eee_1000_disable || priv->eee_100_disable) {
+		val = phy_read_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
+					    MDIO_MMD_AN);
+
+		if (priv->eee_1000_disable)
+			val &= ~MDIO_AN_EEE_ADV_1000T;
+		if (priv->eee_100_disable)
+			val &= ~MDIO_AN_EEE_ADV_100TX;
+
+		phy_write_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
+				       MDIO_MMD_AN, val);
+	}
+}
+
 static int rtl8211f_config_init(struct phy_device *phydev)
 {
 	int ret;
@@ -102,6 +127,8 @@  static int rtl8211f_config_init(struct phy_device *phydev)
 	if (ret < 0)
 		return ret;
 
+	rtl8211f_force_eee(phydev);
+
 	if (phydev->interface == PHY_INTERFACE_MODE_RGMII) {
 		/* enable TXDLY */
 		phy_write(phydev, RTL8211F_PAGE_SELECT, 0xd08);
@@ -115,6 +142,26 @@  static int rtl8211f_config_init(struct phy_device *phydev)
 	return 0;
 }
 
+static int rtl8211f_phy_probe(struct phy_device *phydev)
+{
+	struct device *dev = &phydev->mdio.dev;
+	struct device_node *of_node = dev->of_node;
+	struct rtl8211f_phy_priv *priv;
+
+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	if (of_property_read_bool(of_node, "realtek,disable-eee-1000t"))
+		priv->eee_1000_disable = true;
+	if (of_property_read_bool(of_node, "realtek,disable-eee-100t"))
+		priv->eee_100_disable = true;
+
+	phydev->priv = priv;
+
+	return 0;
+}
+
 static struct phy_driver realtek_drvs[] = {
 	{
 		.phy_id         = 0x00008201,
@@ -164,6 +211,7 @@  static struct phy_driver realtek_drvs[] = {
 		.phy_id_mask	= 0x001fffff,
 		.features	= PHY_GBIT_FEATURES,
 		.flags		= PHY_HAS_INTERRUPT,
+		.probe		= &rtl8211f_phy_probe,
 		.config_aneg	= &genphy_config_aneg,
 		.config_init	= &rtl8211f_config_init,
 		.read_status	= &genphy_read_status,