diff mbox series

net: stmmac: fix MAC not working when system resume back with WoL enabled

Message ID 20210901090228.11308-1-qiangqing.zhang@nxp.com
State New
Headers show
Series net: stmmac: fix MAC not working when system resume back with WoL enabled | expand

Commit Message

Joakim Zhang Sept. 1, 2021, 9:02 a.m. UTC
We can reproduce this issue with below steps:
1) enable WoL on the host
2) host system suspended
3) remote client send out wakeup packets
We can see that host system resume back, but can't work, such as ping failed.

After a bit digging, this issue is introduced by the commit 46f69ded988d
("net: stmmac: Use resolved link config in mac_link_up()"), which use
the finalised link parameters in mac_link_up() rather than the
parameters in mac_config().

There are two scenarios for MAC suspend/resume:

1) MAC suspend with WoL disabled, stmmac_suspend() call
phylink_mac_change() to notify phylink machine that a change in MAC
state, then .mac_link_down callback would be invoked. Further, it will
call phylink_stop() to stop the phylink instance. When MAC resume back,
firstly phylink_start() is called to start the phylink instance, then
call phylink_mac_change() which will finally trigger phylink machine to
invoke .mac_config and .mac_link_up callback. All is fine since
configuration in these two callbacks will be initialized.

2) MAC suspend with WoL enabled, phylink_mac_change() will put link
down, but there is no phylink_stop() to stop the phylink instance, so it
will link up again, that means .mac_config and .mac_link_up would be
invoked before system suspended. After system resume back, it will do
DMA initialization and SW reset which let MAC lost the hardware setting
(i.e MAC_Configuration register(offset 0x0) is reset). Since link is up
before system suspended, so .mac_link_up would not be invoked after
system resume back, lead to there is no chance to initialize the
configuration in .mac_link_up callback, as a result, MAC can't work any
longer.

Above description is what I found when debug this issue, this patch is
just revert broken patch to workaround it, at least make MAC work when
system resume back with WoL enabled.

Said this is a workaround, since it has not resolve the issue completely.
I just move the speed/duplex/pause etc into .mac_config callback, there are
other configurations in .mac_link_up callback which also need to be
initialized to work for specific functions.

Fixes: 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
---

Broken patch cannot be reverted directly, so manually modified it.

I also tried to fix in other ways, but failed to find a better solution,
any suggestions would be appreciated. Thanks.

Joakim
---
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 98 +++++++++----------
 1 file changed, 49 insertions(+), 49 deletions(-)

Comments

Vladimir Oltean Sept. 1, 2021, 9:21 a.m. UTC | #1
On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:
> We can reproduce this issue with below steps:
> 1) enable WoL on the host
> 2) host system suspended
> 3) remote client send out wakeup packets
> We can see that host system resume back, but can't work, such as ping failed.
> 
> After a bit digging, this issue is introduced by the commit 46f69ded988d
> ("net: stmmac: Use resolved link config in mac_link_up()"), which use
> the finalised link parameters in mac_link_up() rather than the
> parameters in mac_config().
> 
> There are two scenarios for MAC suspend/resume:
> 
> 1) MAC suspend with WoL disabled, stmmac_suspend() call
> phylink_mac_change() to notify phylink machine that a change in MAC
> state, then .mac_link_down callback would be invoked. Further, it will
> call phylink_stop() to stop the phylink instance. When MAC resume back,
> firstly phylink_start() is called to start the phylink instance, then
> call phylink_mac_change() which will finally trigger phylink machine to
> invoke .mac_config and .mac_link_up callback. All is fine since
> configuration in these two callbacks will be initialized.
> 
> 2) MAC suspend with WoL enabled, phylink_mac_change() will put link
> down, but there is no phylink_stop() to stop the phylink instance, so it
> will link up again, that means .mac_config and .mac_link_up would be
> invoked before system suspended. After system resume back, it will do
> DMA initialization and SW reset which let MAC lost the hardware setting
> (i.e MAC_Configuration register(offset 0x0) is reset). Since link is up
> before system suspended, so .mac_link_up would not be invoked after
> system resume back, lead to there is no chance to initialize the
> configuration in .mac_link_up callback, as a result, MAC can't work any
> longer.

Have you tried putting phylink_stop in .suspend, and phylink_start in .resume?

> 
> Above description is what I found when debug this issue, this patch is
> just revert broken patch to workaround it, at least make MAC work when
> system resume back with WoL enabled.
> 
> Said this is a workaround, since it has not resolve the issue completely.
> I just move the speed/duplex/pause etc into .mac_config callback, there are
> other configurations in .mac_link_up callback which also need to be
> initialized to work for specific functions.
> 
> Fixes: 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()")
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> ---
> 
> Broken patch cannot be reverted directly, so manually modified it.
> 
> I also tried to fix in other ways, but failed to find a better solution,
> any suggestions would be appreciated. Thanks.
> 
> Joakim

Do you know exactly why it used to work prior to this patch?
Joakim Zhang Sept. 1, 2021, 10:21 a.m. UTC | #2
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月1日 17:14

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; andrew@lunn.ch;

> f.fainelli@gmail.com; hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:

> > We can reproduce this issue with below steps:

> > 1) enable WoL on the host

> > 2) host system suspended

> > 3) remote client send out wakeup packets We can see that host system

> > resume back, but can't work, such as ping failed.

> >

> > After a bit digging, this issue is introduced by the commit

> > 46f69ded988d

> > ("net: stmmac: Use resolved link config in mac_link_up()"), which use

> > the finalised link parameters in mac_link_up() rather than the

> > parameters in mac_config().

> >

> > There are two scenarios for MAC suspend/resume:

> >

> > 1) MAC suspend with WoL disabled, stmmac_suspend() call

> > phylink_mac_change() to notify phylink machine that a change in MAC

> > state, then .mac_link_down callback would be invoked. Further, it will

> > call phylink_stop() to stop the phylink instance. When MAC resume

> > back, firstly phylink_start() is called to start the phylink instance,

> > then call phylink_mac_change() which will finally trigger phylink

> > machine to invoke .mac_config and .mac_link_up callback. All is fine

> > since configuration in these two callbacks will be initialized.

> >

> > 2) MAC suspend with WoL enabled, phylink_mac_change() will put link

> > down, but there is no phylink_stop() to stop the phylink instance, so

> > it will link up again, that means .mac_config and .mac_link_up would

> > be invoked before system suspended. After system resume back, it will

> > do DMA initialization and SW reset which let MAC lost the hardware

> > setting (i.e MAC_Configuration register(offset 0x0) is reset). Since

> > link is up before system suspended, so .mac_link_up would not be

> > invoked after system resume back, lead to there is no chance to

> > initialize the configuration in .mac_link_up callback, as a result,

> > MAC can't work any longer.

> >

> > Above description is what I found when debug this issue, this patch is

> > just revert broken patch to workaround it, at least make MAC work when

> > system resume back with WoL enabled.

> >

> > Said this is a workaround, since it has not resolve the issue completely.

> > I just move the speed/duplex/pause etc into .mac_config callback,

> > there are other configurations in .mac_link_up callback which also

> > need to be initialized to work for specific functions.

> 

> NAK. Please read the phylink documentation. speed/duplex/pause is undefined

> in .mac_config.


Speed/duplex/pause also the field of " struct phylink_link_state", so these can be refered in .mac_config, please
see the link which stmmac did before:
https://elixir.bootlin.com/linux/v5.4.143/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L852


> I think the problem here is that you're not calling phylink_stop() when WoL is

> enabled, which means phylink will continue to maintain the state as per the

> hardware state, and phylib will continue to run its state machine reporting the

> link state to phylink.


Yes, I also tried do below code change, but the host would not be wakeup, phylink_stop() would
call phy_stop(), phylib would call phy_suspend() finally, it will not suspend phy if it detect WoL enabled,
so now I don't know why system can't be wakeup with this code change.

@@ -5374,7 +5374,6 @@ int stmmac_suspend(struct device *dev)
                rtnl_lock();
                if (device_may_wakeup(priv->device))
                        phylink_speed_down(priv->phylink, false);
-               phylink_stop(priv->phylink);
                rtnl_unlock();
                mutex_lock(&priv->lock);

@@ -5385,6 +5384,10 @@ int stmmac_suspend(struct device *dev)
        }
        mutex_unlock(&priv->lock);

+       rtnl_lock();
+       phylink_stop(priv->phylink);
+       rtnl_unlock();
+
        priv->speed = SPEED_UNKNOWN;
        return 0;
 }
@@ -5448,6 +5451,12 @@ int stmmac_resume(struct device *dev)
                pinctrl_pm_select_default_state(priv->device);
                if (priv->plat->clk_ptp_ref)
                        clk_prepare_enable(priv->plat->clk_ptp_ref);
+
+               rtnl_lock();
+               /* We may have called phylink_speed_down before */
+               phylink_speed_up(priv->phylink);
+               rtnl_unlock();
+
                /* reset the phy so that it's ready */
                if (priv->mii && priv->mdio_rst_after_resume)
                        stmmac_mdio_reset(priv->mii);
@@ -5461,13 +5470,9 @@ int stmmac_resume(struct device *dev)
                        return ret;
        }

-       if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
-               rtnl_lock();
-               phylink_start(priv->phylink);
-               /* We may have called phylink_speed_down before */
-               phylink_speed_up(priv->phylink);
-               rtnl_unlock();
-       }
+       rtnl_lock();
+       phylink_start(priv->phylink);
+       rtnl_unlock();

        rtnl_lock();
        mutex_lock(&priv->lock);


> phylink_stop() (and therefore phy_stop()) should be called even if WoL is active

> to shut down this state reporting, as other network drivers do.


Ok, you mean that phylink_stop() also should be called even if WoL is active, I would look in this direction since
you are a professional.

Thanks.

Best Regards,
Joakim Zhang
Vladimir Oltean Sept. 1, 2021, 10:56 a.m. UTC | #3
On Wed, Sep 01, 2021 at 10:25:15AM +0000, Joakim Zhang wrote:
>
> Hi Vladimir,
>
> > -----Original Message-----
> > From: Vladimir Oltean <olteanv@gmail.com>
> > Sent: 2021年9月1日 17:22
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
> > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
> > mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;
> > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;
> > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>
> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
> > back with WoL enabled
> >
> > On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:
> > > We can reproduce this issue with below steps:
> > > 1) enable WoL on the host
> > > 2) host system suspended
> > > 3) remote client send out wakeup packets We can see that host system
> > > resume back, but can't work, such as ping failed.
> > >
> > > After a bit digging, this issue is introduced by the commit
> > > 46f69ded988d
> > > ("net: stmmac: Use resolved link config in mac_link_up()"), which use
> > > the finalised link parameters in mac_link_up() rather than the
> > > parameters in mac_config().
> > >
> > > There are two scenarios for MAC suspend/resume:
> > >
> > > 1) MAC suspend with WoL disabled, stmmac_suspend() call
> > > phylink_mac_change() to notify phylink machine that a change in MAC
> > > state, then .mac_link_down callback would be invoked. Further, it will
> > > call phylink_stop() to stop the phylink instance. When MAC resume
> > > back, firstly phylink_start() is called to start the phylink instance,
> > > then call phylink_mac_change() which will finally trigger phylink
> > > machine to invoke .mac_config and .mac_link_up callback. All is fine
> > > since configuration in these two callbacks will be initialized.
> > >
> > > 2) MAC suspend with WoL enabled, phylink_mac_change() will put link
> > > down, but there is no phylink_stop() to stop the phylink instance, so
> > > it will link up again, that means .mac_config and .mac_link_up would
> > > be invoked before system suspended. After system resume back, it will
> > > do DMA initialization and SW reset which let MAC lost the hardware
> > > setting (i.e MAC_Configuration register(offset 0x0) is reset). Since
> > > link is up before system suspended, so .mac_link_up would not be
> > > invoked after system resume back, lead to there is no chance to
> > > initialize the configuration in .mac_link_up callback, as a result,
> > > MAC can't work any longer.
> >
> > Have you tried putting phylink_stop in .suspend, and phylink_start in .resume?
>
> Yes, I tried, but the system can't be wakeup with remote packets.
> Please see the code change.

That makes it a PHY driver issue then, I guess?
At least some PHY drivers avoid suspending when WoL is active, like lan88xx_suspend.
Even the phy_suspend function takes wol.wolopts into consideration
before proceeding to call the driver. What PHY driver is it?

> > Do you know exactly why it used to work prior to this patch?
>
> Yes, since it configures the MAC_CTRL_REG register in .mac_config callback,
> it will be called when system resume back with WoL enabled.
> https://elixir.bootlin.com/linux/v5.4.143/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L852
>
> If configure the MAC_CTRL_REG register in .mac_link_up callback, when system resume back with WoL active,
> .mac_link_up would not be called, so MAC can't work any longer.
> https://elixir.bootlin.com/linux/v5.14-rc7/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L1044

Ok, so it worked because phylink_mac_change triggers a phylink resolve,
and that function calls phylink_mac_config if the link is up (which it is),
but phylink_link_up only if the link state actually changed (which it did not).
So you are saying that the momentary link flap induced by phylink_mac_change(false),
which set pl->mac_link_dropped = true, all consumed itself _before_ the actual
suspend, and therefore does not help after the resume. Interesting behavior.
Bad assumption in the stmmac driver, if the intention was for the link
state change to be induced to phylink after the resume?
Joakim Zhang Sept. 1, 2021, 11:42 a.m. UTC | #4
Hi Vladimir,

> -----Original Message-----

> From: Vladimir Oltean <olteanv@gmail.com>

> Sent: 2021年9月1日 18:56

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;

> netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Wed, Sep 01, 2021 at 10:25:15AM +0000, Joakim Zhang wrote:

> >

> > Hi Vladimir,

> >

> > > -----Original Message-----

> > > From: Vladimir Oltean <olteanv@gmail.com>

> > > Sent: 2021年9月1日 17:22

> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> > > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> > > mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;

> > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > resume back with WoL enabled

> > >

> > > On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:

> > > > We can reproduce this issue with below steps:

> > > > 1) enable WoL on the host

> > > > 2) host system suspended

> > > > 3) remote client send out wakeup packets We can see that host

> > > > system resume back, but can't work, such as ping failed.

> > > >

> > > > After a bit digging, this issue is introduced by the commit

> > > > 46f69ded988d

> > > > ("net: stmmac: Use resolved link config in mac_link_up()"), which

> > > > use the finalised link parameters in mac_link_up() rather than the

> > > > parameters in mac_config().

> > > >

> > > > There are two scenarios for MAC suspend/resume:

> > > >

> > > > 1) MAC suspend with WoL disabled, stmmac_suspend() call

> > > > phylink_mac_change() to notify phylink machine that a change in

> > > > MAC state, then .mac_link_down callback would be invoked. Further,

> > > > it will call phylink_stop() to stop the phylink instance. When MAC

> > > > resume back, firstly phylink_start() is called to start the

> > > > phylink instance, then call phylink_mac_change() which will

> > > > finally trigger phylink machine to invoke .mac_config and

> > > > .mac_link_up callback. All is fine since configuration in these two callbacks

> will be initialized.

> > > >

> > > > 2) MAC suspend with WoL enabled, phylink_mac_change() will put

> > > > link down, but there is no phylink_stop() to stop the phylink

> > > > instance, so it will link up again, that means .mac_config and

> > > > .mac_link_up would be invoked before system suspended. After

> > > > system resume back, it will do DMA initialization and SW reset

> > > > which let MAC lost the hardware setting (i.e MAC_Configuration

> > > > register(offset 0x0) is reset). Since link is up before system

> > > > suspended, so .mac_link_up would not be invoked after system

> > > > resume back, lead to there is no chance to initialize the

> > > > configuration in .mac_link_up callback, as a result, MAC can't work any

> longer.

> > >

> > > Have you tried putting phylink_stop in .suspend, and phylink_start

> in .resume?

> >

> > Yes, I tried, but the system can't be wakeup with remote packets.

> > Please see the code change.

> 

> That makes it a PHY driver issue then, I guess?

> At least some PHY drivers avoid suspending when WoL is active, like

> lan88xx_suspend.

> Even the phy_suspend function takes wol.wolopts into consideration before

> proceeding to call the driver. What PHY driver is it?


I think it's not the PHY issue, since both STMMAC and FEC controllers on i.MX8MP use the same
PHY(Realtek RTL8211FD, drivers/net/phy/realtek.c), there is no issue with FEC.


> > > Do you know exactly why it used to work prior to this patch?

> >

> > Yes, since it configures the MAC_CTRL_REG register in .mac_config

> > callback, it will be called when system resume back with WoL enabled.

> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felix

> >

> ir.bootlin.com%2Flinux%2Fv5.4.143%2Fsource%2Fdrivers%2Fnet%2Fethernet%

> >

> 2Fstmicro%2Fstmmac%2Fstmmac_main.c%23L852&amp;data=04%7C01%7Cq

> iangqing

> > .zhang%40nxp.com%7C412a8b69c1244d4c4ab708d96d371e52%7C686ea1d3

> bc2b4c6f

> >

> a92cd99c5c301635%7C0%7C0%7C637660905771744718%7CUnknown%7CTW

> FpbGZsb3d8

> >

> eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D

> %7C1

> >

> 000&amp;sdata=8ony5ZI%2BF31eDduK9Qh0CVIPDHE3EiBdnab6osiyUhc%3D&

> amp;res

> > erved=0

> >

> > If configure the MAC_CTRL_REG register in .mac_link_up callback, when

> > system resume back with WoL active, .mac_link_up would not be called, so

> MAC can't work any longer.

> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felix

> >

> ir.bootlin.com%2Flinux%2Fv5.14-rc7%2Fsource%2Fdrivers%2Fnet%2Fethernet

> > %2Fstmicro%2Fstmmac%2Fstmmac_main.c%23L1044&amp;data=04%7C01

> %7Cqiangqi

> >

> ng.zhang%40nxp.com%7C412a8b69c1244d4c4ab708d96d371e52%7C686ea1d

> 3bc2b4c

> >

> 6fa92cd99c5c301635%7C0%7C0%7C637660905771744718%7CUnknown%7CT

> WFpbGZsb3

> >

> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%

> 3D%7

> >

> C1000&amp;sdata=O%2B%2BUW01PUL4Tp1yt3kQ0bhWIo%2Buc37RFUENLcla

> C6AM%3D&a

> > mp;reserved=0

> 

> Ok, so it worked because phylink_mac_change triggers a phylink resolve, and

> that function calls phylink_mac_config if the link is up (which it is), but

> phylink_link_up only if the link state actually changed (which it did not).

> So you are saying that the momentary link flap induced by

> phylink_mac_change(false), which set pl->mac_link_dropped = true, all

> consumed itself _before_ the actual suspend, and therefore does not help after

> the resume. Interesting behavior.


Yes, what I have seen at my side is what you concluded.

> Bad assumption in the stmmac driver, if the intention was for the link state

> change to be induced to phylink after the resume?


Yes, I also think link state change should be captured after the resume, it's very strange that
link up again before suspended. You would see below log if I add no_console_suspend in cmdline.

root@imx8mpevk:~# ethtool -s eth1 wol g
[   76.309460] stmmac: wakeup enable
root@imx8mpevk:~# echo mem > /sys/power/state
[   83.278489] PM: suspend entry (deep)
[   83.285371] Filesystems sync: 0.003 seconds
[   83.293310] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   83.301833] OOM killer disabled.
[   83.305069] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   83.938416] imx-dwmac 30bf0000.ethernet eth1: Link is Down                                      -----> link down
[   83.945022] imx-dwmac 30bf0000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx              ----> link up
[   83.986280] PM: suspend devices took 0.672 seconds
[   83.994724] Disabling non-boot CPUs ...
[   83.999007] CPU1: shutdown
[   84.001727] psci: CPU1 killed (polled 0 ms)
[   84.007315] IRQ 14: no longer affine to CPU2
[   84.007451] CPU2: shutdown
[   84.014445] psci: CPU2 killed (polled 0 ms)
[   84.020220] CPU3: shutdown
[   84.022933] psci: CPU3 killed (polled 0 ms)

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 1, 2021, 1:25 p.m. UTC | #5
On Wed, Sep 01, 2021 at 11:42:08AM +0000, Joakim Zhang wrote:
> Hi Vladimir,
> 
> > -----Original Message-----
> > From: Vladimir Oltean <olteanv@gmail.com>
> > Sent: 2021年9月1日 18:56
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
> > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
> > mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;
> > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;
> > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>
> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
> > back with WoL enabled
> > 
> > On Wed, Sep 01, 2021 at 10:25:15AM +0000, Joakim Zhang wrote:
> > >
> > > Hi Vladimir,
> > >
> > > > -----Original Message-----
> > > > From: Vladimir Oltean <olteanv@gmail.com>
> > > > Sent: 2021年9月1日 17:22
> > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
> > > > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
> > > > mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;
> > > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;
> > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>
> > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system
> > > > resume back with WoL enabled
> > > >
> > > > On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:
> > > > > We can reproduce this issue with below steps:
> > > > > 1) enable WoL on the host
> > > > > 2) host system suspended
> > > > > 3) remote client send out wakeup packets We can see that host
> > > > > system resume back, but can't work, such as ping failed.
> > > > >
> > > > > After a bit digging, this issue is introduced by the commit
> > > > > 46f69ded988d
> > > > > ("net: stmmac: Use resolved link config in mac_link_up()"), which
> > > > > use the finalised link parameters in mac_link_up() rather than the
> > > > > parameters in mac_config().
> > > > >
> > > > > There are two scenarios for MAC suspend/resume:
> > > > >
> > > > > 1) MAC suspend with WoL disabled, stmmac_suspend() call
> > > > > phylink_mac_change() to notify phylink machine that a change in
> > > > > MAC state, then .mac_link_down callback would be invoked. Further,
> > > > > it will call phylink_stop() to stop the phylink instance. When MAC
> > > > > resume back, firstly phylink_start() is called to start the
> > > > > phylink instance, then call phylink_mac_change() which will
> > > > > finally trigger phylink machine to invoke .mac_config and
> > > > > .mac_link_up callback. All is fine since configuration in these two callbacks
> > will be initialized.
> > > > >
> > > > > 2) MAC suspend with WoL enabled, phylink_mac_change() will put
> > > > > link down, but there is no phylink_stop() to stop the phylink
> > > > > instance, so it will link up again, that means .mac_config and
> > > > > .mac_link_up would be invoked before system suspended. After
> > > > > system resume back, it will do DMA initialization and SW reset
> > > > > which let MAC lost the hardware setting (i.e MAC_Configuration
> > > > > register(offset 0x0) is reset). Since link is up before system
> > > > > suspended, so .mac_link_up would not be invoked after system
> > > > > resume back, lead to there is no chance to initialize the
> > > > > configuration in .mac_link_up callback, as a result, MAC can't work any
> > longer.
> > > >
> > > > Have you tried putting phylink_stop in .suspend, and phylink_start
> > in .resume?
> > >
> > > Yes, I tried, but the system can't be wakeup with remote packets.
> > > Please see the code change.
> > 
> > That makes it a PHY driver issue then, I guess?
> > At least some PHY drivers avoid suspending when WoL is active, like
> > lan88xx_suspend.
> > Even the phy_suspend function takes wol.wolopts into consideration before
> > proceeding to call the driver. What PHY driver is it?
> 
> I think it's not the PHY issue, since both STMMAC and FEC controllers on i.MX8MP use the same
> PHY(Realtek RTL8211FD, drivers/net/phy/realtek.c), there is no issue with FEC.

Note that FEC calls phylink_stop() in fec_suspend() if the net device
was up. So that kind of rules out phylink and phylib too... and
points towards stmmac doing something it shouldn't.

> > Bad assumption in the stmmac driver, if the intention was for the link state
> > change to be induced to phylink after the resume?
> 
> Yes, I also think link state change should be captured after the resume, it's very strange that
> link up again before suspended. You would see below log if I add no_console_suspend in cmdline.

... because phylink_mac_change() is not supposed to be used to force
the link down. I can't say this loudly enough: Read the documentation.
I don't write it just for my pleasure, it's there to help others get
stuff correct. If people aren't going to read it, I might as well not
waste the time writing it.

 * phylink_mac_change() - notify phylink of a change in MAC state
 * @pl: a pointer to a &struct phylink returned from phylink_create()
 * @up: indicates whether the link is currently up.
 *
 * The MAC driver should call this driver when the state of its link
 * changes (eg, link failure, new negotiation results, etc.)

Realise that "up" is there merely to capture that the link has gone
down - but by the time phylink reacts to that (which may be some time
*after* this call has been made - it is *not* synchronous since it's
meant to be called from an *interrupt*) the link state may well have
changed again. So, phylink will always recheck the link state with
up = false, so you _will_ get the link going down and then up.

In any case "change in MAC state" is only applicable when in in-band
mode, not in PHY mode, so you should not be calling this if you have
a PHY attached which isn't in in-band mode.

> 
> root@imx8mpevk:~# ethtool -s eth1 wol g
> [   76.309460] stmmac: wakeup enable

So you've asked it to wake on MagicPacket, which is WAKE_MAGIC. As
you got the message "wakeup enable" which is emitted by
stmmac_set_wol(), this will only be emitted if priv->plat->pmt() is
set. It will _not_ call phylink_ethtool_set_wol().

So, you are not using the PHY-based wake-on-lan, but the MAC based
wake-on-lan. This means you need to have the phy <-> mac link up
during suspend, and in that case, yes, you do not want to call
phylink_stop() or phylink_start().

I'm not sure what stmmac_pmt() does - thanks to the macro stuff, I'd
need to trace it through the driver and find out where that goes,
and also which variant of stmmac you're using... so without more
information I can't follow what the driver is doing.
Heiner Kallweit Sept. 1, 2021, 3:40 p.m. UTC | #6
On 01.09.2021 12:21, Joakim Zhang wrote:
> 
> Hi Russell,
> 
>> -----Original Message-----
>> From: Russell King <linux@armlinux.org.uk>
>> Sent: 2021年9月1日 17:14
>> To: Joakim Zhang <qiangqing.zhang@nxp.com>
>> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
>> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
>> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; andrew@lunn.ch;
>> f.fainelli@gmail.com; hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>
>> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
>> back with WoL enabled
>>
>> On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:
>>> We can reproduce this issue with below steps:
>>> 1) enable WoL on the host
>>> 2) host system suspended
>>> 3) remote client send out wakeup packets We can see that host system
>>> resume back, but can't work, such as ping failed.
>>>
>>> After a bit digging, this issue is introduced by the commit
>>> 46f69ded988d
>>> ("net: stmmac: Use resolved link config in mac_link_up()"), which use
>>> the finalised link parameters in mac_link_up() rather than the
>>> parameters in mac_config().
>>>
>>> There are two scenarios for MAC suspend/resume:
>>>
>>> 1) MAC suspend with WoL disabled, stmmac_suspend() call
>>> phylink_mac_change() to notify phylink machine that a change in MAC
>>> state, then .mac_link_down callback would be invoked. Further, it will
>>> call phylink_stop() to stop the phylink instance. When MAC resume
>>> back, firstly phylink_start() is called to start the phylink instance,
>>> then call phylink_mac_change() which will finally trigger phylink
>>> machine to invoke .mac_config and .mac_link_up callback. All is fine
>>> since configuration in these two callbacks will be initialized.
>>>
>>> 2) MAC suspend with WoL enabled, phylink_mac_change() will put link
>>> down, but there is no phylink_stop() to stop the phylink instance, so
>>> it will link up again, that means .mac_config and .mac_link_up would
>>> be invoked before system suspended. After system resume back, it will
>>> do DMA initialization and SW reset which let MAC lost the hardware
>>> setting (i.e MAC_Configuration register(offset 0x0) is reset). Since
>>> link is up before system suspended, so .mac_link_up would not be
>>> invoked after system resume back, lead to there is no chance to
>>> initialize the configuration in .mac_link_up callback, as a result,
>>> MAC can't work any longer.
>>>
>>> Above description is what I found when debug this issue, this patch is
>>> just revert broken patch to workaround it, at least make MAC work when
>>> system resume back with WoL enabled.
>>>
>>> Said this is a workaround, since it has not resolve the issue completely.
>>> I just move the speed/duplex/pause etc into .mac_config callback,
>>> there are other configurations in .mac_link_up callback which also
>>> need to be initialized to work for specific functions.
>>
>> NAK. Please read the phylink documentation. speed/duplex/pause is undefined
>> in .mac_config.
> 
> Speed/duplex/pause also the field of " struct phylink_link_state", so these can be refered in .mac_config, please
> see the link which stmmac did before:
> https://elixir.bootlin.com/linux/v5.4.143/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L852
> 
> 
>> I think the problem here is that you're not calling phylink_stop() when WoL is
>> enabled, which means phylink will continue to maintain the state as per the
>> hardware state, and phylib will continue to run its state machine reporting the
>> link state to phylink.
> 
> Yes, I also tried do below code change, but the host would not be wakeup, phylink_stop() would
> call phy_stop(), phylib would call phy_suspend() finally, it will not suspend phy if it detect WoL enabled,
> so now I don't know why system can't be wakeup with this code change.
> 
Follow-up question would be whether link breaks accidentally on suspend, or whether
something fails on resume.When suspending, does the link break and link LEDs go off?
Depending on LED configuration you may also see whether link speed is reduced
on suspend.
struct net_device has a member wol_enabled, does it make a difference if set it?

> @@ -5374,7 +5374,6 @@ int stmmac_suspend(struct device *dev)
>                 rtnl_lock();
>                 if (device_may_wakeup(priv->device))
>                         phylink_speed_down(priv->phylink, false);
> -               phylink_stop(priv->phylink);
>                 rtnl_unlock();
>                 mutex_lock(&priv->lock);
> 
> @@ -5385,6 +5384,10 @@ int stmmac_suspend(struct device *dev)
>         }
>         mutex_unlock(&priv->lock);
> 
> +       rtnl_lock();
> +       phylink_stop(priv->phylink);
> +       rtnl_unlock();
> +
>         priv->speed = SPEED_UNKNOWN;
>         return 0;
>  }
> @@ -5448,6 +5451,12 @@ int stmmac_resume(struct device *dev)
>                 pinctrl_pm_select_default_state(priv->device);
>                 if (priv->plat->clk_ptp_ref)
>                         clk_prepare_enable(priv->plat->clk_ptp_ref);
> +
> +               rtnl_lock();
> +               /* We may have called phylink_speed_down before */
> +               phylink_speed_up(priv->phylink);
> +               rtnl_unlock();
> +
>                 /* reset the phy so that it's ready */
>                 if (priv->mii && priv->mdio_rst_after_resume)
>                         stmmac_mdio_reset(priv->mii);
> @@ -5461,13 +5470,9 @@ int stmmac_resume(struct device *dev)
>                         return ret;
>         }
> 
> -       if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> -               rtnl_lock();
> -               phylink_start(priv->phylink);
> -               /* We may have called phylink_speed_down before */
> -               phylink_speed_up(priv->phylink);
> -               rtnl_unlock();
> -       }
> +       rtnl_lock();
> +       phylink_start(priv->phylink);
> +       rtnl_unlock();
> 
>         rtnl_lock();
>         mutex_lock(&priv->lock);
> 
> 
>> phylink_stop() (and therefore phy_stop()) should be called even if WoL is active
>> to shut down this state reporting, as other network drivers do.
> 
> Ok, you mean that phylink_stop() also should be called even if WoL is active, I would look in this direction since
> you are a professional.
> 
> Thanks.
> 
> Best Regards,
> Joakim Zhang
>
Joakim Zhang Sept. 2, 2021, 7:01 a.m. UTC | #7
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月1日 20:57

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; andrew@lunn.ch;

> f.fainelli@gmail.com; hkallweit1@gmail.com; dl-linux-imx

> <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Wed, Sep 01, 2021 at 10:21:59AM +0000, Joakim Zhang wrote:

> >

> > Hi Russell,

> >

> > > -----Original Message-----

> > > From: Russell King <linux@armlinux.org.uk>

> > > Sent: 2021年9月1日 17:14

> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> > > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> > > mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; andrew@lunn.ch;

> > > f.fainelli@gmail.com; hkallweit1@gmail.com; dl-linux-imx

> > > <linux-imx@nxp.com>

> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > resume back with WoL enabled

> > >

> > > NAK. Please read the phylink documentation. speed/duplex/pause is

> > > undefined in .mac_config.

> >

> > Speed/duplex/pause also the field of " struct phylink_link_state", so

> > these can be refered in .mac_config, please see the link which stmmac did

> before:

> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felix

> >

> ir.bootlin.com%2Flinux%2Fv5.4.143%2Fsource%2Fdrivers%2Fnet%2Fethernet%

> >

> 2Fstmicro%2Fstmmac%2Fstmmac_main.c%23L852&amp;data=04%7C01%7Cq

> iangqing

> > .zhang%40nxp.com%7C431c0f2b1b904ad6f1ec08d96d47f399%7C686ea1d3b

> c2b4c6f

> >

> a92cd99c5c301635%7C0%7C0%7C637660978077417864%7CUnknown%7CTW

> FpbGZsb3d8

> >

> eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D

> %7C1

> >

> 000&amp;sdata=GOmisORcQtC4MvLbOD6s7wlfdMahhoVv6z79xsjeTrI%3D&a

> mp;reser

> > ved=0

> 

> The phylink documentation says:

> 

> /**

>  * mac_config() - configure the MAC for the selected mode and state

>  * @config: a pointer to a &struct phylink_config.

>  * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND.

>  * @state: a pointer to a &struct phylink_link_state.

>  *

>  * Note - not all members of @state are valid.  In particular,

>  * @state->lp_advertising, @state->link, @state->an_complete are never

>  * guaranteed to be correct, and so any mac_config() implementation must

>  * never reference these fields.

> ...

>  * %MLO_AN_FIXED, %MLO_AN_PHY:

> ...

>  *   Older drivers (prior to the mac_link_up() change) may use

> @state->speed,

>  *   @state->duplex and @state->pause to configure the MAC, but this is

>  *   deprecated; such drivers should be converted to use mac_link_up().

>  *   Valid state members: interface, advertising.

>  *   Deprecated state members: speed, duplex, pause.

> ...

>  * %MLO_AN_INBAND:

> ...

>  *   Valid state members: interface, an_enabled, pause, advertising.

> 

> The reason for this is there have _always_ been code paths through phylink

> where particularly speed and duplex are _not_ _set_ according to the current

> link settings. For example, a call to ksettings_set.

> This is why I revised the interface so that mac_link_up() receives the link

> settings and depreciated these members in mac_config().

> 

> In any case, as can be seen from the documentation, speed and duplex have

> _never_ been valid when operating in inband mode in mac_config.


Ok, thanks for your detailed explanation.


> > > I think the problem here is that you're not calling phylink_stop()

> > > when WoL is enabled, which means phylink will continue to maintain

> > > the state as per the hardware state, and phylib will continue to run

> > > its state machine reporting the link state to phylink.

> >

> > Yes, I also tried do below code change, but the host would not be

> > wakeup, phylink_stop() would call phy_stop(), phylib would call

> > phy_suspend() finally, it will not suspend phy if it detect WoL enabled, so now

> I don't know why system can't be wakeup with this code change.

> >

> > @@ -5374,7 +5374,6 @@ int stmmac_suspend(struct device *dev)

> >                 rtnl_lock();

> >                 if (device_may_wakeup(priv->device))

> >                         phylink_speed_down(priv->phylink, false);

> > -               phylink_stop(priv->phylink);

> >                 rtnl_unlock();

> >                 mutex_lock(&priv->lock);

> >

> > @@ -5385,6 +5384,10 @@ int stmmac_suspend(struct device *dev)

> >         }

> >         mutex_unlock(&priv->lock);

> >

> > +       rtnl_lock();

> > +       phylink_stop(priv->phylink);

> > +       rtnl_unlock();

> > +

> >         priv->speed = SPEED_UNKNOWN;

> >         return 0;

> >  }

> > @@ -5448,6 +5451,12 @@ int stmmac_resume(struct device *dev)

> >                 pinctrl_pm_select_default_state(priv->device);

> >                 if (priv->plat->clk_ptp_ref)

> >                         clk_prepare_enable(priv->plat->clk_ptp_ref);

> > +

> > +               rtnl_lock();

> > +               /* We may have called phylink_speed_down before */

> > +               phylink_speed_up(priv->phylink);

> > +               rtnl_unlock();

> > +

> >                 /* reset the phy so that it's ready */

> >                 if (priv->mii && priv->mdio_rst_after_resume)

> >                         stmmac_mdio_reset(priv->mii); @@ -5461,13

> > +5470,9 @@ int stmmac_resume(struct device *dev)

> >                         return ret;

> >         }

> >

> > -       if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {

> > -               rtnl_lock();

> > -               phylink_start(priv->phylink);

> > -               /* We may have called phylink_speed_down before */

> > -               phylink_speed_up(priv->phylink);

> > -               rtnl_unlock();

> > -       }

> > +       rtnl_lock();

> > +       phylink_start(priv->phylink);

> > +       rtnl_unlock();

> >

> >         rtnl_lock();

> >         mutex_lock(&priv->lock);

> 

> You also need to remove the calls to phylink_mac_change() from the

> suspend/resume functions. Without knowing how WoL is configured to work in

> your setup, I couldn't comment why it isn't working. Can you give some hints

> please?


The wakeup pattern is WAKE_MAGIC.

For WoL active case,
I tried remove phylink_mac_change()=false, add phylink_stop() from suspend patch;
And remove phylink_mac_change()=true, add phylink_start() from resume patch;

System still can't be waked up, I will follow up to debug why system can't be waked up
by magic packets if we call phylink_stop() when suspend. If you have any experience, happy you 
can share with me. The only difference is that whether we call phylink_stop() or not.

> Also, what configuration of WoL are you using? I see stmmac supports several

> different configurations, but I assume priv->plat->pmt is NULL here?


WoL pattern is WAKE_MAGIC, and priv->plat->pmt is set since we use MAC-based WoL, not PHY-based WoL.

> > > phylink_stop() (and therefore phy_stop()) should be called even if

> > > WoL is active to shut down this state reporting, as other network drivers

> do.

> >

> > Ok, you mean that phylink_stop() also should be called even if WoL is

> > active, I would look in this direction since you are a professional.

> 

> Yes. If the system is suspending for whatever reason, you want to bring the

> MAC down so that when it resumes, the MAC will see a link up event

> afterwards.


Ok, thanks.

Best Regards,
Joakim Zhang
Joakim Zhang Sept. 2, 2021, 7:28 a.m. UTC | #8
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月1日 21:26

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Wed, Sep 01, 2021 at 11:42:08AM +0000, Joakim Zhang wrote:

> > Hi Vladimir,

> >

> > > -----Original Message-----

> > > From: Vladimir Oltean <olteanv@gmail.com>

> > > Sent: 2021年9月1日 18:56

> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> > > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> > > mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;

> > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > resume back with WoL enabled

> > >

> > > On Wed, Sep 01, 2021 at 10:25:15AM +0000, Joakim Zhang wrote:

> > > >

> > > > Hi Vladimir,

> > > >

> > > > > -----Original Message-----

> > > > > From: Vladimir Oltean <olteanv@gmail.com>

> > > > > Sent: 2021年9月1日 17:22

> > > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > > Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> > > > > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> > > > > mcoquelin.stm32@gmail.com; linux@armlinux.org.uk;

> > > > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when

> > > > > system resume back with WoL enabled

> > > > >

> > > > > On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:

> > > > > > We can reproduce this issue with below steps:

> > > > > > 1) enable WoL on the host

> > > > > > 2) host system suspended

> > > > > > 3) remote client send out wakeup packets We can see that host

> > > > > > system resume back, but can't work, such as ping failed.

> > > > > >

> > > > > > After a bit digging, this issue is introduced by the commit

> > > > > > 46f69ded988d

> > > > > > ("net: stmmac: Use resolved link config in mac_link_up()"),

> > > > > > which use the finalised link parameters in mac_link_up()

> > > > > > rather than the parameters in mac_config().

> > > > > >

> > > > > > There are two scenarios for MAC suspend/resume:

> > > > > >

> > > > > > 1) MAC suspend with WoL disabled, stmmac_suspend() call

> > > > > > phylink_mac_change() to notify phylink machine that a change

> > > > > > in MAC state, then .mac_link_down callback would be invoked.

> > > > > > Further, it will call phylink_stop() to stop the phylink

> > > > > > instance. When MAC resume back, firstly phylink_start() is

> > > > > > called to start the phylink instance, then call

> > > > > > phylink_mac_change() which will finally trigger phylink

> > > > > > machine to invoke .mac_config and .mac_link_up callback. All

> > > > > > is fine since configuration in these two callbacks

> > > will be initialized.

> > > > > >

> > > > > > 2) MAC suspend with WoL enabled, phylink_mac_change() will put

> > > > > > link down, but there is no phylink_stop() to stop the phylink

> > > > > > instance, so it will link up again, that means .mac_config and

> > > > > > .mac_link_up would be invoked before system suspended. After

> > > > > > system resume back, it will do DMA initialization and SW reset

> > > > > > which let MAC lost the hardware setting (i.e MAC_Configuration

> > > > > > register(offset 0x0) is reset). Since link is up before system

> > > > > > suspended, so .mac_link_up would not be invoked after system

> > > > > > resume back, lead to there is no chance to initialize the

> > > > > > configuration in .mac_link_up callback, as a result, MAC can't

> > > > > > work any

> > > longer.

> > > > >

> > > > > Have you tried putting phylink_stop in .suspend, and

> > > > > phylink_start

> > > in .resume?

> > > >

> > > > Yes, I tried, but the system can't be wakeup with remote packets.

> > > > Please see the code change.

> > >

> > > That makes it a PHY driver issue then, I guess?

> > > At least some PHY drivers avoid suspending when WoL is active, like

> > > lan88xx_suspend.

> > > Even the phy_suspend function takes wol.wolopts into consideration

> > > before proceeding to call the driver. What PHY driver is it?

> >

> > I think it's not the PHY issue, since both STMMAC and FEC controllers

> > on i.MX8MP use the same PHY(Realtek RTL8211FD,

> drivers/net/phy/realtek.c), there is no issue with FEC.

> 

> Note that FEC calls phylink_stop() in fec_suspend() if the net device was up. So

> that kind of rules out phylink and phylib too... and points towards stmmac doing

> something it shouldn't.


Yes, I also compared the logic between FEC and STMMAC, for FEC, both WoL active and inactive
will invoke phy_stop() when suspend, and phy_start() when resume, so that fec_enet_adjust_link()
would be called to adjust link, let FEC can work correctly.
 
> > > Bad assumption in the stmmac driver, if the intention was for the

> > > link state change to be induced to phylink after the resume?

> >

> > Yes, I also think link state change should be captured after the

> > resume, it's very strange that link up again before suspended. You would see

> below log if I add no_console_suspend in cmdline.

> 

> ... because phylink_mac_change() is not supposed to be used to force the link

> down. I can't say this loudly enough: Read the documentation.

> I don't write it just for my pleasure, it's there to help others get stuff correct. If

> people aren't going to read it, I might as well not waste the time writing it.


The documentation is very valuable, and worthy of respect. I am sorry for not notice the doc before.
And sometimes I am not quite understand them. Sorry again.

>  * phylink_mac_change() - notify phylink of a change in MAC state

>  * @pl: a pointer to a &struct phylink returned from phylink_create()

>  * @up: indicates whether the link is currently up.

>  *

>  * The MAC driver should call this driver when the state of its link

>  * changes (eg, link failure, new negotiation results, etc.)

> 

> Realise that "up" is there merely to capture that the link has gone down - but

> by the time phylink reacts to that (which may be some time

> *after* this call has been made - it is *not* synchronous since it's meant to be

> called from an *interrupt*) the link state may well have changed again. So,

> phylink will always recheck the link state with up = false, so you _will_ get the

> link going down and then up.


Yes, as I described in the commit message, I noticed that phylink_mac_change() will not stop
the phylink instance, so it will link up again before system suspended. What I want to describe here
is that current STMMAC suspend/resume path is not correct for WoL active.
 
> In any case "change in MAC state" is only applicable when in in-band mode, not

> in PHY mode, so you should not be calling this if you have a PHY attached which

> isn't in in-band mode.


Got it, thanks.
 
> >

> > root@imx8mpevk:~# ethtool -s eth1 wol g

> > [   76.309460] stmmac: wakeup enable

> 

> So you've asked it to wake on MagicPacket, which is WAKE_MAGIC. As you got

> the message "wakeup enable" which is emitted by stmmac_set_wol(), this will

> only be emitted if priv->plat->pmt() is set. It will _not_ call

> phylink_ethtool_set_wol().


Right.

> So, you are not using the PHY-based wake-on-lan, but the MAC based

> wake-on-lan. 


Right.

> This means you need to have the phy <-> mac link up during

> suspend, and in that case, yes, you do not want to call

> phylink_stop() or phylink_start().


I have a question here, why need to have the phy<->mac link up during suspend?
What I understand is we need ensure PHY not suspended. Such as FEC, it call phy_stop()
when suspend for MAC-based WoL active, then it also can be waked up by magic packets.

As you described in past thread, phylink_stop() and phylink_start() also need to be called even with
WoL active.

> I'm not sure what stmmac_pmt() does - thanks to the macro stuff, I'd need to

> trace it through the driver and find out where that goes, and also which variant

> of stmmac you're using... so without more information I can't follow what the

> driver is doing.


I think stmmac_pmt() has no effect, it just program the hardware to enable the WoL feature, please see below:
https://elixir.bootlin.com/linux/v5.14-rc7/source/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c#L300

I use the version 5.10a.

Best Regards,
Joakim Zhang
Joakim Zhang Sept. 2, 2021, 7:35 a.m. UTC | #9
Hi Heiner,

> -----Original Message-----

> From: Heiner Kallweit <hkallweit1@gmail.com>

> Sent: 2021年9月1日 23:40

> To: Joakim Zhang <qiangqing.zhang@nxp.com>; Russell King

> <linux@armlinux.org.uk>

> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; andrew@lunn.ch;

> f.fainelli@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On 01.09.2021 12:21, Joakim Zhang wrote:

> >

> > Hi Russell,

> >

> >> -----Original Message-----

> >> From: Russell King <linux@armlinux.org.uk>

> >> Sent: 2021年9月1日 17:14

> >> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> >> Cc: peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> >> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> >> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; andrew@lunn.ch;

> >> f.fainelli@gmail.com; hkallweit1@gmail.com; dl-linux-imx

> >> <linux-imx@nxp.com>

> >> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> >> resume back with WoL enabled

> >>

> >> On Wed, Sep 01, 2021 at 05:02:28PM +0800, Joakim Zhang wrote:

> >>> We can reproduce this issue with below steps:

> >>> 1) enable WoL on the host

> >>> 2) host system suspended

> >>> 3) remote client send out wakeup packets We can see that host system

> >>> resume back, but can't work, such as ping failed.

> >>>

> >>> After a bit digging, this issue is introduced by the commit

> >>> 46f69ded988d

> >>> ("net: stmmac: Use resolved link config in mac_link_up()"), which

> >>> use the finalised link parameters in mac_link_up() rather than the

> >>> parameters in mac_config().

> >>>

> >>> There are two scenarios for MAC suspend/resume:

> >>>

> >>> 1) MAC suspend with WoL disabled, stmmac_suspend() call

> >>> phylink_mac_change() to notify phylink machine that a change in MAC

> >>> state, then .mac_link_down callback would be invoked. Further, it

> >>> will call phylink_stop() to stop the phylink instance. When MAC

> >>> resume back, firstly phylink_start() is called to start the phylink

> >>> instance, then call phylink_mac_change() which will finally trigger

> >>> phylink machine to invoke .mac_config and .mac_link_up callback. All

> >>> is fine since configuration in these two callbacks will be initialized.

> >>>

> >>> 2) MAC suspend with WoL enabled, phylink_mac_change() will put link

> >>> down, but there is no phylink_stop() to stop the phylink instance,

> >>> so it will link up again, that means .mac_config and .mac_link_up

> >>> would be invoked before system suspended. After system resume back,

> >>> it will do DMA initialization and SW reset which let MAC lost the

> >>> hardware setting (i.e MAC_Configuration register(offset 0x0) is

> >>> reset). Since link is up before system suspended, so .mac_link_up

> >>> would not be invoked after system resume back, lead to there is no

> >>> chance to initialize the configuration in .mac_link_up callback, as

> >>> a result, MAC can't work any longer.

> >>>

> >>> Above description is what I found when debug this issue, this patch

> >>> is just revert broken patch to workaround it, at least make MAC work

> >>> when system resume back with WoL enabled.

> >>>

> >>> Said this is a workaround, since it has not resolve the issue completely.

> >>> I just move the speed/duplex/pause etc into .mac_config callback,

> >>> there are other configurations in .mac_link_up callback which also

> >>> need to be initialized to work for specific functions.

> >>

> >> NAK. Please read the phylink documentation. speed/duplex/pause is

> >> undefined in .mac_config.

> >

> > Speed/duplex/pause also the field of " struct phylink_link_state", so

> > these can be refered in .mac_config, please see the link which stmmac did

> before:

> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felix

> >

> ir.bootlin.com%2Flinux%2Fv5.4.143%2Fsource%2Fdrivers%2Fnet%2Fethernet%

> >

> 2Fstmicro%2Fstmmac%2Fstmmac_main.c%23L852&amp;data=04%7C01%7Cq

> iangqing

> > .zhang%40nxp.com%7C83fda179bf1d41fca42008d96d5ed3c3%7C686ea1d3b

> c2b4c6f

> >

> a92cd99c5c301635%7C0%7C0%7C637661076316192906%7CUnknown%7CTW

> FpbGZsb3d8

> >

> eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D

> %7C1

> >

> 000&amp;sdata=zM%2Fw1R%2BHkY067pg8wm2%2FS0zsLoBNQQ2TmdXWJat

> Ptes%3D&amp

> > ;reserved=0

> >

> >

> >> I think the problem here is that you're not calling phylink_stop()

> >> when WoL is enabled, which means phylink will continue to maintain

> >> the state as per the hardware state, and phylib will continue to run

> >> its state machine reporting the link state to phylink.

> >

> > Yes, I also tried do below code change, but the host would not be

> > wakeup, phylink_stop() would call phy_stop(), phylib would call

> > phy_suspend() finally, it will not suspend phy if it detect WoL enabled, so now

> I don't know why system can't be wakeup with this code change.

> >

> Follow-up question would be whether link breaks accidentally on suspend, or

> whether something fails on resume.When suspending, does the link break and

> link LEDs go off?


No, the LEDs is normal.

> Depending on LED configuration you may also see whether link speed is

> reduced on suspend.

> struct net_device has a member wol_enabled, does it make a difference if set

> it?


I have a try, there is no help. Have not clear why phylink_stop() will break WoL function,
lead it can't be waked up via magic packets.

Thanks.

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 2, 2021, 8:32 a.m. UTC | #10
On Thu, Sep 02, 2021 at 07:28:44AM +0000, Joakim Zhang wrote:
> 

> Hi Russell,

> 

> > -----Original Message-----

> > From: Russell King <linux@armlinux.org.uk>

> > Sent: 2021年9月1日 21:26

> > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> > back with WoL enabled

> > 

> > This means you need to have the phy <-> mac link up during

> > suspend, and in that case, yes, you do not want to call

> > phylink_stop() or phylink_start().

> 

> I have a question here, why need to have the phy<->mac link up during suspend?


You need the link up because I think from reading the code, it is _not_
the PHY that is triggering the wakeup in the configuration you are using,
but the MAC.

If the link is down, the PHY can't pass the received packet to the MAC,
and the MAC can't recognise the magic packet.

FEC doesn't have this. FEC relies purely on the PHY detecting the magic
packet, which is much more power efficient, because it means the MAC
doesn't need to be powered up and operational while the rest of the
system is suspended.

> As you described in past thread, phylink_stop() and phylink_start() also need to be called even with

> WoL active.


That was with the assumption that the PHY was detecting the magic
packet. It isn't for stmmac - stmmac can be configured to bypass
the configuration of the PHY for this and uses the MAC to detect
this instead. If the MAC is doing the detecting for WoL, then you
need network connectivity to be functional from the network cable
through the PHY and up to the MAC.

So, bringing the link down at suspend in this case _will_ break
WoL. The PHY isn't the device detecting the magic packet, it is the
MAC, and the MAC must be able to see the network traffic.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Joakim Zhang Sept. 2, 2021, 10:26 a.m. UTC | #11
Hi Russell,

Thanks a lot!

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月2日 16:32

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Thu, Sep 02, 2021 at 07:28:44AM +0000, Joakim Zhang wrote:

> >

> > Hi Russell,

> >

> > > -----Original Message-----

> > > From: Russell King <linux@armlinux.org.uk>

> > > Sent: 2021年9月1日 21:26

> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > resume back with WoL enabled

> > >

> > > This means you need to have the phy <-> mac link up during suspend,

> > > and in that case, yes, you do not want to call

> > > phylink_stop() or phylink_start().

> >

> > I have a question here, why need to have the phy<->mac link up during

> suspend?

> 

> You need the link up because I think from reading the code, it is _not_ the PHY

> that is triggering the wakeup in the configuration you are using, but the MAC.

> 

> If the link is down, the PHY can't pass the received packet to the MAC, and the

> MAC can't recognise the magic packet.


Per my understanding, if use PHY-based wakeup, PHY should be active, and MAC can be
totally suspended. When PHY receive the magic packets, it will generate a signal via wakeup
PIN (PHY seems all have such PIN) to inform SoC, we can use this to wake up the system.
Please correct me if I misunderstand.

> FEC doesn't have this. FEC relies purely on the PHY detecting the magic packet,

> which is much more power efficient, because it means the MAC doesn't need

> to be powered up and operational while the rest of the system is suspended.


AFAIK, FEC also use the MAC-based wakeup, when enable FEC WoL feature, it will
keep MAC receive logic active, PHY pass the received packets to MAC, if MAC detects
the magic packets, it will generate an interrupt to wake up the system.

Below is the block guide description:
To put the MAC in Sleep mode, set ENETn_ECR[SLEEP]. At the same time
ENETn_ECR[MAGICEN] should also be set to enable magic packet detection.
In addition, when the processor is in Stop mode, Sleep mode is entered, without affecting
the ENETn_ECR register bits.
When the core is in Sleep mode:
? The MAC transmit logic is disabled.
? The core FIFO receive/transmit functions are disabled.
? The MAC receive logic is kept in Normal mode, but it ignores all traffic from the
line except magic packets. They are detected so that a remote agent can wake the
node.

So FEC is MAC-based wakeup, right?

> > As you described in past thread, phylink_stop() and phylink_start()

> > also need to be called even with WoL active.

> 

> That was with the assumption that the PHY was detecting the magic packet. It

> isn't for stmmac - stmmac can be configured to bypass the configuration of the

> PHY for this and uses the MAC to detect this instead. If the MAC is doing the

> detecting for WoL, then you need network connectivity to be functional from

> the network cable through the PHY and up to the MAC.


Yes, we configure MAC detecting the WoL, I think, as long as PHY is active, it can receive
the packets then pass to MAC, MAC ignore all traffic from the line except magic packets.
So STMMAC should work the same as FEC do.

> So, bringing the link down at suspend in this case _will_ break WoL. The PHY

> isn't the device detecting the magic packet, it is the MAC, and the MAC must

> be able to see the network traffic.


I am not sure if it is the difference for phylink and phylib, may phylink has such requirement?

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 2, 2021, 10:49 a.m. UTC | #12
On Thu, Sep 02, 2021 at 10:26:13AM +0000, Joakim Zhang wrote:
> 

> Hi Russell,

> 

> Thanks a lot!

> 

> > -----Original Message-----

> > From: Russell King <linux@armlinux.org.uk>

> > Sent: 2021年9月2日 16:32

> > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> > back with WoL enabled

> > 

> > On Thu, Sep 02, 2021 at 07:28:44AM +0000, Joakim Zhang wrote:

> > >

> > > Hi Russell,

> > >

> > > > -----Original Message-----

> > > > From: Russell King <linux@armlinux.org.uk>

> > > > Sent: 2021年9月1日 21:26

> > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > > resume back with WoL enabled

> > > >

> > > > This means you need to have the phy <-> mac link up during suspend,

> > > > and in that case, yes, you do not want to call

> > > > phylink_stop() or phylink_start().

> > >

> > > I have a question here, why need to have the phy<->mac link up during

> > suspend?

> > 

> > You need the link up because I think from reading the code, it is _not_ the PHY

> > that is triggering the wakeup in the configuration you are using, but the MAC.

> > 

> > If the link is down, the PHY can't pass the received packet to the MAC, and the

> > MAC can't recognise the magic packet.

> 

> Per my understanding, if use PHY-based wakeup, PHY should be active, and MAC can be

> totally suspended. When PHY receive the magic packets, it will generate a signal via wakeup

> PIN (PHY seems all have such PIN) to inform SoC, we can use this to wake up the system.

> Please correct me if I misunderstand.


Correct.

> > FEC doesn't have this. FEC relies purely on the PHY detecting the magic packet,

> > which is much more power efficient, because it means the MAC doesn't need

> > to be powered up and operational while the rest of the system is suspended.

> 

> AFAIK, FEC also use the MAC-based wakeup, when enable FEC WoL feature, it will

> keep MAC receive logic active, PHY pass the received packets to MAC, if MAC detects

> the magic packets, it will generate an interrupt to wake up the system.


You're right.

However, as the PHY is not configured for WoL with FEC, and
fec_suspend() unconditionally calls phy_stop() which will place the PHY
into suspend mode. Maybe the PHY driver there has a NULL phydrv->suspend
method? However, I see that at803x has suspend methods (which I believe
is the PHY that gets used with i.MX products) which will power down the
PHY.

So, how does this work with FEC - because right now I can't see it
working, but you say it does.

I think we need to understand how FEC is working here, and we need a
deeper understanding why stmmac isn't working.

I don't have any iMX systems that support WoL, so this isn't something
I can test. (SolidRun's i.MX platforms do not support any kind of
system power down, so suspend isn't supported.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Joakim Zhang Sept. 2, 2021, 11:15 a.m. UTC | #13
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月2日 18:50

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Thu, Sep 02, 2021 at 10:26:13AM +0000, Joakim Zhang wrote:

> >

> > Hi Russell,

> >

> > Thanks a lot!

> >

> > > -----Original Message-----

> > > From: Russell King <linux@armlinux.org.uk>

> > > Sent: 2021年9月2日 16:32

> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > resume back with WoL enabled

> > >

> > > On Thu, Sep 02, 2021 at 07:28:44AM +0000, Joakim Zhang wrote:

> > > >

> > > > Hi Russell,

> > > >

> > > > > -----Original Message-----

> > > > > From: Russell King <linux@armlinux.org.uk>

> > > > > Sent: 2021年9月1日 21:26

> > > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > > Cc: Vladimir Oltean <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > > > davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com;

> > > > > netdev@vger.kernel.org; andrew@lunn.ch; f.fainelli@gmail.com;

> > > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when

> > > > > system resume back with WoL enabled

> > > > >

> > > > > This means you need to have the phy <-> mac link up during

> > > > > suspend, and in that case, yes, you do not want to call

> > > > > phylink_stop() or phylink_start().

> > > >

> > > > I have a question here, why need to have the phy<->mac link up

> > > > during

> > > suspend?

> > >

> > > You need the link up because I think from reading the code, it is

> > > _not_ the PHY that is triggering the wakeup in the configuration you are

> using, but the MAC.

> > >

> > > If the link is down, the PHY can't pass the received packet to the

> > > MAC, and the MAC can't recognise the magic packet.

> >

> > Per my understanding, if use PHY-based wakeup, PHY should be active,

> > and MAC can be totally suspended. When PHY receive the magic packets,

> > it will generate a signal via wakeup PIN (PHY seems all have such PIN) to

> inform SoC, we can use this to wake up the system.

> > Please correct me if I misunderstand.

> 

> Correct.

> 

> > > FEC doesn't have this. FEC relies purely on the PHY detecting the

> > > magic packet, which is much more power efficient, because it means

> > > the MAC doesn't need to be powered up and operational while the rest of

> the system is suspended.

> >

> > AFAIK, FEC also use the MAC-based wakeup, when enable FEC WoL feature,

> > it will keep MAC receive logic active, PHY pass the received packets

> > to MAC, if MAC detects the magic packets, it will generate an interrupt to

> wake up the system.

> 

> You're right.

> 

> However, as the PHY is not configured for WoL with FEC, and

> fec_suspend() unconditionally calls phy_stop() which will place the PHY into

> suspend mode. 


No, phylib has much logic to avoid putting PHY into suspended state if either MAC or
PHY WoL active. Such as in phy_suspend(),
	if (wol.wolopts || (netdev && netdev->wol_enabled))
		return -EBUSY;

Or in mdio_bus_phy_may_suspend()...

> Maybe the PHY driver there has a NULL phydrv->suspend

> method? However, I see that at803x has suspend methods (which I believe is

> the PHY that gets used with i.MX products) which will power down the PHY.


Yes, we have many boards use AR8031 PHY, and all support MAC-based wakeup feature.

> So, how does this work with FEC - because right now I can't see it working, but

> you say it does.

>

> I think we need to understand how FEC is working here, and we need a deeper

> understanding why stmmac isn't working.


Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY maintainers,
Could you please help put insights here if possible?

> I don't have any iMX systems that support WoL, so this isn't something I can

> test. (SolidRun's i.MX platforms do not support any kind of system power down,

> so suspend isn't supported.)


Understood, thanks.

Best Regards,
Joakim Zhang
Andrew Lunn Sept. 2, 2021, 12:24 p.m. UTC | #14
> Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY maintainers,

> Could you please help put insights here if possible?


All the boards i have either have an Ethernet Switch connected to the
MAC, or a Micrel PHY. None are setup for WoL, since it is not used in
the use case of these boards.

I think you need to scatter some printk() in various places to confirm
what is going on. Where is the WoL implemented: MAC or PHY, what is
suspended or not, etc.

    Andrew
Joakim Zhang Sept. 3, 2021, 6:51 a.m. UTC | #15
> -----Original Message-----

> From: Andrew Lunn <andrew@lunn.ch>

> Sent: 2021年9月2日 20:24

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> <olteanv@gmail.com>; peppe.cavallaro@st.com;

> alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> netdev@vger.kernel.org; f.fainelli@gmail.com; hkallweit1@gmail.com;

> dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY

> > maintainers, Could you please help put insights here if possible?

> 

> All the boards i have either have an Ethernet Switch connected to the MAC, or

> a Micrel PHY. None are setup for WoL, since it is not used in the use case of

> these boards.

> 

> I think you need to scatter some printk() in various places to confirm what is

> going on. Where is the WoL implemented: MAC or PHY, what is suspended or

> not, etc.


Thanks Andrew, Russell,

I confirmed FEC is MAC-based WoL, and PHY is active when system suspended if MAC-based WoL is active.
I scatter printk() in both phy_device.c and realtek.c phy driver to debug this for both WoL active and inactive case.

When MAC-based WoL is active, phy_suspend() is the last point to actually suspend the PHY, you can see that,
	phy_ethtool_get_wol(phydev, &wol);
	if (wol.wolopts || (netdev && netdev->wol_enabled))
		return -EBUSY;

Here, netdev is true and netdev->wol_enabled is ture (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled = !!wol.wolopts;)
So that phydev->suspend() would not be called, PHY is active after system suspended. PHY can receive packets and pass to MAC,
MAC is responsible for detecting magic packets then generate a wakeup interrupt. So all is fine for FEC, and the behavior is clear.

For STMMAC, when MAC-based WoL is active, according to the current implementation, only call phylink_mac_change()=false,
PHY would be active, so PHY can receive packets then pass to MAC, MAC ignore packets except magic packets. System can be
waked up successfully.

The issue is that phylink_mac_change()=false only notify a phylink of a change in MAC state, as we analyzed before, PHY would link up again
before system suspended, which lead to .mac_link_up can't be called when system resume back. Unfortunately, all MAC configurations
are in stmmac_mac_link_up(), as a result, MAC has not been initialized correctly when system resume back, so that it can't work any longer.
  
Intend to fix this obvious breakage, I did some work:
Removing phylink_mac_change() (Russell said it's for MLO_AN_INBAND, but we have a MLO_AN_PHY) from suspend/resume path,
then adding phylink_stop() in suspend, phylink_start() in resume() also for WoL active path. I found remote magic packets can't wake up the
system, I firstly suspect PHY may be suspended. After further debug, I confirm that PHY is active, and stmmac_pmt() is correctly configured.
So the issue seems we invoke phylink_stop() for WoL active patch, continuing..., Removing phylink_mac_change() and phylink_stop() in suspend, 
the system can be waked up via magic packets.

The conclusion is that, as long as we call phylink_stop() for WoL active in suspend(), then system can't be waked up any longer, and the PHY
situation is active. This let me recall what Russell mentioned in this thread, if we need bring MAC link up with phylink framework to let MAC
can see traffic from PHY when MAC-based WoL is active? 

Now, I don't know where I can further dig into this issue, if you have any advice please share with me , thanks in advance.

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 3, 2021, 8:01 a.m. UTC | #16
On Fri, Sep 03, 2021 at 06:51:09AM +0000, Joakim Zhang wrote:
> 

> > -----Original Message-----

> > From: Andrew Lunn <andrew@lunn.ch>

> > Sent: 2021年9月2日 20:24

> > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> > <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > netdev@vger.kernel.org; f.fainelli@gmail.com; hkallweit1@gmail.com;

> > dl-linux-imx <linux-imx@nxp.com>

> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> > back with WoL enabled

> > 

> > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY

> > > maintainers, Could you please help put insights here if possible?

> > 

> > All the boards i have either have an Ethernet Switch connected to the MAC, or

> > a Micrel PHY. None are setup for WoL, since it is not used in the use case of

> > these boards.

> > 

> > I think you need to scatter some printk() in various places to confirm what is

> > going on. Where is the WoL implemented: MAC or PHY, what is suspended or

> > not, etc.

> 

> Thanks Andrew, Russell,

> 

> I confirmed FEC is MAC-based WoL, and PHY is active when system suspended if MAC-based WoL is active.

> I scatter printk() in both phy_device.c and realtek.c phy driver to debug this for both WoL active and inactive case.

> 

> When MAC-based WoL is active, phy_suspend() is the last point to actually suspend the PHY, you can see that,

> 	phy_ethtool_get_wol(phydev, &wol);

> 	if (wol.wolopts || (netdev && netdev->wol_enabled))

> 		return -EBUSY;

> 

> Here, netdev is true and netdev->wol_enabled is ture (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled = !!wol.wolopts;)

> So that phydev->suspend() would not be called, PHY is active after system suspended. PHY can receive packets and pass to MAC,

> MAC is responsible for detecting magic packets then generate a wakeup interrupt. So all is fine for FEC, and the behavior is clear.


What happens on resume with FEC?

> For STMMAC, when MAC-based WoL is active, according to the current implementation, only call phylink_mac_change()=false,

> PHY would be active, so PHY can receive packets then pass to MAC, MAC ignore packets except magic packets. System can be

> waked up successfully.

> 

> The issue is that phylink_mac_change()=false only notify a phylink of a change in MAC state, as we analyzed before, PHY would link up again

> before system suspended, which lead to .mac_link_up can't be called when system resume back. Unfortunately, all MAC configurations

> are in stmmac_mac_link_up(), as a result, MAC has not been initialized correctly when system resume back, so that it can't work any longer.


Oh, I thought your problem was that the system didn't wake up.

In any case, remove the calls to phylink_mac_change() from the suspend
and resume functions, they are completely _incorrect_.

> Intend to fix this obvious breakage, I did some work:

> Removing phylink_mac_change() (Russell said it's for MLO_AN_INBAND, but we have a MLO_AN_PHY) from suspend/resume path,

> then adding phylink_stop() in suspend, phylink_start() in resume() also for WoL active path. I found remote magic packets can't wake up the

> system, I firstly suspect PHY may be suspended. After further debug, I confirm that PHY is active, and stmmac_pmt() is correctly configured.


As I've said a few times now, if the MAC is doing the wakeup, you
need the PHY to MAC link to be up, so you should _not_ call
phylink_stop() and phylink_start() from the suspend/resume functions
because they will take the link down.

Maybe I should provide phylink_suspend()/phylink_resume() which look
at the netdev state just like phylib does, and conditionally call
phylink_stop() and phylink_start() so driver authors don't have to
consider this.

Something like:

/**
 *...
 * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan
 */
void phylink_suspend(struct phylink *phylink, bool mac_wol)
{
	ASSERT_RTNL();

	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
		phylink_stop(phylink);
}

/**
 *...
 * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan
 *
 * @mac_wol must have the same value as passed previously to
 * phylink_suspend().
 */
void phylink_resume(struct phylink *phylink, bool mac_wol)
{
	ASSERT_RTNL();

	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
		phylink_start(phylink);
}

> The conclusion is that, as long as we call phylink_stop() for WoL active in suspend(), then system can't be waked up any longer, and the PHY

> situation is active. This let me recall what Russell mentioned in this thread, if we need bring MAC link up with phylink framework to let MAC

> can see traffic from PHY when MAC-based WoL is active? 

> 

> Now, I don't know where I can further dig into this issue, if you have any advice please share with me , thanks in advance.


So my question now is: as the MAC needs to be alive while the system
is suspended, that implies that it has been configured to receive
packets. When the system resumes, why exactly doesn't the MAC continue
to work? Does the MAC get reset after the system comes out of resume
and lose all of its configuration?

Reading what stmmac_resume() does, it seems that may well be the case,
or if not, the actions of stmmac_resume() ends up reprogramming a great
deal of the device setup. If this is the case, then yes, we need phylink
to be triggered to reconfigure the link - which we could do in
phylink_resume() if mac_wol was active.

While reading stmmac_resume(), I have to question the placement of this
code block:

        if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
                rtnl_lock();
                phylink_start(priv->phylink);
                /* We may have called phylink_speed_down before */
                phylink_speed_up(priv->phylink);
                rtnl_unlock();
        }

in the sequence there - phylink_start() should be called when you're
ready for the link to come up - in other words, when you're ready to
start seeing packets arrive at the MAC's interface. However, the
code following is clearing and resetting up queues, restoring receive
modes, setting up the hardware, and restoring the vlan filtering.
Surely all that should happen before calling phylink_start(), much
like it already does in stmmac_open() ?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Joakim Zhang Sept. 3, 2021, 8:39 a.m. UTC | #17
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月3日 16:02

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Andrew Lunn <andrew@lunn.ch>; Vladimir Oltean <olteanv@gmail.com>;

> peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Fri, Sep 03, 2021 at 06:51:09AM +0000, Joakim Zhang wrote:

> >

> > > -----Original Message-----

> > > From: Andrew Lunn <andrew@lunn.ch>

> > > Sent: 2021年9月2日 20:24

> > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> > > <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > > netdev@vger.kernel.org; f.fainelli@gmail.com; hkallweit1@gmail.com;

> > > dl-linux-imx <linux-imx@nxp.com>

> > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > resume back with WoL enabled

> > >

> > > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY

> > > > maintainers, Could you please help put insights here if possible?

> > >

> > > All the boards i have either have an Ethernet Switch connected to

> > > the MAC, or a Micrel PHY. None are setup for WoL, since it is not

> > > used in the use case of these boards.

> > >

> > > I think you need to scatter some printk() in various places to

> > > confirm what is going on. Where is the WoL implemented: MAC or PHY,

> > > what is suspended or not, etc.

> >

> > Thanks Andrew, Russell,

> >

> > I confirmed FEC is MAC-based WoL, and PHY is active when system

> suspended if MAC-based WoL is active.

> > I scatter printk() in both phy_device.c and realtek.c phy driver to debug this

> for both WoL active and inactive case.

> >

> > When MAC-based WoL is active, phy_suspend() is the last point to actually

> suspend the PHY, you can see that,

> > 	phy_ethtool_get_wol(phydev, &wol);

> > 	if (wol.wolopts || (netdev && netdev->wol_enabled))

> > 		return -EBUSY;

> >

> > Here, netdev is true and netdev->wol_enabled is ture

> > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =

> > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY is active

> after system suspended. PHY can receive packets and pass to MAC, MAC is

> responsible for detecting magic packets then generate a wakeup interrupt. So

> all is fine for FEC, and the behavior is clear.

> 

> What happens on resume with FEC?


Since we call phy_stop() in fec_suspend(), the link is down, but the PHY is active, after receiving
magic packets, the system resume back; In fec_resume(), after restart/init FEC, we call phy_start()
to let link up, then all is going well.
 
> > For STMMAC, when MAC-based WoL is active, according to the current

> > implementation, only call phylink_mac_change()=false, PHY would be

> > active, so PHY can receive packets then pass to MAC, MAC ignore packets

> except magic packets. System can be waked up successfully.

> >

> > The issue is that phylink_mac_change()=false only notify a phylink of

> > a change in MAC state, as we analyzed before, PHY would link up again

> > before system suspended, which lead to .mac_link_up can't be called when

> system resume back. Unfortunately, all MAC configurations are in

> stmmac_mac_link_up(), as a result, MAC has not been initialized correctly

> when system resume back, so that it can't work any longer.

> 

> Oh, I thought your problem was that the system didn't wake up.

> 

> In any case, remove the calls to phylink_mac_change() from the suspend and

> resume functions, they are completely _incorrect_.


Ok, I will do that.

> > Intend to fix this obvious breakage, I did some work:

> > Removing phylink_mac_change() (Russell said it's for MLO_AN_INBAND,

> > but we have a MLO_AN_PHY) from suspend/resume path, then adding

> > phylink_stop() in suspend, phylink_start() in resume() also for WoL active path.

> I found remote magic packets can't wake up the system, I firstly suspect PHY

> may be suspended. After further debug, I confirm that PHY is active, and

> stmmac_pmt() is correctly configured.

> 

> As I've said a few times now, if the MAC is doing the wakeup, you need the PHY

> to MAC link to be up, so you should _not_ call

> phylink_stop() and phylink_start() from the suspend/resume functions because

> they will take the link down.


Yes, I recall you said this before. Is it the requirement for phylink?
For FEC, we call phy_stop() when suspend, and phy_start() when resume, with MAC is doing
the wakeup.

> Maybe I should provide phylink_suspend()/phylink_resume() which look at the

> netdev state just like phylib does, and conditionally call

> phylink_stop() and phylink_start() so driver authors don't have to consider this.

> 

> Something like:

> 

> /**

>  *...

>  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan  */

> void phylink_suspend(struct phylink *phylink, bool mac_wol) {

> 	ASSERT_RTNL();

> 

> 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))

> 		phylink_stop(phylink);

> }

> 

> /**

>  *...

>  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan

>  *

>  * @mac_wol must have the same value as passed previously to

>  * phylink_suspend().

>  */

> void phylink_resume(struct phylink *phylink, bool mac_wol) {

> 	ASSERT_RTNL();

> 

> 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))

> 		phylink_start(phylink);

> }


That's great!!! MAC driver authors don't need to distinguish the different cases.

> > The conclusion is that, as long as we call phylink_stop() for WoL

> > active in suspend(), then system can't be waked up any longer, and the

> > PHY situation is active. This let me recall what Russell mentioned in this

> thread, if we need bring MAC link up with phylink framework to let MAC can

> see traffic from PHY when MAC-based WoL is active?

> >

> > Now, I don't know where I can further dig into this issue, if you have any

> advice please share with me , thanks in advance.

> 

> So my question now is: as the MAC needs to be alive while the system is

> suspended, that implies that it has been configured to receive packets. When

> the system resumes, why exactly doesn't the MAC continue to work? Does the

> MAC get reset after the system comes out of resume and lose all of its

> configuration?


Yes, as I described in commit message, when STMMAC resume back, either WoL is active or not,
it reset the hardware then reconfig the MAC.
stmmac_resume()->stmmac_hw_setup()->stmmac_init_dma_engine()...

> Reading what stmmac_resume() does, it seems that may well be the case, or if

> not, the actions of stmmac_resume() ends up reprogramming a great deal of

> the device setup. If this is the case, then yes, we need phylink to be triggered

> to reconfigure the link - which we could do in

> phylink_resume() if mac_wol was active.

> 

> While reading stmmac_resume(), I have to question the placement of this code

> block:

> 

>         if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {

>                 rtnl_lock();

>                 phylink_start(priv->phylink);

>                 /* We may have called phylink_speed_down before */

>                 phylink_speed_up(priv->phylink);

>                 rtnl_unlock();

>         }

> 

> in the sequence there - phylink_start() should be called when you're ready for

> the link to come up - in other words, when you're ready to start seeing packets

> arrive at the MAC's interface. However, the code following is clearing and

> resetting up queues, restoring receive modes, setting up the hardware, and

> restoring the vlan filtering.

> Surely all that should happen before calling phylink_start(), much like it already

> does in stmmac_open() ?


There is a story here, SNPS EQOS IP need PHY provides RXC clock for MAC's receive
logic, so we need phylink_start() to bring PHY link up, that make PHY resume back,
PHY could stop RXC clock when in suspended state. This is the reason why calling phylink_start()
before re-config MAC.

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 3, 2021, 9:32 a.m. UTC | #18
On Fri, Sep 03, 2021 at 08:39:23AM +0000, Joakim Zhang wrote:
> 

> Hi Russell,

> 

> > -----Original Message-----

> > From: Russell King <linux@armlinux.org.uk>

> > Sent: 2021年9月3日 16:02

> > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > Cc: Andrew Lunn <andrew@lunn.ch>; Vladimir Oltean <olteanv@gmail.com>;

> > peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> > mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; f.fainelli@gmail.com;

> > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> > back with WoL enabled

> > 

> > On Fri, Sep 03, 2021 at 06:51:09AM +0000, Joakim Zhang wrote:

> > >

> > > > -----Original Message-----

> > > > From: Andrew Lunn <andrew@lunn.ch>

> > > > Sent: 2021年9月2日 20:24

> > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> > > > <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;

> > > > netdev@vger.kernel.org; f.fainelli@gmail.com; hkallweit1@gmail.com;

> > > > dl-linux-imx <linux-imx@nxp.com>

> > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system

> > > > resume back with WoL enabled

> > > >

> > > > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY

> > > > > maintainers, Could you please help put insights here if possible?

> > > >

> > > > All the boards i have either have an Ethernet Switch connected to

> > > > the MAC, or a Micrel PHY. None are setup for WoL, since it is not

> > > > used in the use case of these boards.

> > > >

> > > > I think you need to scatter some printk() in various places to

> > > > confirm what is going on. Where is the WoL implemented: MAC or PHY,

> > > > what is suspended or not, etc.

> > >

> > > Thanks Andrew, Russell,

> > >

> > > I confirmed FEC is MAC-based WoL, and PHY is active when system

> > suspended if MAC-based WoL is active.

> > > I scatter printk() in both phy_device.c and realtek.c phy driver to debug this

> > for both WoL active and inactive case.

> > >

> > > When MAC-based WoL is active, phy_suspend() is the last point to actually

> > suspend the PHY, you can see that,

> > > 	phy_ethtool_get_wol(phydev, &wol);

> > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))

> > > 		return -EBUSY;

> > >

> > > Here, netdev is true and netdev->wol_enabled is ture

> > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =

> > > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY is active

> > after system suspended. PHY can receive packets and pass to MAC, MAC is

> > responsible for detecting magic packets then generate a wakeup interrupt. So

> > all is fine for FEC, and the behavior is clear.

> > 

> > What happens on resume with FEC?

> 

> Since we call phy_stop() in fec_suspend(), the link is down, but the PHY is active, after receiving

> magic packets, the system resume back; In fec_resume(), after restart/init FEC, we call phy_start()

> to let link up, then all is going well.


... but the link never went down! So I don't understand the last point.

> > > For STMMAC, when MAC-based WoL is active, according to the current

> > > implementation, only call phylink_mac_change()=false, PHY would be

> > > active, so PHY can receive packets then pass to MAC, MAC ignore packets

> > except magic packets. System can be waked up successfully.

> > >

> > > The issue is that phylink_mac_change()=false only notify a phylink of

> > > a change in MAC state, as we analyzed before, PHY would link up again

> > > before system suspended, which lead to .mac_link_up can't be called when

> > system resume back. Unfortunately, all MAC configurations are in

> > stmmac_mac_link_up(), as a result, MAC has not been initialized correctly

> > when system resume back, so that it can't work any longer.

> > 

> > Oh, I thought your problem was that the system didn't wake up.

> > 

> > In any case, remove the calls to phylink_mac_change() from the suspend and

> > resume functions, they are completely _incorrect_.

> 

> Ok, I will do that.

> 

> > > Intend to fix this obvious breakage, I did some work:

> > > Removing phylink_mac_change() (Russell said it's for MLO_AN_INBAND,

> > > but we have a MLO_AN_PHY) from suspend/resume path, then adding

> > > phylink_stop() in suspend, phylink_start() in resume() also for WoL active path.

> > I found remote magic packets can't wake up the system, I firstly suspect PHY

> > may be suspended. After further debug, I confirm that PHY is active, and

> > stmmac_pmt() is correctly configured.

> > 

> > As I've said a few times now, if the MAC is doing the wakeup, you need the PHY

> > to MAC link to be up, so you should _not_ call

> > phylink_stop() and phylink_start() from the suspend/resume functions because

> > they will take the link down.

> 

> Yes, I recall you said this before. Is it the requirement for phylink?

> For FEC, we call phy_stop() when suspend, and phy_start() when resume, with MAC is doing

> the wakeup.


phylink_stop() will synchronously force the phy/mac link down if it
wasn't already down, and it'll do this by calling the mac_link_down()
method.

phylink_start() will remove the force-down that phylink_stop() places,
and will re-resolve the link. You will get a "major reconfiguration"
event, and if the link is up, a mac_link_up() call.

These are primarily designed to be called from the .ndo_open and
.ndo_stop calls (as their kerneldoc mentions) but have found their way
into suspend/resume methods.

If a MAC is being suspended - as in powered down - you definitely want
to bring it down to a safe state where link events are not going to
affect the MAC. Calling phylink_stop() will do that. On resume, you
want to reconfigure and allow the MAC to receive link events, and
calling phylink_start() will do that.

However, if the MAC is not being suspended because you want WoL to
work, then you need the PHY/MAC link _not_ to be brought down so the
MAC can receive packets and examine them to see if it should wake up.
Phylink does not really cater for that case - it wasn't on my radar
as I don't have any modern systems that support suspend/resume, much
less MAC based WoL.

> > Maybe I should provide phylink_suspend()/phylink_resume() which look at the

> > netdev state just like phylib does, and conditionally call

> > phylink_stop() and phylink_start() so driver authors don't have to consider this.

> > 

> > Something like:

> > 

> > /**

> >  *...

> >  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan  */

> > void phylink_suspend(struct phylink *phylink, bool mac_wol) {

> > 	ASSERT_RTNL();

> > 

> > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))

> > 		phylink_stop(phylink);

> > }

> > 

> > /**

> >  *...

> >  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan

> >  *

> >  * @mac_wol must have the same value as passed previously to

> >  * phylink_suspend().

> >  */

> > void phylink_resume(struct phylink *phylink, bool mac_wol) {

> > 	ASSERT_RTNL();

> > 

> > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))

> > 		phylink_start(phylink);

> > }

> 

> That's great!!! MAC driver authors don't need to distinguish the different cases.

> 

> > > The conclusion is that, as long as we call phylink_stop() for WoL

> > > active in suspend(), then system can't be waked up any longer, and the

> > > PHY situation is active. This let me recall what Russell mentioned in this

> > thread, if we need bring MAC link up with phylink framework to let MAC can

> > see traffic from PHY when MAC-based WoL is active?

> > >

> > > Now, I don't know where I can further dig into this issue, if you have any

> > advice please share with me , thanks in advance.

> > 

> > So my question now is: as the MAC needs to be alive while the system is

> > suspended, that implies that it has been configured to receive packets. When

> > the system resumes, why exactly doesn't the MAC continue to work? Does the

> > MAC get reset after the system comes out of resume and lose all of its

> > configuration?

> 

> Yes, as I described in commit message, when STMMAC resume back, either WoL is active or not,

> it reset the hardware then reconfig the MAC.

> stmmac_resume()->stmmac_hw_setup()->stmmac_init_dma_engine()...


Okay, so that means I need to make phylink_resume() trigger a major
config if we are using WoL.

There's a few more questions:
1. Since the state at this point will be that netdev and phylink believe
   the link to still be up, should phylink_suspend() force a
   netif_carrier_off() event to stop the netdev transmit watchdog - I
   think it ought to, even though the link will actually remain up.
2. Should we call mac_link_down() prior to the major reconfig - I think
   we should to keep the mac_link_down()/mac_link_up() calls balanced
   (as we do already guarantee.) Will that do any harm for stmmac if we
   were to call mac_link_down() as a result of a call to
   phylink_resume() ?

> > Reading what stmmac_resume() does, it seems that may well be the case, or if

> > not, the actions of stmmac_resume() ends up reprogramming a great deal of

> > the device setup. If this is the case, then yes, we need phylink to be triggered

> > to reconfigure the link - which we could do in

> > phylink_resume() if mac_wol was active.

> > 

> > While reading stmmac_resume(), I have to question the placement of this code

> > block:

> > 

> >         if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {

> >                 rtnl_lock();

> >                 phylink_start(priv->phylink);

> >                 /* We may have called phylink_speed_down before */

> >                 phylink_speed_up(priv->phylink);

> >                 rtnl_unlock();

> >         }

> > 

> > in the sequence there - phylink_start() should be called when you're ready for

> > the link to come up - in other words, when you're ready to start seeing packets

> > arrive at the MAC's interface. However, the code following is clearing and

> > resetting up queues, restoring receive modes, setting up the hardware, and

> > restoring the vlan filtering.

> > Surely all that should happen before calling phylink_start(), much like it already

> > does in stmmac_open() ?

> 

> There is a story here, SNPS EQOS IP need PHY provides RXC clock for MAC's receive

> logic, so we need phylink_start() to bring PHY link up, that make PHY resume back,

> PHY could stop RXC clock when in suspended state. This is the reason why calling phylink_start()

> before re-config MAC.


Why is it different from the .ndo_stop/.ndo_open case, where the PHY may
have been suspended by the actions of .ndo_stop?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Joakim Zhang Sept. 3, 2021, 11:04 a.m. UTC | #19
Hi Russell,

[...]
> > > > > -----Original Message-----

> > > > > From: Andrew Lunn <andrew@lunn.ch>

> > > > > Sent: 2021年9月2日 20:24

> > > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> > > > > <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > > > davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com;

> > > > > netdev@vger.kernel.org; f.fainelli@gmail.com;

> > > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when

> > > > > system resume back with WoL enabled

> > > > >

> > > > > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and

> > > > > > PHY maintainers, Could you please help put insights here if possible?

> > > > >

> > > > > All the boards i have either have an Ethernet Switch connected

> > > > > to the MAC, or a Micrel PHY. None are setup for WoL, since it is

> > > > > not used in the use case of these boards.

> > > > >

> > > > > I think you need to scatter some printk() in various places to

> > > > > confirm what is going on. Where is the WoL implemented: MAC or

> > > > > PHY, what is suspended or not, etc.

> > > >

> > > > Thanks Andrew, Russell,

> > > >

> > > > I confirmed FEC is MAC-based WoL, and PHY is active when system

> > > suspended if MAC-based WoL is active.

> > > > I scatter printk() in both phy_device.c and realtek.c phy driver

> > > > to debug this

> > > for both WoL active and inactive case.

> > > >

> > > > When MAC-based WoL is active, phy_suspend() is the last point to

> > > > actually

> > > suspend the PHY, you can see that,

> > > > 	phy_ethtool_get_wol(phydev, &wol);

> > > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))

> > > > 		return -EBUSY;

> > > >

> > > > Here, netdev is true and netdev->wol_enabled is ture

> > > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =

> > > > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY

> > > > is active

> > > after system suspended. PHY can receive packets and pass to MAC, MAC

> > > is responsible for detecting magic packets then generate a wakeup

> > > interrupt. So all is fine for FEC, and the behavior is clear.

> > >

> > > What happens on resume with FEC?

> >

> > Since we call phy_stop() in fec_suspend(), the link is down, but the

> > PHY is active, after receiving magic packets, the system resume back;

> > In fec_resume(), after restart/init FEC, we call phy_start() to let link up, then

> all is going well.

> 

> ... but the link never went down! So I don't understand the last point.


Sorry, what the meaning of "the link never went down"? How do you define the link is down?
May be I have not get your original point correctly.

At my side, with MAC-based WoL is active, FEC calls phy_stop() in fec_suspend(), then fec_enet_adjust_link()
is called, further fec_stop() is called, FEC only keep necessary receive logic active to service WoL. This is not
the link went down? At least I see the log " fec 30be0000.ethernet eth0: Link is Down".

> > > > For STMMAC, when MAC-based WoL is active, according to the current

> > > > implementation, only call phylink_mac_change()=false, PHY would be

> > > > active, so PHY can receive packets then pass to MAC, MAC ignore

> > > > packets

> > > except magic packets. System can be waked up successfully.

> > > >

> > > > The issue is that phylink_mac_change()=false only notify a phylink

> > > > of a change in MAC state, as we analyzed before, PHY would link up

> > > > again before system suspended, which lead to .mac_link_up can't be

> > > > called when

> > > system resume back. Unfortunately, all MAC configurations are in

> > > stmmac_mac_link_up(), as a result, MAC has not been initialized

> > > correctly when system resume back, so that it can't work any longer.

> > >

> > > Oh, I thought your problem was that the system didn't wake up.

> > >

> > > In any case, remove the calls to phylink_mac_change() from the

> > > suspend and resume functions, they are completely _incorrect_.

> >

> > Ok, I will do that.

> >

> > > > Intend to fix this obvious breakage, I did some work:

> > > > Removing phylink_mac_change() (Russell said it's for

> > > > MLO_AN_INBAND, but we have a MLO_AN_PHY) from suspend/resume

> path,

> > > > then adding

> > > > phylink_stop() in suspend, phylink_start() in resume() also for WoL active

> path.

> > > I found remote magic packets can't wake up the system, I firstly

> > > suspect PHY may be suspended. After further debug, I confirm that

> > > PHY is active, and

> > > stmmac_pmt() is correctly configured.

> > >

> > > As I've said a few times now, if the MAC is doing the wakeup, you

> > > need the PHY to MAC link to be up, so you should _not_ call

> > > phylink_stop() and phylink_start() from the suspend/resume functions

> > > because they will take the link down.

> >

> > Yes, I recall you said this before. Is it the requirement for phylink?

> > For FEC, we call phy_stop() when suspend, and phy_start() when resume,

> > with MAC is doing the wakeup.

> 

> phylink_stop() will synchronously force the phy/mac link down if it wasn't

> already down, and it'll do this by calling the mac_link_down() method.

> 

> phylink_start() will remove the force-down that phylink_stop() places, and will

> re-resolve the link. You will get a "major reconfiguration"

> event, and if the link is up, a mac_link_up() call.

> 

> These are primarily designed to be called from the .ndo_open and .ndo_stop

> calls (as their kerneldoc mentions) but have found their way into

> suspend/resume methods.

> 

> If a MAC is being suspended - as in powered down - you definitely want to bring

> it down to a safe state where link events are not going to affect the MAC.

> Calling phylink_stop() will do that. On resume, you want to reconfigure and

> allow the MAC to receive link events, and calling phylink_start() will do that.

> 

> However, if the MAC is not being suspended because you want WoL to work,

> then you need the PHY/MAC link _not_ to be brought down so the MAC can

> receive packets and examine them to see if it should wake up.

> Phylink does not really cater for that case - it wasn't on my radar as I don't have

> any modern systems that support suspend/resume, much less MAC based

> WoL.


Great thanks, for your detailed explanation. It means phylink has not prepared for WoL yet.
I am happy to support testing since you lack of the environment.

> > > Maybe I should provide phylink_suspend()/phylink_resume() which look

> > > at the netdev state just like phylib does, and conditionally call

> > > phylink_stop() and phylink_start() so driver authors don't have to consider

> this.

> > >

> > > Something like:

> > >

> > > /**

> > >  *...

> > >  * @mac_wol: true if the MAC needs to receive packets for

> > > Wake-on-Lan  */ void phylink_suspend(struct phylink *phylink, bool

> mac_wol) {

> > > 	ASSERT_RTNL();

> > >

> > > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))

> > > 		phylink_stop(phylink);

> > > }

> > >

> > > /**

> > >  *...

> > >  * @mac_wol: true if the MAC needs to receive packets for

> > > Wake-on-Lan

> > >  *

> > >  * @mac_wol must have the same value as passed previously to

> > >  * phylink_suspend().

> > >  */

> > > void phylink_resume(struct phylink *phylink, bool mac_wol) {

> > > 	ASSERT_RTNL();

> > >

> > > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))

> > > 		phylink_start(phylink);

> > > }

> >

> > That's great!!! MAC driver authors don't need to distinguish the different

> cases.

> >

> > > > The conclusion is that, as long as we call phylink_stop() for WoL

> > > > active in suspend(), then system can't be waked up any longer, and

> > > > the PHY situation is active. This let me recall what Russell

> > > > mentioned in this

> > > thread, if we need bring MAC link up with phylink framework to let

> > > MAC can see traffic from PHY when MAC-based WoL is active?

> > > >

> > > > Now, I don't know where I can further dig into this issue, if you

> > > > have any

> > > advice please share with me , thanks in advance.

> > >

> > > So my question now is: as the MAC needs to be alive while the system

> > > is suspended, that implies that it has been configured to receive

> > > packets. When the system resumes, why exactly doesn't the MAC

> > > continue to work? Does the MAC get reset after the system comes out

> > > of resume and lose all of its configuration?

> >

> > Yes, as I described in commit message, when STMMAC resume back, either

> > WoL is active or not, it reset the hardware then reconfig the MAC.

> > stmmac_resume()->stmmac_hw_setup()->stmmac_init_dma_engine()...

> 

> Okay, so that means I need to make phylink_resume() trigger a major config if

> we are using WoL.

> 

> There's a few more questions:

> 1. Since the state at this point will be that netdev and phylink believe

>    the link to still be up, should phylink_suspend() force a

>    netif_carrier_off() event to stop the netdev transmit watchdog - I

>    think it ought to, even though the link will actually remain up.


Agree.

> 2. Should we call mac_link_down() prior to the major reconfig - I think

>    we should to keep the mac_link_down()/mac_link_up() calls balanced

>    (as we do already guarantee.) Will that do any harm for stmmac if we

>    were to call mac_link_down() as a result of a call to

>    phylink_resume() ?


For STMMAC, I think it's safe, since we calling phylink_resume() before re-config MAC.
But we design this for common usage, other MAC drivers may call this at the end of resume
path, but I think it also safe, like we unplug then plug the cable. However, it will print the LINK DOWN
then LINK UP log, which is very strange when system resume back. Is there any better solution?

> > > Reading what stmmac_resume() does, it seems that may well be the

> > > case, or if not, the actions of stmmac_resume() ends up

> > > reprogramming a great deal of the device setup. If this is the case,

> > > then yes, we need phylink to be triggered to reconfigure the link -

> > > which we could do in

> > > phylink_resume() if mac_wol was active.

> > >

> > > While reading stmmac_resume(), I have to question the placement of

> > > this code

> > > block:

> > >

> > >         if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {

> > >                 rtnl_lock();

> > >                 phylink_start(priv->phylink);

> > >                 /* We may have called phylink_speed_down before */

> > >                 phylink_speed_up(priv->phylink);

> > >                 rtnl_unlock();

> > >         }

> > >

> > > in the sequence there - phylink_start() should be called when you're

> > > ready for the link to come up - in other words, when you're ready to

> > > start seeing packets arrive at the MAC's interface. However, the

> > > code following is clearing and resetting up queues, restoring

> > > receive modes, setting up the hardware, and restoring the vlan filtering.

> > > Surely all that should happen before calling phylink_start(), much

> > > like it already does in stmmac_open() ?

> >

> > There is a story here, SNPS EQOS IP need PHY provides RXC clock for

> > MAC's receive logic, so we need phylink_start() to bring PHY link up,

> > that make PHY resume back, PHY could stop RXC clock when in suspended

> > state. This is the reason why calling phylink_start() before re-config MAC.

> 

> Why is it different from the .ndo_stop/.ndo_open case, where the PHY may

> have been suspended by the actions of .ndo_stop?


It's a good question. PHY will suspend by the actions od .ndo_stop, but it will resume
before we config MAC. Please see stmmac_open(), stmmac_init_phy()->phylink_of_phy_connect()
-> phy_attach_direct(): phy_resume(phydev), where PHY will be resume backed.

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 3, 2021, 12:01 p.m. UTC | #20
On Fri, Sep 03, 2021 at 11:04:57AM +0000, Joakim Zhang wrote:
> 

> Hi Russell,

> 

> [...]

> > > > > > -----Original Message-----

> > > > > > From: Andrew Lunn <andrew@lunn.ch>

> > > > > > Sent: 2021年9月2日 20:24

> > > > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > > > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> > > > > > <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > > > > davem@davemloft.net; kuba@kernel.org;

> > mcoquelin.stm32@gmail.com;

> > > > > > netdev@vger.kernel.org; f.fainelli@gmail.com;

> > > > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when

> > > > > > system resume back with WoL enabled

> > > > > >

> > > > > > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and

> > > > > > > PHY maintainers, Could you please help put insights here if possible?

> > > > > >

> > > > > > All the boards i have either have an Ethernet Switch connected

> > > > > > to the MAC, or a Micrel PHY. None are setup for WoL, since it is

> > > > > > not used in the use case of these boards.

> > > > > >

> > > > > > I think you need to scatter some printk() in various places to

> > > > > > confirm what is going on. Where is the WoL implemented: MAC or

> > > > > > PHY, what is suspended or not, etc.

> > > > >

> > > > > Thanks Andrew, Russell,

> > > > >

> > > > > I confirmed FEC is MAC-based WoL, and PHY is active when system

> > > > suspended if MAC-based WoL is active.

> > > > > I scatter printk() in both phy_device.c and realtek.c phy driver

> > > > > to debug this

> > > > for both WoL active and inactive case.

> > > > >

> > > > > When MAC-based WoL is active, phy_suspend() is the last point to

> > > > > actually

> > > > suspend the PHY, you can see that,

> > > > > 	phy_ethtool_get_wol(phydev, &wol);

> > > > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))

> > > > > 		return -EBUSY;

> > > > >

> > > > > Here, netdev is true and netdev->wol_enabled is ture

> > > > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =

> > > > > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY

> > > > > is active

> > > > after system suspended. PHY can receive packets and pass to MAC, MAC

> > > > is responsible for detecting magic packets then generate a wakeup

> > > > interrupt. So all is fine for FEC, and the behavior is clear.

> > > >

> > > > What happens on resume with FEC?

> > >

> > > Since we call phy_stop() in fec_suspend(), the link is down, but the

> > > PHY is active, after receiving magic packets, the system resume back;

> > > In fec_resume(), after restart/init FEC, we call phy_start() to let link up, then

> > all is going well.

> > 

> > ... but the link never went down! So I don't understand the last point.

> 

> Sorry, what the meaning of "the link never went down"? How do you define

> the link is down? May be I have not get your original point correctly.


If the link goes down, connectivity between the MAC and the outside
world is lost - whether that be the link between the MAC and PHY, or
PHY and the outside world. That's my definition of "link down".

However, if the MAC is still alive and receiving packets, even for
Wake-on-Lan purposes, from the outside world then the link can not be
down, it must be operational and therefore it must be in the "up"
state.

I'm talking about the physical state of the link - "up" meaning capable
of passing packets to and from the MAC, "down" meaning incapable of
passing packets.

> At my side, with MAC-based WoL is active, FEC calls phy_stop() in

> fec_suspend(), then fec_enet_adjust_link() is called, further

> fec_stop() is called, FEC only keep necessary receive logic active

> to service WoL. This is not the link went down? At least I see the

> log " fec 30be0000.ethernet eth0: Link is Down".


It looks like calling phy_stop() will force a link-down event to be
reported from phylib. As I say above though, really, this doesn't
affect the physical state of the link, because the link has to be
up for the WoL packets to be received by the MAC.

What I don't like about that is that we're saying that the link is
down, whereas the physical link is actually still up. This is going
to make network drivers implementation of mac_link_down() rather
yucky, especially ones that force the physical link down at the MAC
end when operating in PHY mode and they see a call to mac_link_down()
(which they do to stop packet reception.) There's no way for them to
know whether mac_link_down() is a result of a real physical link down
event, or whether this is a "soft" link down event as you're describing
at a suspend.

Then there's the issue that some network drivers _must_ see a
mac_link_down() call to force the link down prior to reconfiguring
the link (since settings are not allowed to be changed while the
physical link is up.) So we start to destroy the guarantee that
mac_link_down() and mac_link_up() will be properly ordered.

Damn it.

> > There's a few more questions:

> > 1. Since the state at this point will be that netdev and phylink believe

> >    the link to still be up, should phylink_suspend() force a

> >    netif_carrier_off() event to stop the netdev transmit watchdog - I

> >    think it ought to, even though the link will actually remain up.

> 

> Agree.


Note that phylib in the FEC case will do this just before calling
the adjust_link function already.

With phylink, we can make that silent, and I think it should be silent
because, as I describe above, the physical link isn't actually going
down.

> > 2. Should we call mac_link_down() prior to the major reconfig - I think

> >    we should to keep the mac_link_down()/mac_link_up() calls balanced

> >    (as we do already guarantee.) Will that do any harm for stmmac if we

> >    were to call mac_link_down() as a result of a call to

> >    phylink_resume() ?

> 

> For STMMAC, I think it's safe, since we calling phylink_resume() before re-config MAC.

> But we design this for common usage, other MAC drivers may call this at the end of resume

> path, but I think it also safe, like we unplug then plug the cable. However, it will print the LINK DOWN

> then LINK UP log, which is very strange when system resume back. Is there any better solution?


I think at this point, we should just print the state of the link at
resume, which should basically be only a "link down" if the link is
now physically down at resume.

One thing worries me though - what happens if the link parameters
change while the system is suspended. The MAC won't be updated with
the new link parameters. What happens on resume?

> > Why is it different from the .ndo_stop/.ndo_open case, where the PHY may

> > have been suspended by the actions of .ndo_stop?

> 

> It's a good question. PHY will suspend by the actions od .ndo_stop, but it will resume

> before we config MAC. Please see stmmac_open(), stmmac_init_phy()->phylink_of_phy_connect()

> -> phy_attach_direct(): phy_resume(phydev), where PHY will be resume backed.


Ah, I forgot FEC does the PHY connect/disconnect at open/stop. Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Russell King (Oracle) Sept. 3, 2021, 8:12 p.m. UTC | #21
Hi,

Here's a patch to try - you'll need to integrate the new calls into
stmmac's suspend and resume hooks. Obviously, given my previous
comments, this isn't tested!

I didn't need to repeat the mac_wol boolean to phylink_resume as we
can record the state internally - mac_wol should not change between
a call to phylink_suspend() and subsequent phylink_resume() anyway.

mac_wol should only be true if the MAC is involved in processing
packets for WoL, false otherwise.

Please let me know if this resolves your stmmac WoL issue.

Thanks.

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index f0c769027145..c4d0de04416a 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -33,6 +33,7 @@
 enum {
 	PHYLINK_DISABLE_STOPPED,
 	PHYLINK_DISABLE_LINK,
+	PHYLINK_DISABLE_MAC_WOL,
 };
 
 /**
@@ -1313,6 +1314,9 @@ EXPORT_SYMBOL_GPL(phylink_start);
  * network device driver's &struct net_device_ops ndo_stop() method.  The
  * network device's carrier state should not be changed prior to calling this
  * function.
+ *
+ * This will synchronously bring down the link if the link is not already
+ * down (in other words, it will trigger a mac_link_down() method call.)
  */
 void phylink_stop(struct phylink *pl)
 {
@@ -1338,6 +1342,81 @@ void phylink_stop(struct phylink *pl)
 }
 EXPORT_SYMBOL_GPL(phylink_stop);
 
+
+/**
+ * phylink_suspend() - handle a network device suspend event
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan
+ *
+ * Handle a network device suspend event. There are several cases:
+ * - If Wake-on-Lan is not active, we can bring down the link between
+ *   the MAC and PHY by calling phylink_stop().
+ * - If Wake-on-Lan is active, and being handled only by the PHY, we
+ *   can also bring down the link between the MAC and PHY.
+ * - If Wake-on-Lan is active, but being handled by the MAC, the MAC
+ *   still needs to receive packets, so we can not bring the link down.
+ */
+void phylink_suspend(struct phylink *pl, bool mac_wol)
+{
+	ASSERT_RTNL();
+
+	if (mac_wol && (!pl->netdev || pl->netdev->wol_enabled)) {
+		/* Wake-on-Lan enabled, MAC handling */
+		mutex_lock(&pl->state_mutex);
+
+		/* Stop the resolver bringing the link up */
+		__set_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state);
+
+		/* Disable the carrier, to prevent transmit timeouts,
+		 * but one would hope all packets have been sent.
+		 */
+		netif_carrier_off(pl->netdev);
+
+		/* We do not call mac_link_down() here as we want the
+		 * link to remain up to receive the WoL packets.
+		 */
+		mutex_unlock(&pl->state_mutex);
+	} else {
+		phylink_stop(pl);
+	}
+}
+EXPORT_SYMBOL_GPL(phylink_suspend);
+
+/**
+ * phylink_resume() - handle a network device resume event
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * Undo the effects of phylink_suspend(), returning the link to an
+ * operational state.
+ */
+void phylink_resume(struct phylink *pl)
+{
+	ASSERT_RTNL();
+
+	if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) {
+		/* Wake-on-Lan enabled, MAC handling */
+
+		/* Call mac_link_down() so we keep the overall state balanced.
+		 * Do this under the state_mutex lock for consistency. This
+		 * will cause a "Link Down" message to be printed during
+		 * resume, which is harmless - the true link state will be
+		 * printed when we run a resolve.
+		 */
+		mutex_lock(&pl->state_mutex);
+		phylink_link_down(pl);
+		mutex_unlock(&pl->state_mutex);
+
+		/* Re-apply the link parameters so that all the settings get
+		 * restored to the MAC.
+		 */
+		phylink_mac_initial_config(pl, true);
+		phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_MAC_WOL);
+	} else {
+		phylink_start(pl);
+	}
+}
+EXPORT_SYMBOL_GPL(phylink_resume);
+
 /**
  * phylink_ethtool_get_wol() - get the wake on lan parameters for the PHY
  * @pl: a pointer to a &struct phylink returned from phylink_create()
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index bdeec800da5c..ba0ab7126b96 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -462,6 +462,9 @@ void phylink_mac_change(struct phylink *, bool up);
 void phylink_start(struct phylink *);
 void phylink_stop(struct phylink *);
 
+void phylink_suspend(struct phylink *pl, bool mac_wol);
+void phylink_resume(struct phylink *pl);
+
 void phylink_ethtool_get_wol(struct phylink *, struct ethtool_wolinfo *);
 int phylink_ethtool_set_wol(struct phylink *, struct ethtool_wolinfo *);
 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Joakim Zhang Sept. 6, 2021, 2:21 a.m. UTC | #22
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月3日 20:01

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Andrew Lunn <andrew@lunn.ch>; Vladimir Oltean <olteanv@gmail.com>;

> peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> On Fri, Sep 03, 2021 at 11:04:57AM +0000, Joakim Zhang wrote:

> >

> > Hi Russell,

> >

> > [...]

> > > > > > > -----Original Message-----

> > > > > > > From: Andrew Lunn <andrew@lunn.ch>

> > > > > > > Sent: 2021年9月2日 20:24

> > > > > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>

> > > > > > > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean

> > > > > > > <olteanv@gmail.com>; peppe.cavallaro@st.com;

> > > > > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;

> > > > > > > davem@davemloft.net; kuba@kernel.org;

> > > mcoquelin.stm32@gmail.com;

> > > > > > > netdev@vger.kernel.org; f.fainelli@gmail.com;

> > > > > > > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> > > > > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when

> > > > > > > system resume back with WoL enabled

> > > > > > >

> > > > > > > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC,

> > > > > > > > and PHY maintainers, Could you please help put insights here if

> possible?

> > > > > > >

> > > > > > > All the boards i have either have an Ethernet Switch

> > > > > > > connected to the MAC, or a Micrel PHY. None are setup for

> > > > > > > WoL, since it is not used in the use case of these boards.

> > > > > > >

> > > > > > > I think you need to scatter some printk() in various places

> > > > > > > to confirm what is going on. Where is the WoL implemented:

> > > > > > > MAC or PHY, what is suspended or not, etc.

> > > > > >

> > > > > > Thanks Andrew, Russell,

> > > > > >

> > > > > > I confirmed FEC is MAC-based WoL, and PHY is active when

> > > > > > system

> > > > > suspended if MAC-based WoL is active.

> > > > > > I scatter printk() in both phy_device.c and realtek.c phy

> > > > > > driver to debug this

> > > > > for both WoL active and inactive case.

> > > > > >

> > > > > > When MAC-based WoL is active, phy_suspend() is the last point

> > > > > > to actually

> > > > > suspend the PHY, you can see that,

> > > > > > 	phy_ethtool_get_wol(phydev, &wol);

> > > > > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))

> > > > > > 		return -EBUSY;

> > > > > >

> > > > > > Here, netdev is true and netdev->wol_enabled is ture

> > > > > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =

> > > > > > !!wol.wolopts;) So that phydev->suspend() would not be called,

> > > > > > PHY is active

> > > > > after system suspended. PHY can receive packets and pass to MAC,

> > > > > MAC is responsible for detecting magic packets then generate a

> > > > > wakeup interrupt. So all is fine for FEC, and the behavior is clear.

> > > > >

> > > > > What happens on resume with FEC?

> > > >

> > > > Since we call phy_stop() in fec_suspend(), the link is down, but

> > > > the PHY is active, after receiving magic packets, the system

> > > > resume back; In fec_resume(), after restart/init FEC, we call

> > > > phy_start() to let link up, then

> > > all is going well.

> > >

> > > ... but the link never went down! So I don't understand the last point.

> >

> > Sorry, what the meaning of "the link never went down"? How do you

> > define the link is down? May be I have not get your original point correctly.

> 

> If the link goes down, connectivity between the MAC and the outside world is

> lost - whether that be the link between the MAC and PHY, or PHY and the

> outside world. That's my definition of "link down".

> 

> However, if the MAC is still alive and receiving packets, even for Wake-on-Lan

> purposes, from the outside world then the link can not be down, it must be

> operational and therefore it must be in the "up"

> state.

> 

> I'm talking about the physical state of the link - "up" meaning capable of

> passing packets to and from the MAC, "down" meaning incapable of passing

> packets.


Yes, actually what you described is physical state, what I mean is a soft state.

> > At my side, with MAC-based WoL is active, FEC calls phy_stop() in

> > fec_suspend(), then fec_enet_adjust_link() is called, further

> > fec_stop() is called, FEC only keep necessary receive logic active to

> > service WoL. This is not the link went down? At least I see the log "

> > fec 30be0000.ethernet eth0: Link is Down".

> 

> It looks like calling phy_stop() will force a link-down event to be reported from

> phylib. As I say above though, really, this doesn't affect the physical state of the

> link, because the link has to be up for the WoL packets to be received by the

> MAC.


Yes, both MAC and PHY are active for MAC-based WoL.

> What I don't like about that is that we're saying that the link is down, whereas

> the physical link is actually still up. This is going to make network drivers

> implementation of mac_link_down() rather yucky, especially ones that force the

> physical link down at the MAC end when operating in PHY mode and they see a

> call to mac_link_down() (which they do to stop packet reception.) There's no

> way for them to know whether mac_link_down() is a result of a real physical

> link down event, or whether this is a "soft" link down event as you're describing

> at a suspend.


As I think, if WoL is active then this is a soft link down events; Instead, if WoL is
inactive this is a physical link down events.

> Then there's the issue that some network drivers _must_ see a

> mac_link_down() call to force the link down prior to reconfiguring the link (since

> settings are not allowed to be changed while the physical link is up.) So we start

> to destroy the guarantee that

> mac_link_down() and mac_link_up() will be properly ordered.

> 

> Damn it.

> 

> > > There's a few more questions:

> > > 1. Since the state at this point will be that netdev and phylink believe

> > >    the link to still be up, should phylink_suspend() force a

> > >    netif_carrier_off() event to stop the netdev transmit watchdog - I

> > >    think it ought to, even though the link will actually remain up.

> >

> > Agree.

> 

> Note that phylib in the FEC case will do this just before calling the adjust_link

> function already.

> 

> With phylink, we can make that silent, and I think it should be silent because, as

> I describe above, the physical link isn't actually going down.


Yes.

> > > 2. Should we call mac_link_down() prior to the major reconfig - I think

> > >    we should to keep the mac_link_down()/mac_link_up() calls balanced

> > >    (as we do already guarantee.) Will that do any harm for stmmac if we

> > >    were to call mac_link_down() as a result of a call to

> > >    phylink_resume() ?

> >

> > For STMMAC, I think it's safe, since we calling phylink_resume() before

> re-config MAC.

> > But we design this for common usage, other MAC drivers may call this

> > at the end of resume path, but I think it also safe, like we unplug

> > then plug the cable. However, it will print the LINK DOWN then LINK UP log,

> which is very strange when system resume back. Is there any better solution?

> 

> I think at this point, we should just print the state of the link at resume, which

> should basically be only a "link down" if the link is now physically down at

> resume.

> 

> One thing worries me though - what happens if the link parameters change

> while the system is suspended. The MAC won't be updated with the new link

> parameters. What happens on resume?


There is a state machine in phylink, then this link change events would be captured later
from reschedule the machine, right? If so, this seems not a critical issue. MAC just not working
for a while.

Yes, I am also curious if link parameters changed during system suspended, how phylib or phylink
react when system resume back. Do you know some details?

> > > Why is it different from the .ndo_stop/.ndo_open case, where the PHY

> > > may have been suspended by the actions of .ndo_stop?

> >

> > It's a good question. PHY will suspend by the actions od .ndo_stop,

> > but it will resume before we config MAC. Please see stmmac_open(),

> > stmmac_init_phy()->phylink_of_phy_connect()

> > -> phy_attach_direct(): phy_resume(phydev), where PHY will be resume

> backed.

> 

> Ah, I forgot FEC does the PHY connect/disconnect at open/stop. Thanks.


Best Regards,
Joakim Zhang
Joakim Zhang Sept. 6, 2021, 2:29 a.m. UTC | #23
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月4日 4:12

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Andrew Lunn <andrew@lunn.ch>; Vladimir Oltean <olteanv@gmail.com>;

> peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> Hi,

> 

> Here's a patch to try - you'll need to integrate the new calls into stmmac's

> suspend and resume hooks. Obviously, given my previous comments, this isn't

> tested!

> 

> I didn't need to repeat the mac_wol boolean to phylink_resume as we can

> record the state internally - mac_wol should not change between a call to

> phylink_suspend() and subsequent phylink_resume() anyway.

> 

> mac_wol should only be true if the MAC is involved in processing packets for

> WoL, false otherwise.

> 

> Please let me know if this resolves your stmmac WoL issue.


Thanks a lot for your work. 😊

There is a build issue in this patch, could you please have a check? I work on the latest net-next repo.

> Thanks.

> 

> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index

> f0c769027145..c4d0de04416a 100644

> --- a/drivers/net/phy/phylink.c

> +++ b/drivers/net/phy/phylink.c

> @@ -33,6 +33,7 @@

>  enum {

>  	PHYLINK_DISABLE_STOPPED,

>  	PHYLINK_DISABLE_LINK,

> +	PHYLINK_DISABLE_MAC_WOL,

>  };

> 

>  /**

> @@ -1313,6 +1314,9 @@ EXPORT_SYMBOL_GPL(phylink_start);

>   * network device driver's &struct net_device_ops ndo_stop() method.  The

>   * network device's carrier state should not be changed prior to calling this

>   * function.

> + *

> + * This will synchronously bring down the link if the link is not

> + already

> + * down (in other words, it will trigger a mac_link_down() method

> + call.)

>   */

>  void phylink_stop(struct phylink *pl)

>  {

> @@ -1338,6 +1342,81 @@ void phylink_stop(struct phylink *pl)  }

> EXPORT_SYMBOL_GPL(phylink_stop);

> 

> +

> +/**

> + * phylink_suspend() - handle a network device suspend event

> + * @pl: a pointer to a &struct phylink returned from phylink_create()

> + * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan

> + *

> + * Handle a network device suspend event. There are several cases:

> + * - If Wake-on-Lan is not active, we can bring down the link between

> + *   the MAC and PHY by calling phylink_stop().

> + * - If Wake-on-Lan is active, and being handled only by the PHY, we

> + *   can also bring down the link between the MAC and PHY.

> + * - If Wake-on-Lan is active, but being handled by the MAC, the MAC

> + *   still needs to receive packets, so we can not bring the link down.

> + */

> +void phylink_suspend(struct phylink *pl, bool mac_wol) {

> +	ASSERT_RTNL();

> +

> +	if (mac_wol && (!pl->netdev || pl->netdev->wol_enabled)) {

> +		/* Wake-on-Lan enabled, MAC handling */

> +		mutex_lock(&pl->state_mutex);

> +

> +		/* Stop the resolver bringing the link up */

> +		__set_bit(PHYLINK_DISABLE_MAC_WOL,

> &pl->phylink_disable_state);

> +

> +		/* Disable the carrier, to prevent transmit timeouts,

> +		 * but one would hope all packets have been sent.

> +		 */

> +		netif_carrier_off(pl->netdev);

> +

> +		/* We do not call mac_link_down() here as we want the

> +		 * link to remain up to receive the WoL packets.

> +		 */

> +		mutex_unlock(&pl->state_mutex);

> +	} else {

> +		phylink_stop(pl);

> +	}

> +}

> +EXPORT_SYMBOL_GPL(phylink_suspend);

> +

> +/**

> + * phylink_resume() - handle a network device resume event

> + * @pl: a pointer to a &struct phylink returned from phylink_create()

> + *

> + * Undo the effects of phylink_suspend(), returning the link to an

> + * operational state.

> + */

> +void phylink_resume(struct phylink *pl) {

> +	ASSERT_RTNL();

> +

> +	if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) {

> +		/* Wake-on-Lan enabled, MAC handling */

> +

> +		/* Call mac_link_down() so we keep the overall state balanced.

> +		 * Do this under the state_mutex lock for consistency. This

> +		 * will cause a "Link Down" message to be printed during

> +		 * resume, which is harmless - the true link state will be

> +		 * printed when we run a resolve.

> +		 */

> +		mutex_lock(&pl->state_mutex);

> +		phylink_link_down(pl);

> +		mutex_unlock(&pl->state_mutex);

> +

> +		/* Re-apply the link parameters so that all the settings get

> +		 * restored to the MAC.

> +		 */

> +		phylink_mac_initial_config(pl, true);

> +		phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_MAC_WOL);


There is no "phylink_enable_and_run_resolve " sysbol, I guess you want do below operations in this function:
	clear_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state);
	phylink_run_resolve(pl);

> +	} else {

> +		phylink_start(pl);

> +	}

> +}

> +EXPORT_SYMBOL_GPL(phylink_resume);

> +

>  /**

>   * phylink_ethtool_get_wol() - get the wake on lan parameters for the PHY

>   * @pl: a pointer to a &struct phylink returned from phylink_create() diff --git

> a/include/linux/phylink.h b/include/linux/phylink.h index

> bdeec800da5c..ba0ab7126b96 100644

> --- a/include/linux/phylink.h

> +++ b/include/linux/phylink.h

> @@ -462,6 +462,9 @@ void phylink_mac_change(struct phylink *, bool up);

> void phylink_start(struct phylink *);  void phylink_stop(struct phylink *);

> 

> +void phylink_suspend(struct phylink *pl, bool mac_wol); void

> +phylink_resume(struct phylink *pl);

> +

>  void phylink_ethtool_get_wol(struct phylink *, struct ethtool_wolinfo *);  int

> phylink_ethtool_set_wol(struct phylink *, struct ethtool_wolinfo *);


Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 6, 2021, 9:34 a.m. UTC | #24
Hi,

On Mon, Sep 06, 2021 at 02:29:30AM +0000, Joakim Zhang wrote:
> Hi Russell,

> 

> > -----Original Message-----

> > +		/* Re-apply the link parameters so that all the settings get

> > +		 * restored to the MAC.

> > +		 */

> > +		phylink_mac_initial_config(pl, true);

> > +		phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_MAC_WOL);

> 

> There is no "phylink_enable_and_run_resolve " sysbol, I guess you want do below operations in this function:

> 	clear_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state);

> 	phylink_run_resolve(pl);


Yes, that is correct.

Please let me know whether that works for you.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Joakim Zhang Sept. 6, 2021, 10:41 a.m. UTC | #25
Hi Russell,

> -----Original Message-----

> From: Russell King <linux@armlinux.org.uk>

> Sent: 2021年9月6日 17:35

> To: Joakim Zhang <qiangqing.zhang@nxp.com>

> Cc: Andrew Lunn <andrew@lunn.ch>; Vladimir Oltean <olteanv@gmail.com>;

> peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;

> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;

> mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; f.fainelli@gmail.com;

> hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>

> Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume

> back with WoL enabled

> 

> Hi,

> 

> On Mon, Sep 06, 2021 at 02:29:30AM +0000, Joakim Zhang wrote:

> > Hi Russell,

> >

> > > -----Original Message-----

> > > +		/* Re-apply the link parameters so that all the settings get

> > > +		 * restored to the MAC.

> > > +		 */

> > > +		phylink_mac_initial_config(pl, true);

> > > +		phylink_enable_and_run_resolve(pl,

> PHYLINK_DISABLE_MAC_WOL);

> >

> > There is no "phylink_enable_and_run_resolve " sysbol, I guess you want do

> below operations in this function:

> > 	clear_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state);

> > 	phylink_run_resolve(pl);

> 

> Yes, that is correct.

> 

> Please let me know whether that works for you.


Thanks Russell, it works as we are expected, I test both MAC-based WoL active and inactive cases.

And I get the point you mentioned before, if link parameters changed during system suspended, what would happen?
I tried both FEC and STMMAC, the system can't be waked up via remote magic packets!!!
I have not think about this scenario before....

Since net-next is closed, so I would cook a patch set (keep you as the phylink patch author) after it re-open, could you
accept it? Or you plan to prepare this patch set for stmmac?

There is also a problem, we need a fix for LTS (5.10, 5.15), above patch set should go to 5.16, do you have any suggestion?

Best Regards,
Joakim Zhang
Russell King (Oracle) Sept. 6, 2021, 11:21 a.m. UTC | #26
On Mon, Sep 06, 2021 at 10:41:48AM +0000, Joakim Zhang wrote:
> Hi Russell,

> 

> Thanks Russell, it works as we are expected, I test both MAC-based WoL active and inactive cases.


Yay!

> And I get the point you mentioned before, if link parameters changed during system suspended, what would happen?


That's a problem with any setup that uses the MAC to detect WoL packets,
and the MAC requires software to run if the link state changes. It is
one of the fundamental problems of MAC-side WoL detection.

I see two possible solutions to this problem:

1) If the link changes, the PHY needs to wake the system up from suspend
   so that software can run to reprogram the MAC for the new link
   parameters, and then go back to sleep.

2) We need to set the link to a state which reduces the chances of the
   link parameters changing.

I don't think we have any support in the kernel for (1) - we assume if
we are woken up than the system as a whole will become operational, so
there's no automatic "go back to sleep".

We do have the ability to place the link into the slowest mutually
supported speed via phy(link)?_speed_down(). This has the advantage of
reducing the power used to keep the link active while in suspend (which
is its primary purpose) but also reduces the possible link modes that
could be autonegotiated with the partner.

I think I'd suggest to Andrew that phy_speed_down() should only
advertise one capability, not "everything we support below the minimum
mutually supported capability" - that way, if a link change is attempted
on the partner while the system is suspended, the link will not come up
and its obvious it isn't going to work.

I think this is an issue for a separate patch set.

> Since net-next is closed, so I would cook a patch set (keep you as the

> phylink patch author) after it re-open, could you accept it? Or you plan

> to prepare this patch set for stmmac?


As the bug was introduced in v5.7, this is a regression, and the fix
isn't too complex. I believe it's suitable for the net tree.

I'll prepare a proper patch for the net tree for phylink, which I'll
send you, so you can include with your patch set.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Andrew Lunn Sept. 6, 2021, 1:23 p.m. UTC | #27
> We do have the ability to place the link into the slowest mutually

> supported speed via phy(link)?_speed_down(). This has the advantage of

> reducing the power used to keep the link active while in suspend (which

> is its primary purpose) but also reduces the possible link modes that

> could be autonegotiated with the partner.

> 

> I think I'd suggest to Andrew that phy_speed_down() should only

> advertise one capability, not "everything we support below the minimum

> mutually supported capability" - that way, if a link change is attempted

> on the partner while the system is suspended, the link will not come up

> and its obvious it isn't going to work.


Yes, that sounds reasonable.

> I think this is an issue for a separate patch set.


Yes, i would say a change like that is net-next material.

     Andrew
Russell King (Oracle) Sept. 7, 2021, 8:52 a.m. UTC | #28
On Mon, Sep 06, 2021 at 12:22:00PM +0100, Russell King (Oracle) wrote:
> I'll prepare a proper patch for the net tree for phylink, which I'll

> send you, so you can include with your patch set.


Hi,

Here is the patch. I've tweaked some of the comments a little bit
further to explain more what is going on. Please submit this with
your patch for stmmac.

Thanks.

From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>

Subject: [PATCH net] net: phylink: add suspend/resume support

Joakim Zhang reports that Wake-on-Lan with the stmmac ethernet driver
broke when moving the incorrect handling of mac link state out of
mac_config(). This reason this breaks is because the stmmac's WoL is
handled by the MAC rather than the PHY, and phylink doesn't cater for
that scenario.

This patch adds the necessary phylink code to handle suspend/resume
events according to whether the MAC still needs a valid link or not.
This is the barest minimum for this support.

Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

---
 drivers/net/phy/phylink.c | 83 +++++++++++++++++++++++++++++++++++++++
 include/linux/phylink.h   |  3 ++
 2 files changed, 86 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 2cdf9f989dec..13d9191e8526 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -33,6 +33,7 @@
 enum {
 	PHYLINK_DISABLE_STOPPED,
 	PHYLINK_DISABLE_LINK,
+	PHYLINK_DISABLE_MAC_WOL,
 };
 
 /**
@@ -1282,6 +1283,9 @@ EXPORT_SYMBOL_GPL(phylink_start);
  * network device driver's &struct net_device_ops ndo_stop() method.  The
  * network device's carrier state should not be changed prior to calling this
  * function.
+ *
+ * This will synchronously bring down the link if the link is not already
+ * down (in other words, it will trigger a mac_link_down() method call.)
  */
 void phylink_stop(struct phylink *pl)
 {
@@ -1301,6 +1305,85 @@ void phylink_stop(struct phylink *pl)
 }
 EXPORT_SYMBOL_GPL(phylink_stop);
 
+
+/**
+ * phylink_suspend() - handle a network device suspend event
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan
+ *
+ * Handle a network device suspend event. There are several cases:
+ * - If Wake-on-Lan is not active, we can bring down the link between
+ *   the MAC and PHY by calling phylink_stop().
+ * - If Wake-on-Lan is active, and being handled only by the PHY, we
+ *   can also bring down the link between the MAC and PHY.
+ * - If Wake-on-Lan is active, but being handled by the MAC, the MAC
+ *   still needs to receive packets, so we can not bring the link down.
+ */
+void phylink_suspend(struct phylink *pl, bool mac_wol)
+{
+	ASSERT_RTNL();
+
+	if (mac_wol && (!pl->netdev || pl->netdev->wol_enabled)) {
+		/* Wake-on-Lan enabled, MAC handling */
+		mutex_lock(&pl->state_mutex);
+
+		/* Stop the resolver bringing the link up */
+		__set_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state);
+
+		/* Disable the carrier, to prevent transmit timeouts,
+		 * but one would hope all packets have been sent. This
+		 * also means phylink_resolve() will do nothing.
+		 */
+		netif_carrier_off(pl->netdev);
+
+		/* We do not call mac_link_down() here as we want the
+		 * link to remain up to receive the WoL packets.
+		 */
+		mutex_unlock(&pl->state_mutex);
+	} else {
+		phylink_stop(pl);
+	}
+}
+EXPORT_SYMBOL_GPL(phylink_suspend);
+
+/**
+ * phylink_resume() - handle a network device resume event
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * Undo the effects of phylink_suspend(), returning the link to an
+ * operational state.
+ */
+void phylink_resume(struct phylink *pl)
+{
+	ASSERT_RTNL();
+
+	if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) {
+		/* Wake-on-Lan enabled, MAC handling */
+
+		/* Call mac_link_down() so we keep the overall state balanced.
+		 * Do this under the state_mutex lock for consistency. This
+		 * will cause a "Link Down" message to be printed during
+		 * resume, which is harmless - the true link state will be
+		 * printed when we run a resolve.
+		 */
+		mutex_lock(&pl->state_mutex);
+		phylink_link_down(pl);
+		mutex_unlock(&pl->state_mutex);
+
+		/* Re-apply the link parameters so that all the settings get
+		 * restored to the MAC.
+		 */
+		phylink_mac_initial_config(pl, true);
+
+		/* Re-enable and re-resolve the link parameters */
+		clear_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state);
+		phylink_run_resolve(pl);
+	} else {
+		phylink_start(pl);
+	}
+}
+EXPORT_SYMBOL_GPL(phylink_resume);
+
 /**
  * phylink_ethtool_get_wol() - get the wake on lan parameters for the PHY
  * @pl: a pointer to a &struct phylink returned from phylink_create()
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index afb3ded0b691..237291196ce2 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -451,6 +451,9 @@ void phylink_mac_change(struct phylink *, bool up);
 void phylink_start(struct phylink *);
 void phylink_stop(struct phylink *);
 
+void phylink_suspend(struct phylink *pl, bool mac_wol);
+void phylink_resume(struct phylink *pl);
+
 void phylink_ethtool_get_wol(struct phylink *, struct ethtool_wolinfo *);
 int phylink_ethtool_set_wol(struct phylink *, struct ethtool_wolinfo *);
 
-- 
2.30.2

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ed0cd3920171..c0ed4b07d24a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1007,45 +1007,6 @@  static void stmmac_validate(struct phylink_config *config,
 
 static void stmmac_mac_config(struct phylink_config *config, unsigned int mode,
 			      const struct phylink_link_state *state)
-{
-	/* Nothing to do, xpcs_config() handles everything */
-}
-
-static void stmmac_fpe_link_state_handle(struct stmmac_priv *priv, bool is_up)
-{
-	struct stmmac_fpe_cfg *fpe_cfg = priv->plat->fpe_cfg;
-	enum stmmac_fpe_state *lo_state = &fpe_cfg->lo_fpe_state;
-	enum stmmac_fpe_state *lp_state = &fpe_cfg->lp_fpe_state;
-	bool *hs_enable = &fpe_cfg->hs_enable;
-
-	if (is_up && *hs_enable) {
-		stmmac_fpe_send_mpacket(priv, priv->ioaddr, MPACKET_VERIFY);
-	} else {
-		*lo_state = FPE_STATE_OFF;
-		*lp_state = FPE_STATE_OFF;
-	}
-}
-
-static void stmmac_mac_link_down(struct phylink_config *config,
-				 unsigned int mode, phy_interface_t interface)
-{
-	struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
-
-	stmmac_mac_set(priv, priv->ioaddr, false);
-	priv->eee_active = false;
-	priv->tx_lpi_enabled = false;
-	stmmac_eee_init(priv);
-	stmmac_set_eee_pls(priv, priv->hw, false);
-
-	if (priv->dma_cap.fpesel)
-		stmmac_fpe_link_state_handle(priv, false);
-}
-
-static void stmmac_mac_link_up(struct phylink_config *config,
-			       struct phy_device *phy,
-			       unsigned int mode, phy_interface_t interface,
-			       int speed, int duplex,
-			       bool tx_pause, bool rx_pause)
 {
 	struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
 	u32 ctrl;
@@ -1053,8 +1014,8 @@  static void stmmac_mac_link_up(struct phylink_config *config,
 	ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
 	ctrl &= ~priv->hw->link.speed_mask;
 
-	if (interface == PHY_INTERFACE_MODE_USXGMII) {
-		switch (speed) {
+	if (state->interface == PHY_INTERFACE_MODE_USXGMII) {
+		switch (state->speed) {
 		case SPEED_10000:
 			ctrl |= priv->hw->link.xgmii.speed10000;
 			break;
@@ -1067,8 +1028,8 @@  static void stmmac_mac_link_up(struct phylink_config *config,
 		default:
 			return;
 		}
-	} else if (interface == PHY_INTERFACE_MODE_XLGMII) {
-		switch (speed) {
+	} else if (state->interface == PHY_INTERFACE_MODE_XLGMII) {
+		switch (state->speed) {
 		case SPEED_100000:
 			ctrl |= priv->hw->link.xlgmii.speed100000;
 			break;
@@ -1094,7 +1055,7 @@  static void stmmac_mac_link_up(struct phylink_config *config,
 			return;
 		}
 	} else {
-		switch (speed) {
+		switch (state->speed) {
 		case SPEED_2500:
 			ctrl |= priv->hw->link.speed2500;
 			break;
@@ -1112,21 +1073,60 @@  static void stmmac_mac_link_up(struct phylink_config *config,
 		}
 	}
 
-	priv->speed = speed;
+	priv->speed = state->speed;
 
 	if (priv->plat->fix_mac_speed)
-		priv->plat->fix_mac_speed(priv->plat->bsp_priv, speed);
+		priv->plat->fix_mac_speed(priv->plat->bsp_priv, state->speed);
 
-	if (!duplex)
+	if (!state->duplex)
 		ctrl &= ~priv->hw->link.duplex;
 	else
 		ctrl |= priv->hw->link.duplex;
 
 	/* Flow Control operation */
-	if (tx_pause && rx_pause)
-		stmmac_mac_flow_ctrl(priv, duplex);
+	if (state->pause)
+		stmmac_mac_flow_ctrl(priv, state->duplex);
 
 	writel(ctrl, priv->ioaddr + MAC_CTRL_REG);
+}
+
+static void stmmac_fpe_link_state_handle(struct stmmac_priv *priv, bool is_up)
+{
+	struct stmmac_fpe_cfg *fpe_cfg = priv->plat->fpe_cfg;
+	enum stmmac_fpe_state *lo_state = &fpe_cfg->lo_fpe_state;
+	enum stmmac_fpe_state *lp_state = &fpe_cfg->lp_fpe_state;
+	bool *hs_enable = &fpe_cfg->hs_enable;
+
+	if (is_up && *hs_enable) {
+		stmmac_fpe_send_mpacket(priv, priv->ioaddr, MPACKET_VERIFY);
+	} else {
+		*lo_state = FPE_STATE_OFF;
+		*lp_state = FPE_STATE_OFF;
+	}
+}
+
+static void stmmac_mac_link_down(struct phylink_config *config,
+				 unsigned int mode, phy_interface_t interface)
+{
+	struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
+
+	stmmac_mac_set(priv, priv->ioaddr, false);
+	priv->eee_active = false;
+	priv->tx_lpi_enabled = false;
+	stmmac_eee_init(priv);
+	stmmac_set_eee_pls(priv, priv->hw, false);
+
+	if (priv->dma_cap.fpesel)
+		stmmac_fpe_link_state_handle(priv, false);
+}
+
+static void stmmac_mac_link_up(struct phylink_config *config,
+			       struct phy_device *phy,
+			       unsigned int mode, phy_interface_t interface,
+			       int speed, int duplex,
+			       bool tx_pause, bool rx_pause)
+{
+	struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
 
 	stmmac_mac_set(priv, priv->ioaddr, true);
 	if (phy && priv->dma_cap.eee) {