diff mbox

[RFC] net: phy: Don't miss phy_suspend() on PHY_HALTED for PHYs with interrupts

Message ID 1489585887-8683-1-git-send-email-rogerq@ti.com
State New
Headers show

Commit Message

Roger Quadros March 15, 2017, 1:51 p.m. UTC
Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")
phy_suspend() doesn't get called as part of phy_stop() for PHYs using
interrupts because the phy state machine is never triggered after a phy_stop().

Explicitly trigger the PHY state machine so that it can
see the new PHY state (HALTED) and suspend the PHY.

Signed-off-by: Roger Quadros <rogerq@ti.com>

---
 drivers/net/phy/phy.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.7.4

Comments

Andrew Lunn March 15, 2017, 2:08 p.m. UTC | #1
On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:
> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

> interrupts because the phy state machine is never triggered after a phy_stop().

> 

> Explicitly trigger the PHY state machine so that it can

> see the new PHY state (HALTED) and suspend the PHY.

> 

> Signed-off-by: Roger Quadros <rogerq@ti.com>


Hi Roger

This seems sensible. It mirrors what phy_start() does.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>


It does however lead to a follow up question. Are there other places
phydev->state is changed and it is missing a phy_trigger_machine()?

	      Andrew
Roger Quadros March 15, 2017, 3 p.m. UTC | #2
Andrew,

On 15/03/17 16:08, Andrew Lunn wrote:
> On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

>> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

>> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

>> interrupts because the phy state machine is never triggered after a phy_stop().

>>

>> Explicitly trigger the PHY state machine so that it can

>> see the new PHY state (HALTED) and suspend the PHY.

>>

>> Signed-off-by: Roger Quadros <rogerq@ti.com>

> 

> Hi Roger

> 

> This seems sensible. It mirrors what phy_start() does.

> 

> Reviewed-by: Andrew Lunn <andrew@lunn.ch>


The reason for this being an RFC was the following comment just before
where I add the phy_trigger_machine()

        /* Cannot call flush_scheduled_work() here as desired because
         * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()
         * will not reenable interrupts.
         */

Is this comment still applicable? If yes, is it OK to call
phy_trigger_machine() there?

> 

> It does however lead to a follow up question. Are there other places

> phydev->state is changed and it is missing a phy_trigger_machine()?

> 


One other place I think we should add phy_trigger_machine() is phy_start_aneg().

Since commit 3c293f4e08b5 I was not getting my ethernet link to come up if
ethernet cable was plugged before the ethernet interface was brought up.
The PHY seems to be stuck from RUNNING to AN state with no new interrupts from the
PHY.

I could workaround it on my board by doing either of the following:

1) explicitly halt the PHY at ethernet driver probe time. Then when
network interface is brought up it resumes the PHY and we see the
Link/ANEG done interrupt.

2) force BMCR_ANRESTART every time from .config_aneg in the PHY driver.

I will still keep approach 1 as it is desirable to put the PHY in low power mode
for us but I need a second opinion if we should call phy_trigger_machine()
from phy_start_aneg() or not.

-- 
cheers,
-roger
Andrew Lunn March 15, 2017, 3:49 p.m. UTC | #3
On Wed, Mar 15, 2017 at 05:00:08PM +0200, Roger Quadros wrote:
> Andrew,

> 

> On 15/03/17 16:08, Andrew Lunn wrote:

> > On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

> >> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

> >> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

> >> interrupts because the phy state machine is never triggered after a phy_stop().

> >>

> >> Explicitly trigger the PHY state machine so that it can

> >> see the new PHY state (HALTED) and suspend the PHY.

> >>

> >> Signed-off-by: Roger Quadros <rogerq@ti.com>

> > 

> > Hi Roger

> > 

> > This seems sensible. It mirrors what phy_start() does.

> > 

> > Reviewed-by: Andrew Lunn <andrew@lunn.ch>

> 

> The reason for this being an RFC was the following comment just before

> where I add the phy_trigger_machine()

> 

>         /* Cannot call flush_scheduled_work() here as desired because

>          * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()

>          * will not reenable interrupts.

>          */

> 

> Is this comment still applicable? If yes, is it OK to call

> phy_trigger_machine() there?


Humm, good question.

My _guess_ would be, calling it with sync=True could
deadlock. sync=False is probably safe. But lets see what Florian says.

> 

> > 

> > It does however lead to a follow up question. Are there other places

> > phydev->state is changed and it is missing a phy_trigger_machine()?

> > 

> 

> One other place I think we should add phy_trigger_machine() is phy_start_aneg().


Humm, that might get us into a tight loop.

phy_start_aneg() kicks the phy driver to start autoneg and sets
phydev->state = PHY_AN.

phy_trigger_machine() triggers the state machine immediately. 

In state PHY_AN, we check if aneg is done. If not, it sets needs_aneg
= true. At the end of the state machine, this then calls
phy_start_aneg(), and it all starts again.

We are missing the 1s delay we have with polling. So for
phy_start_aneg(), we might need a phy_delayed_trigger_machine(), which
waits a second before doing anything?

      Andrew
Roger Quadros March 16, 2017, 7:46 a.m. UTC | #4
On 15/03/17 17:49, Andrew Lunn wrote:
> On Wed, Mar 15, 2017 at 05:00:08PM +0200, Roger Quadros wrote:

>> Andrew,

>>

>> On 15/03/17 16:08, Andrew Lunn wrote:

>>> On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

>>>> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

>>>> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

>>>> interrupts because the phy state machine is never triggered after a phy_stop().

>>>>

>>>> Explicitly trigger the PHY state machine so that it can

>>>> see the new PHY state (HALTED) and suspend the PHY.

>>>>

>>>> Signed-off-by: Roger Quadros <rogerq@ti.com>

>>>

>>> Hi Roger

>>>

>>> This seems sensible. It mirrors what phy_start() does.

>>>

>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

>>

>> The reason for this being an RFC was the following comment just before

>> where I add the phy_trigger_machine()

>>

>>         /* Cannot call flush_scheduled_work() here as desired because

>>          * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()

>>          * will not reenable interrupts.

>>          */

>>

>> Is this comment still applicable? If yes, is it OK to call

>> phy_trigger_machine() there?

> 

> Humm, good question.

> 

> My _guess_ would be, calling it with sync=True could

> deadlock. sync=False is probably safe. But lets see what Florian says.


I agree that we should use phy_trigger_machine() with sync=False.

> 

>>

>>>

>>> It does however lead to a follow up question. Are there other places

>>> phydev->state is changed and it is missing a phy_trigger_machine()?

>>>

>>

>> One other place I think we should add phy_trigger_machine() is phy_start_aneg().

> 

> Humm, that might get us into a tight loop.

> 

> phy_start_aneg() kicks the phy driver to start autoneg and sets

> phydev->state = PHY_AN.

> 

> phy_trigger_machine() triggers the state machine immediately. 

> 

> In state PHY_AN, we check if aneg is done. If not, it sets needs_aneg

> = true. At the end of the state machine, this then calls

> phy_start_aneg(), and it all starts again.

> 

> We are missing the 1s delay we have with polling. So for

> phy_start_aneg(), we might need a phy_delayed_trigger_machine(), which

> waits a second before doing anything?


I think that should do the trick.

How about this?


-- 
cheers,
-rogerdiff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 8fef03b..162061c 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -630,6 +630,10 @@ int phy_start_aneg(struct phy_device *phydev)
 
 out_unlock:
 	mutex_unlock(&phydev->lock);
+
+	if (!err)
+		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, HZ);
+
 	return err;
 }
 EXPORT_SYMBOL(phy_start_aneg);

Florian Fainelli March 20, 2017, 4:41 p.m. UTC | #5
On 03/16/2017 12:46 AM, Roger Quadros wrote:
> On 15/03/17 17:49, Andrew Lunn wrote:

>> On Wed, Mar 15, 2017 at 05:00:08PM +0200, Roger Quadros wrote:

>>> Andrew,

>>>

>>> On 15/03/17 16:08, Andrew Lunn wrote:

>>>> On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

>>>>> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

>>>>> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

>>>>> interrupts because the phy state machine is never triggered after a phy_stop().

>>>>>

>>>>> Explicitly trigger the PHY state machine so that it can

>>>>> see the new PHY state (HALTED) and suspend the PHY.

>>>>>

>>>>> Signed-off-by: Roger Quadros <rogerq@ti.com>

>>>>

>>>> Hi Roger

>>>>

>>>> This seems sensible. It mirrors what phy_start() does.

>>>>

>>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

>>>

>>> The reason for this being an RFC was the following comment just before

>>> where I add the phy_trigger_machine()

>>>

>>>         /* Cannot call flush_scheduled_work() here as desired because

>>>          * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()

>>>          * will not reenable interrupts.

>>>          */

>>>

>>> Is this comment still applicable? If yes, is it OK to call

>>> phy_trigger_machine() there?

>>

>> Humm, good question.

>>

>> My _guess_ would be, calling it with sync=True could

>> deadlock. sync=False is probably safe. But lets see what Florian says.

> 

> I agree that we should use phy_trigger_machine() with sync=False.

> 

>>

>>>

>>>>

>>>> It does however lead to a follow up question. Are there other places

>>>> phydev->state is changed and it is missing a phy_trigger_machine()?

>>>>

>>>

>>> One other place I think we should add phy_trigger_machine() is phy_start_aneg().

>>

>> Humm, that might get us into a tight loop.

>>

>> phy_start_aneg() kicks the phy driver to start autoneg and sets

>> phydev->state = PHY_AN.

>>

>> phy_trigger_machine() triggers the state machine immediately. 

>>

>> In state PHY_AN, we check if aneg is done. If not, it sets needs_aneg

>> = true. At the end of the state machine, this then calls

>> phy_start_aneg(), and it all starts again.

>>

>> We are missing the 1s delay we have with polling. So for

>> phy_start_aneg(), we might need a phy_delayed_trigger_machine(), which

>> waits a second before doing anything?

> 

> I think that should do the trick.

> 

> How about this?


This sounds like a possible fix indeed, however I would like to better
assess the impact on non interrupt driven PHYs before rolling such a change.

> 

> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c

> index 8fef03b..162061c 100644

> --- a/drivers/net/phy/phy.c

> +++ b/drivers/net/phy/phy.c

> @@ -630,6 +630,10 @@ int phy_start_aneg(struct phy_device *phydev)

>  

>  out_unlock:

>  	mutex_unlock(&phydev->lock);

> +

> +	if (!err)

> +		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, HZ);

> +

>  	return err;

>  }

>  EXPORT_SYMBOL(phy_start_aneg);

> 



-- 
Florian
Florian Fainelli March 20, 2017, 4:42 p.m. UTC | #6
On 03/15/2017 08:00 AM, Roger Quadros wrote:
> Andrew,

> 

> On 15/03/17 16:08, Andrew Lunn wrote:

>> On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

>>> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

>>> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

>>> interrupts because the phy state machine is never triggered after a phy_stop().

>>>

>>> Explicitly trigger the PHY state machine so that it can

>>> see the new PHY state (HALTED) and suspend the PHY.

>>>

>>> Signed-off-by: Roger Quadros <rogerq@ti.com>

>>

>> Hi Roger

>>

>> This seems sensible. It mirrors what phy_start() does.

>>

>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

> 

> The reason for this being an RFC was the following comment just before

> where I add the phy_trigger_machine()

> 

>         /* Cannot call flush_scheduled_work() here as desired because

>          * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()

>          * will not reenable interrupts.

>          */

> 

> Is this comment still applicable? If yes, is it OK to call

> phy_trigger_machine() there?


I think the comment is still applicable, most PHYLIB consumers will call
this with rtnl_lock() held from ndo_close() for instance.

> 

>>

>> It does however lead to a follow up question. Are there other places

>> phydev->state is changed and it is missing a phy_trigger_machine()?

>>

> 

> One other place I think we should add phy_trigger_machine() is phy_start_aneg().

> 

> Since commit 3c293f4e08b5 I was not getting my ethernet link to come up if

> ethernet cable was plugged before the ethernet interface was brought up.

> The PHY seems to be stuck from RUNNING to AN state with no new interrupts from the

> PHY.

> 

> I could workaround it on my board by doing either of the following:

> 

> 1) explicitly halt the PHY at ethernet driver probe time. Then when

> network interface is brought up it resumes the PHY and we see the

> Link/ANEG done interrupt.

> 

> 2) force BMCR_ANRESTART every time from .config_aneg in the PHY driver.

> 

> I will still keep approach 1 as it is desirable to put the PHY in low power mode

> for us but I need a second opinion if we should call phy_trigger_machine()

> from phy_start_aneg() or not.

> 



-- 
Florian
Roger Quadros March 21, 2017, 10:09 a.m. UTC | #7
On 20/03/17 18:41, Florian Fainelli wrote:
> On 03/16/2017 12:46 AM, Roger Quadros wrote:

>> On 15/03/17 17:49, Andrew Lunn wrote:

>>> On Wed, Mar 15, 2017 at 05:00:08PM +0200, Roger Quadros wrote:

>>>> Andrew,

>>>>

>>>> On 15/03/17 16:08, Andrew Lunn wrote:

>>>>> On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

>>>>>> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

>>>>>> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

>>>>>> interrupts because the phy state machine is never triggered after a phy_stop().

>>>>>>

>>>>>> Explicitly trigger the PHY state machine so that it can

>>>>>> see the new PHY state (HALTED) and suspend the PHY.

>>>>>>

>>>>>> Signed-off-by: Roger Quadros <rogerq@ti.com>

>>>>>

>>>>> Hi Roger

>>>>>

>>>>> This seems sensible. It mirrors what phy_start() does.

>>>>>

>>>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

>>>>

>>>> The reason for this being an RFC was the following comment just before

>>>> where I add the phy_trigger_machine()

>>>>

>>>>         /* Cannot call flush_scheduled_work() here as desired because

>>>>          * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()

>>>>          * will not reenable interrupts.

>>>>          */

>>>>

>>>> Is this comment still applicable? If yes, is it OK to call

>>>> phy_trigger_machine() there?

>>>

>>> Humm, good question.

>>>

>>> My _guess_ would be, calling it with sync=True could

>>> deadlock. sync=False is probably safe. But lets see what Florian says.

>>

>> I agree that we should use phy_trigger_machine() with sync=False.

>>

>>>

>>>>

>>>>>

>>>>> It does however lead to a follow up question. Are there other places

>>>>> phydev->state is changed and it is missing a phy_trigger_machine()?

>>>>>

>>>>

>>>> One other place I think we should add phy_trigger_machine() is phy_start_aneg().

>>>

>>> Humm, that might get us into a tight loop.

>>>

>>> phy_start_aneg() kicks the phy driver to start autoneg and sets

>>> phydev->state = PHY_AN.

>>>

>>> phy_trigger_machine() triggers the state machine immediately. 

>>>

>>> In state PHY_AN, we check if aneg is done. If not, it sets needs_aneg

>>> = true. At the end of the state machine, this then calls

>>> phy_start_aneg(), and it all starts again.

>>>

>>> We are missing the 1s delay we have with polling. So for

>>> phy_start_aneg(), we might need a phy_delayed_trigger_machine(), which

>>> waits a second before doing anything?

>>

>> I think that should do the trick.

>>

>> How about this?

> 

> This sounds like a possible fix indeed, however I would like to better

> assess the impact on non interrupt driven PHYs before rolling such a change.


Is it safer if I add a check to do this only for interrupt driven PHYs?

e.g.

-- 
2.7.4

-- 
cheers,
-rogerdiff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 4b855f2..e0f5755 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -630,6 +630,9 @@ int phy_start_aneg(struct phy_device *phydev)
 
 out_unlock:
 	mutex_unlock(&phydev->lock);
+	if (!err && phy_interrupt_is_valid(phydev))
+		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, HZ);
+
 	return err;
 }
 EXPORT_SYMBOL(phy_start_aneg);

Florian Fainelli March 21, 2017, 4 p.m. UTC | #8
On 03/21/2017 03:09 AM, Roger Quadros wrote:
> On 20/03/17 18:41, Florian Fainelli wrote:

>> On 03/16/2017 12:46 AM, Roger Quadros wrote:

>>> On 15/03/17 17:49, Andrew Lunn wrote:

>>>> On Wed, Mar 15, 2017 at 05:00:08PM +0200, Roger Quadros wrote:

>>>>> Andrew,

>>>>>

>>>>> On 15/03/17 16:08, Andrew Lunn wrote:

>>>>>> On Wed, Mar 15, 2017 at 03:51:27PM +0200, Roger Quadros wrote:

>>>>>>> Since commit 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")

>>>>>>> phy_suspend() doesn't get called as part of phy_stop() for PHYs using

>>>>>>> interrupts because the phy state machine is never triggered after a phy_stop().

>>>>>>>

>>>>>>> Explicitly trigger the PHY state machine so that it can

>>>>>>> see the new PHY state (HALTED) and suspend the PHY.

>>>>>>>

>>>>>>> Signed-off-by: Roger Quadros <rogerq@ti.com>

>>>>>>

>>>>>> Hi Roger

>>>>>>

>>>>>> This seems sensible. It mirrors what phy_start() does.

>>>>>>

>>>>>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

>>>>>

>>>>> The reason for this being an RFC was the following comment just before

>>>>> where I add the phy_trigger_machine()

>>>>>

>>>>>         /* Cannot call flush_scheduled_work() here as desired because

>>>>>          * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()

>>>>>          * will not reenable interrupts.

>>>>>          */

>>>>>

>>>>> Is this comment still applicable? If yes, is it OK to call

>>>>> phy_trigger_machine() there?

>>>>

>>>> Humm, good question.

>>>>

>>>> My _guess_ would be, calling it with sync=True could

>>>> deadlock. sync=False is probably safe. But lets see what Florian says.

>>>

>>> I agree that we should use phy_trigger_machine() with sync=False.

>>>

>>>>

>>>>>

>>>>>>

>>>>>> It does however lead to a follow up question. Are there other places

>>>>>> phydev->state is changed and it is missing a phy_trigger_machine()?

>>>>>>

>>>>>

>>>>> One other place I think we should add phy_trigger_machine() is phy_start_aneg().

>>>>

>>>> Humm, that might get us into a tight loop.

>>>>

>>>> phy_start_aneg() kicks the phy driver to start autoneg and sets

>>>> phydev->state = PHY_AN.

>>>>

>>>> phy_trigger_machine() triggers the state machine immediately. 

>>>>

>>>> In state PHY_AN, we check if aneg is done. If not, it sets needs_aneg

>>>> = true. At the end of the state machine, this then calls

>>>> phy_start_aneg(), and it all starts again.

>>>>

>>>> We are missing the 1s delay we have with polling. So for

>>>> phy_start_aneg(), we might need a phy_delayed_trigger_machine(), which

>>>> waits a second before doing anything?

>>>

>>> I think that should do the trick.

>>>

>>> How about this?

>>

>> This sounds like a possible fix indeed, however I would like to better

>> assess the impact on non interrupt driven PHYs before rolling such a change.

> 

> Is it safer if I add a check to do this only for interrupt driven PHYs?


Yes I think this is a good solution that would not impact polled PHYs.
Can you submit a formal patch for that change?

Thanks!

> 

> e.g.

> 

> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c

> index 4b855f2..e0f5755 100644

> --- a/drivers/net/phy/phy.c

> +++ b/drivers/net/phy/phy.c

> @@ -630,6 +630,9 @@ int phy_start_aneg(struct phy_device *phydev)

>  

>  out_unlock:

>  	mutex_unlock(&phydev->lock);

> +	if (!err && phy_interrupt_is_valid(phydev))

> +		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, HZ);

> +

>  	return err;

>  }

>  EXPORT_SYMBOL(phy_start_aneg);

> 



-- 
Florian
diff mbox

Patch

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 1be69d8..8fef03b 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -903,6 +903,7 @@  void phy_stop(struct phy_device *phydev)
 	 * of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()
 	 * will not reenable interrupts.
 	 */
+	phy_trigger_machine(phydev, true);
 }
 EXPORT_SYMBOL(phy_stop);