
[v2] e1000e: Increase iteration on polling MDIC ready bit

Message ID 20200924150958.18016-1-kai.heng.feng@canonical.com
State New
Series [v2] e1000e: Increase iteration on polling MDIC ready bit

Commit Message

Kai-Heng Feng Sept. 24, 2020, 3:09 p.m. UTC
We are seeing the following error after S3 resume:
[  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
[  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
[  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error

As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
increasing the polling iterations can indeed resolve the issue.

While at it, also move the delay to the end of the loop, to potentially
save 50 us.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v2:
 - Increase polling iterations instead of powering down the PHY.

 drivers/net/ethernet/intel/e1000e/phy.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Paul Menzel Sept. 24, 2020, 3:32 p.m. UTC | #1
Dear Kai-Heng,

Thank you for sending version 2.

Am 24.09.20 um 17:09 schrieb Kai-Heng Feng:
> We are seeing the following error after S3 resume:

It'd be great if you added the system and the hardware you are seeing
this with.

> [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete

A follow-up patch should extend the message to include the timeout value:

> MDI Write did not complete in … seconds.

According to the Linux timestamps it takes 98 ms, which makes sense, as
640 * 3 * 50 μs = 96 ms.
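
A minimal sketch of what such a message could look like, assuming the
loop index i and the 50 µs delay stay as they are (the wording and the
ms conversion are illustrative only):

     /* Illustrative only: report how long we actually waited,
      * assuming each loop iteration costs roughly 50 us.
      */
     if (!(mdic & E1000_MDIC_READY)) {
             e_dbg("MDI Write did not complete in %u ms\n",
                   i * 50 / 1000);
             return -E1000_ERR_PHY;
     }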

What crappy hardware is this, that it takes longer than 100 ms?

> [  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
> ...
> [  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>
> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
> increasing the polling iterations can indeed resolve the issue.

Please explicitly state what the current timeout value is and what it
is increased to.

     640 * 3 * 50 μs = 96 ms
     640 * 10 * 50 μs = 320 ms

The macro definition is also missing the unit.

     /* SerDes Control */
     #define E1000_GEN_POLL_TIMEOUT          640
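
One way to make the unit visible is to encode it in the names; the
*_ITERATIONS and *_US names below are invented for illustration, the
driver today only has E1000_GEN_POLL_TIMEOUT:

     /* Illustrative rename: iteration count and per-iteration delay,
      * with the unit spelled out in the name.
      */
     #define E1000_GEN_POLL_ITERATIONS       640
     #define E1000_GEN_POLL_DELAY_US         50
     /* 640 * 10 iterations * 50 us = 320 ms worst case */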

How did you determine that tenfold that value is good, and not
eightfold, for example? Please give the exact value the hardware needs
now (Linux log message timestamps should be enough).

As a commit message summary, I suggest:

> e1000e: Increase MDIC ready bit polling timeout from 96 ms to 320 ms

> While at it, also move the delay to the end of the loop, to potentially
> save 50 us.

>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
>   - Increase polling iterations instead of powering down the PHY.
>
>   drivers/net/ethernet/intel/e1000e/phy.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> index e11c877595fb..72968a01164b 100644
> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> @@ -203,11 +203,12 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
>   	 * Increasing the time out as testing showed failures with
>   	 * the lower time out
>   	 */
> -	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
> -		udelay(50);
> +	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
>   		mdic = er32(MDIC);
>   		if (mdic & E1000_MDIC_READY)
>   			break;
> +
> +		udelay(50);
>   	}
>   	if (!(mdic & E1000_MDIC_READY)) {
>   		e_dbg("MDI Write did not complete\n");
Andrew Lunn Sept. 24, 2020, 3:53 p.m. UTC | #2
On Thu, Sep 24, 2020 at 11:09:58PM +0800, Kai-Heng Feng wrote:
> We are seeing the following error after S3 resume:
> [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
> [  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
> ...
> [  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>
> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
> increasing the polling iterations can indeed resolve the issue.
>
> While at it, also move the delay to the end of the loop, to potentially
> save 50 us.

You are unlikely to save any time. 64 bits at 2.5 MHz is 25.6 µs, so a
read directly after starting the transaction is very unlikely to see
E1000_MDIC_READY set. This change therefore likely causes an additional
read of MDIC. Did you profile this at all, for the normal case?

I also don't fully understand the fix. You are now looping up to 6400
times, each with a delay of 50 µs. So that is around 12800 times more
than it actually needs to transfer the 64 bits! I've no idea how this
hardware works, but my guess would be that something is wrong with the
clock setup?
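
Spelling out the arithmetic behind those figures (assuming one MDIO
frame is 64 bits clocked at 2.5 MHz):

     64 bits / 2.5 MHz  = 25.6 µs per MDIO transaction
     50 µs per loop     ≈ 2 transaction times
     6400 loops * 2     ≈ 12800 transaction times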

     Andrew
Kai-Heng Feng Sept. 24, 2020, 4:04 p.m. UTC | #3
Hi Andrew,

> On Sep 24, 2020, at 23:53, Andrew Lunn <andrew@lunn.ch> wrote:
> 
> On Thu, Sep 24, 2020 at 11:09:58PM +0800, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
>> [  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
>> [  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>> 
>> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
>> increasing the polling iterations can indeed resolve the issue.
>> 
>> While at it, also move the delay to the end of the loop, to potentially
>> save 50 us.
> 
> You are unlikely to save any time. 64 bits at 2.5 MHz is 25.6 µs, so a
> read directly after starting the transaction is very unlikely to see
> E1000_MDIC_READY set. This change therefore likely causes an additional
> read of MDIC. Did you profile this at all, for the normal case?

You are right, it actually may add an additional read.
Let me send a v3.

> 
> I also don't fully understand the fix. You are now looping up to 6400
> times, each with a delay of 50 µs. So that is around 12800 times more
> than it actually needs to transfer the 64 bits! I've no idea how this
> hardware works, but my guess would be that something is wrong with the
> clock setup?

It's probably caused by Intel ME. This is not something new; many of the
polling loops in the e1000e driver are there for the ME, especially after
S3 resume.

Unless Intel is willing to open up ME, being patient and waiting longer
is the best approach we have.

Kai-Heng

> 
>     Andrew
Andrew Lunn Sept. 24, 2020, 4:04 p.m. UTC | #4
On Thu, Sep 24, 2020 at 05:32:12PM +0200, Paul Menzel wrote:
> Dear Kai-Heng,
>
> Thank you for sending version 2.
>
> Am 24.09.20 um 17:09 schrieb Kai-Heng Feng:
> > We are seeing the following error after S3 resume:
>
> It'd be great if you added the system and the hardware you are seeing
> this with.
>
> > [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> > [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
>
> A follow-up patch should extend the message to include the timeout value:
>
> > MDI Write did not complete in … seconds.
>
> According to the Linux timestamps it takes 98 ms, which makes sense, as
> 640 * 3 * 50 μs = 96 ms.
>
> What crappy hardware is this, that it takes longer than 100 ms?

I'm speculating, but I guess this happens with just the first couple
of transfers after power up. After that, it probably takes a single
loop. It would be good to see some profile data for this. It is a
completely different MDIO driver and implementation, but this patch
might give some ideas on how to do the profiling:

https://github.com/lunn/linux/commit/76c7810a7e2c1b1e28a7a95d08dd440a8f48a516

Look at the debugfs and num_loops/us parts.
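
Transposed to this loop, a minimal sketch could look like the following
(needs <linux/ktime.h>; the message wording and the start/loops
variables are invented for illustration):

     ktime_t start = ktime_get();
     u32 loops = 0;

     for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
             udelay(50);
             mdic = er32(MDIC);
             loops++;
             if (mdic & E1000_MDIC_READY)
                     break;
     }
     /* How many iterations, and how long, did this transaction take? */
     e_dbg("MDIC poll: %u loops, %lld us\n", loops,
           ktime_us_delta(ktime_get(), start));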

     Andrew
Andrew Lunn Sept. 24, 2020, 7:57 p.m. UTC | #5
On Fri, Sep 25, 2020 at 12:45:42AM +0800, Kai-Heng Feng wrote:
> We are seeing the following error after S3 resume:
> [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
> [  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
> ...
> [  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>
> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
> increasing the polling iterations can indeed resolve the issue.
>
> The root cause is quite likely Intel ME, since it's a black box to the
> kernel, so the only approach we can take is to be patient and wait
> longer.

Please could you explain how you see Intel ME being responsible for
this? I'm not convinced.

      Andrew
Kai-Heng Feng Sept. 25, 2020, 3:57 a.m. UTC | #6
> On Sep 25, 2020, at 03:57, Andrew Lunn <andrew@lunn.ch> wrote:
> 
> On Fri, Sep 25, 2020 at 12:45:42AM +0800, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
>> [  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
>> [  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>> 
>> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
>> increasing the polling iterations can indeed resolve the issue.
>> 
>> The root cause is quite likely Intel ME, since it's a black box to the
>> kernel, so the only approach we can take is to be patient and wait
>> longer.
> 
> Please could you explain how you see Intel ME being responsible for
> this? I'm not convinced.

Some other occurrences:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d17c7868b2f8e329dcee4ecd2f5d16cfc9b26ac8
https://lore.kernel.org/netdev/20200323191639.48826-1-aaron.ma@canonical.com/

Of course we need an ACK from Intel that this one is also related to the ME.

Kai-Heng

> 
>      Andrew
Paul Menzel Sept. 25, 2020, 5:16 a.m. UTC | #7
Dear Kai-Heng,

Thank you for patch version 3.

Am 24.09.20 um 18:45 schrieb Kai-Heng Feng:
> We are seeing the following error after S3 resume:
> [  704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
> [  704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
> [  704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
> [  704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
> ...
> [  705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>
> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
> increasing the polling iterations can indeed resolve the issue.
>
> The root cause is quite likely Intel ME, since it's a black box to the
> kernel, so the only approach we can take is to be patient and wait
> longer.
>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v3:
>   - Moving the delay to the end of the loop doesn't save any time, move it back.
>   - Point out this is quitely likely caused by Intel ME.

quietly

You seem to have missed my comments regarding patch version 2. It'd be
great if you improved the commit message with my suggestions.

Without knowing what hardware this happened on, nobody, even when later
getting the hardware, can reproduce your results. If you say the ME is
involved, please also document the ME firmware version used here.

> v2:
>   - Increase polling iterations instead of powering down the PHY.
>
>   drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> index e11c877595fb..e6d4acd90937 100644
> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> @@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
>   	 * Increasing the time out as testing showed failures with
>   	 * the lower time out
>   	 */
> -	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
> +	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
>   		udelay(50);
>   		mdic = er32(MDIC);
>   		if (mdic & E1000_MDIC_READY)

In the PCI subsystem, a warning is shown when something takes more than
100 ms. As you increase the timeout to over 320 ms, a warning should be
printed when it passes 100 ms, so that people talk to the firmware folks.
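
A sketch of that idea on top of the polling loop (the threshold check
and message are illustrative only; netdev_warn() on hw->adapter->netdev
mirrors what the driver's own e_dbg macro expands to):

     for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
             udelay(50);
             mdic = er32(MDIC);
             if (mdic & E1000_MDIC_READY)
                     break;
             if (i * 50 == 100 * USEC_PER_MSEC)  /* warn once, at 100 ms */
                     netdev_warn(hw->adapter->netdev,
                                 "MDI transaction still pending after 100 ms\n");
     }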


Kind regards,

Paul
David Laight Sept. 25, 2020, 8:50 a.m. UTC | #8
From: Kai-Heng Feng
> Sent: 24 September 2020 17:04
...
> > I also don't fully understand the fix. You are now looping up to 6400
> > times, each with a delay of 50 µs. So that is around 12800 times more
> > than it actually needs to transfer the 64 bits! I've no idea how this
> > hardware works, but my guess would be that something is wrong with the
> > clock setup?
>
> It's probably caused by Intel ME. This is not something new; many of the
> polling loops in the e1000e driver are there for the ME, especially after
> S3 resume.
>
> Unless Intel is willing to open up ME, being patient and waiting longer
> is the best approach we have.

There is some really broken code in the e1000e driver that affects my
Ivy Bridge platform, where it is trying to avoid hardware bugs in
the ME interface.

It seems that before EVERY write to a MAC register it must check
that the ME isn't using the interface - and spin until it isn't.
This causes massive delays in the TX path because it includes
the write that tells the MAC engine about a new packet.

The code is completely broken though.
Interrupts and process switches can happen between the
test for the ME being idle and the actual write.
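
Schematically, the racy pattern looks like this (register and bit
names are invented for illustration):

	while (er32(FWSM) & ME_BUSY)	/* test: wait for the ME to go idle */
		udelay(10);
	/*
	 * An interrupt or a context switch can land here, and the ME
	 * can grab the interface back before the next line runs...
	 */
	ew32(TDT, next_tail);		/* ...so this write may be discarded */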

AFAICT the only reliable way to get ethernet access on such
systems is to use a different ethernet interface.

Also, from what I remember, the broken workaround is missing
from some of the setup code paths.

	David

Andrew Lunn Sept. 25, 2020, 1:29 p.m. UTC | #9
On Fri, Sep 25, 2020 at 08:50:30AM +0000, David Laight wrote:
> From: Kai-Heng Feng
> > Sent: 24 September 2020 17:04
> ...
> > > I also don't fully understand the fix. You are now looping up to 6400
> > > times, each with a delay of 50 µs. So that is around 12800 times more
> > > than it actually needs to transfer the 64 bits! I've no idea how this
> > > hardware works, but my guess would be that something is wrong with the
> > > clock setup?
> > 
> > It's probably caused by Intel ME. This is not something new; many of the
> > polling loops in the e1000e driver are there for the ME, especially after
> > S3 resume.
> > 
> > Unless Intel is willing to open up ME, being patient and waiting longer
> > is the best approach we have.
> 
> There is some really broken code in the e1000e driver that affects my
> Ivy Bridge platform, where it is trying to avoid hardware bugs in
> the ME interface.
> 
> It seems that before EVERY write to a MAC register it must check
> that the ME isn't using the interface - and spin until it isn't.
> This causes massive delays in the TX path because it includes
> the write that tells the MAC engine about a new packet.

Hi David

Thanks for the information. This however does not really explain the
issue.

The code busy loops waiting for the MDIO transaction to complete. If
reads/writes to the MAC are getting blocked, that just means fewer
iterations of the loop are needed, not more, since the time to
complete the transaction should be fixed.

If ME really is to blame, it means ME is completely hijacking the
hardware? Stopping the clocks? Maybe doing its own MDIO transactions?
How can you write a PHY driver if something else is also programming
the PHY?

We don't understand what is going on here. We are just papering over
the cracks. The commit message should say this, that the change fixes
the symptoms but probably not the cause.

    Andrew
David Laight Sept. 26, 2020, 10:08 a.m. UTC | #10
From: Andrew Lunn
> Sent: 25 September 2020 14:29
> On Fri, Sep 25, 2020 at 08:50:30AM +0000, David Laight wrote:
> > From: Kai-Heng Feng
> > > Sent: 24 September 2020 17:04
> > ...
> > > > I also don't fully understand the fix. You are now looping up to 6400
> > > > times, each with a delay of 50 µs. So that is around 12800 times more
> > > > than it actually needs to transfer the 64 bits! I've no idea how this
> > > > hardware works, but my guess would be that something is wrong with the
> > > > clock setup?
> > >
> > > It's probably caused by Intel ME. This is not something new; many of the
> > > polling loops in the e1000e driver are there for the ME, especially after
> > > S3 resume.
> > >
> > > Unless Intel is willing to open up ME, being patient and waiting longer
> > > is the best approach we have.
> >
> > There is some really broken code in the e1000e driver that affects my
> > Ivy Bridge platform, where it is trying to avoid hardware bugs in
> > the ME interface.
> >
> > It seems that before EVERY write to a MAC register it must check
> > that the ME isn't using the interface - and spin until it isn't.
> > This causes massive delays in the TX path because it includes
> > the write that tells the MAC engine about a new packet.
>
> Hi David
>
> Thanks for the information. This however does not really explain the
> issue.
>
> The code busy loops waiting for the MDIO transaction to complete. If
> reads/writes to the MAC are getting blocked, that just means fewer
> iterations of the loop are needed, not more, since the time to
> complete the transaction should be fixed.
>
> If ME really is to blame, it means ME is completely hijacking the
> hardware? Stopping the clocks? Maybe doing its own MDIO transactions?
> How can you write a PHY driver if something else is also programming
> the PHY?
>
> We don't understand what is going on here. We are just papering over
> the cracks. The commit message should say this, that the change fixes
> the symptoms but probably not the cause.

You may not have the same broken hardware as I have...

From what I could infer from the code and guess from the behaviour
I got the impression that if the ME was accessing any of the MAC
registers it was likely that writes from the kernel just got discarded.

I got the impression that a bug in the hardware was being worked
around by the ME setting a status bit before an access, waiting
a bit for the kernel to finish anything it was doing, then
doing its access and clearing the bit.

The kernel keeps having to wait for the bit to be clear.
These delays were long; sub-ms, but far longer than
the rest of the code path for sending a packet.
But the code didn't check/disable pre-emption or interrupts
so the check was actually broken.
(If I removed it completely my system wouldn't boot!)

Thing is I don't want the ME.
I don't need the ME on that system.
The ME might be a security hole.
The ME breaks my system.
But I can't disable it at all.

	David


Patch

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..72968a01164b 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,11 +203,12 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
 	 * Increasing the time out as testing showed failures with
 	 * the lower time out
 	 */
-	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-		udelay(50);
+	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
 		mdic = er32(MDIC);
 		if (mdic & E1000_MDIC_READY)
 			break;
+
+		udelay(50);
 	}
 	if (!(mdic & E1000_MDIC_READY)) {
 		e_dbg("MDI Write did not complete\n");