mbox series

[RFC,net-next,00/13] Phylink PCS updates

Message ID 20200630142754.GC1551@shell.armlinux.org.uk
Headers show
Series Phylink PCS updates | expand

Message

Russell King (Oracle) June 30, 2020, 2:27 p.m. UTC
Hi,

This series updates the rudimentary phylink PCS support with the
results of the last four months of development of that.  Phylink
PCS support was initially added back at the end of March, when it
became clear that the current approach of treating everything at
the MAC end as being part of the MAC was inadequate.

However, this rudimentary implementation was fine initially for
mvneta and similar, but in practice had a fair number of issues,
particularly when ethtool interfaces were used to change various
link properties.

It became apparent that relying on the phylink_config structure for
the PCS was also bad when it became clear that the same PCS was used
in DSA drivers as well as in NXPs other offerings, and there was a
desire to re-use that code.

It also became apparent that splitting the "configuration" step on
an interface mode configuration between the MAC and PCS using just
mac_config() and pcs_config() methods was not sufficient for some
setups, as the MAC needed to be "taken down" prior to making changes,
and once all settings were complete, the MAC could only then be
resumed.

This series addresses these points, progressing PCS support, and
has been developed with mvneta and DPAA2 setups, with work on both
those drivers to prove this approach.  It has been rigorously tested
with mvneta, as that provides the most flexibility for testing the
various code paths.

To solve the phylink_config reuse problem, we introduce a struct
phylink_pcs, which contains the minimal information necessary, and it
is intended that this is embedded in the PCS private data structure.

To solve the interface mode configuration problem, we introduce two
new MAC methods, mac_prepare() and mac_finish() which wrap the entire
interface mode configuration only.  This has the additional benefit of
relieving MAC drivers from working out whether an interface change has
occurred, and whether they need to do some major work.

I have not yet updated all the interface documentation for these
changes yet, that work remains, but this patch set is provided in the
hope that those working on PCS support in NXP will find this useful.

Since there is a lot of change here, this is the reason why I strongly
advise that everyone has converted to the mac_link_up() way of
configuring the link parameters when the link comes up, rather than
the old way of using mac_config() - especially as splitting the PCS
changes how and when phylink calls mac_config(). Although no change
for existing users is intended, that is something I no longer am able
to test.

 drivers/net/phy/phylink.c | 365 +++++++++++++++++++++++++++++++---------------
 include/linux/phylink.h   | 103 ++++++++++---
 2 files changed, 337 insertions(+), 131 deletions(-)

Comments

Vladimir Oltean July 14, 2020, 8:49 a.m. UTC | #1
On Tue, Jun 30, 2020 at 03:27:54PM +0100, Russell King - ARM Linux admin wrote:
> Hi,

> 

> This series updates the rudimentary phylink PCS support with the

> results of the last four months of development of that.  Phylink

> PCS support was initially added back at the end of March, when it

> became clear that the current approach of treating everything at

> the MAC end as being part of the MAC was inadequate.

> 

> However, this rudimentary implementation was fine initially for

> mvneta and similar, but in practice had a fair number of issues,

> particularly when ethtool interfaces were used to change various

> link properties.

> 

> It became apparent that relying on the phylink_config structure for

> the PCS was also bad when it became clear that the same PCS was used

> in DSA drivers as well as in NXPs other offerings, and there was a

> desire to re-use that code.

> 

> It also became apparent that splitting the "configuration" step on

> an interface mode configuration between the MAC and PCS using just

> mac_config() and pcs_config() methods was not sufficient for some

> setups, as the MAC needed to be "taken down" prior to making changes,

> and once all settings were complete, the MAC could only then be

> resumed.

> 

> This series addresses these points, progressing PCS support, and

> has been developed with mvneta and DPAA2 setups, with work on both

> those drivers to prove this approach.  It has been rigorously tested

> with mvneta, as that provides the most flexibility for testing the

> various code paths.

> 

> To solve the phylink_config reuse problem, we introduce a struct

> phylink_pcs, which contains the minimal information necessary, and it

> is intended that this is embedded in the PCS private data structure.

> 

> To solve the interface mode configuration problem, we introduce two

> new MAC methods, mac_prepare() and mac_finish() which wrap the entire

> interface mode configuration only.  This has the additional benefit of

> relieving MAC drivers from working out whether an interface change has

> occurred, and whether they need to do some major work.

> 

> I have not yet updated all the interface documentation for these

> changes yet, that work remains, but this patch set is provided in the

> hope that those working on PCS support in NXP will find this useful.

> 

> Since there is a lot of change here, this is the reason why I strongly

> advise that everyone has converted to the mac_link_up() way of

> configuring the link parameters when the link comes up, rather than

> the old way of using mac_config() - especially as splitting the PCS

> changes how and when phylink calls mac_config(). Although no change

> for existing users is intended, that is something I no longer am able

> to test.

> 

>  drivers/net/phy/phylink.c | 365 +++++++++++++++++++++++++++++++---------------

>  include/linux/phylink.h   | 103 ++++++++++---

>  2 files changed, 337 insertions(+), 131 deletions(-)

> 

> -- 

> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/

> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Are you going to post a non-RFC version?

I think this series makes a lot of sense overall and is a good
consolidation for the type of interfaces that are already established in
Linux.

This changes the landscape of how Linux is dealing with a MAC-side
clause 37 PCS, and should constitute a workable base even for clause 49
PCSs when those use a clause 37 auto-negotiation system (like USXGMII
and its various multi-port variants). Where I have some doubts is a
clause 49 PCS which uses a clause 73 auto-negotiation system, I would
like to understand your vision of how deep phylink is going to go into
the PMD layer, especially where it is not obvious that said layer is
integrated with the MAC.

Thanks,
-Vladimir
Russell King (Oracle) July 14, 2020, 1:18 p.m. UTC | #2
On Tue, Jul 14, 2020 at 11:49:58AM +0300, Vladimir Oltean wrote:
> Are you going to post a non-RFC version?


I'm waiting for the remaining patches to be reviewed; Florian reviewed
the first six patches (which are not the important ones in the series)
and that seems to be where things have stopped. There has been no
change, so I don't see there's much point to reposting the series.

I'd actually given up pushing this further; I've seen patches go by that
purpetuate the idea that the PCS is handled by phylib.  I feel like I'm
wasting my time with this.

> I think this series makes a lot of sense overall and is a good

> consolidation for the type of interfaces that are already established in

> Linux.

> 

> This changes the landscape of how Linux is dealing with a MAC-side

> clause 37 PCS, and should constitute a workable base even for clause 49

> PCSs when those use a clause 37 auto-negotiation system (like USXGMII

> and its various multi-port variants).


Yes.

> Where I have some doubts is a

> clause 49 PCS which uses a clause 73 auto-negotiation system, I would

> like to understand your vision of how deep phylink is going to go into

> the PMD layer, especially where it is not obvious that said layer is

> integrated with the MAC.


I have only considered up to 10GBASE-R based systems as that is the
limit of what I can practically test here.  I have one system that
offers a QSFP+ cage, and I have a QSFP+ to 4x SFP+ (10G) splitter
cable - so that's basically 4x 10GBASE-CR rather than 40GBASE-CR.

I am anticipating that clause 73 will be handled in a very similar way
to clause 37.  The clause 73 "technology ability" field defines the
capabilities of the link, but as we are finding with 10GBASE-R based
setups with copper PHYs, the capabilities of the link may not be what
we want to report to the user, especially if the copper PHY is capable
of rate adaption.  Hence, it may be possible to have a backplane link
that connects to a copper PHY that does rate adaption:

MAC <--> Clause 73 PCS <--backplane--> PHY <--base-T--> remote system

This is entirely possible from what I've seen in one NBASE-T PHY
datasheet.  The PHY is documented as being capable of negotiating a
10GBASE-KR link with the host system, while offering 10GBASE-R,
1000BASE-X, 10GBASE-T, 5GBASE-T, 2.5GBASE-T, 1GBASE-T, 100BASE-T, and
10BASE-T on the media side.  The follow-on question is whether that
PHY is likely to be accessible to the system on the other end of the
backplane link somehow - either through some kind of firmware link
or direct access.  That is not possible to know without having
experience with such systems.

That said, with the splitting of the PCS from the MAC in phylink, there
is the possibility for the PCS to be implemented as an entirely
separate driver to the MAC driver, although there needs to be some
infrastructure to make that work sanely.  Right now, it is the MAC
responsibility to attach the PCS to phylink, which is the right way
to handle it for setups where the PCS is closely tied to the MAC, such
as Marvell NETA and PP2 where the PCS is indistinguishable from the
MAC, and will likely remain so for such setups.  However, if we need
to also support entirely separate PCS, I don't see any big issues
with that now that we have this split.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Florian Fainelli July 14, 2020, 9:22 p.m. UTC | #3
On 7/14/2020 6:18 AM, Russell King - ARM Linux admin wrote:
> On Tue, Jul 14, 2020 at 11:49:58AM +0300, Vladimir Oltean wrote:

>> Are you going to post a non-RFC version?

> 

> I'm waiting for the remaining patches to be reviewed; Florian reviewed

> the first six patches (which are not the important ones in the series)

> and that seems to be where things have stopped. There has been no

> change, so I don't see there's much point to reposting the series.


Sorry for giving an impression that this had stalled, I reviewed the
obvious changes and am now reviewing the not so obvious changes, would
certainly appreciate if other NXP folks as well as Andrew and Heiner
looked at those changes obviously.
-- 
Florian
Vladimir Oltean July 14, 2020, 11:46 p.m. UTC | #4
I'll preface my reply by stating that this is my opinion and that it is
not representative of what you might be visualizing as "NXP".
It's not that I am trying to evade responsibility by saying this, but
rather that backplane Ethernet is simply not my responsibility, in fact
I have no saying at all when it comes to decision making there. I am
simply trying to get a better understanding for myself, especially of
the design goals of phylink.

On Tue, Jul 14, 2020 at 02:18:32PM +0100, Russell King - ARM Linux admin wrote:
> 

> I'd actually given up pushing this further; I've seen patches go by that

> purpetuate the idea that the PCS is handled by phylib.  I feel like I'm

> wasting my time with this.

> 


By this I think you are aiming squarely at "[PATCH net-next v3 0/9] net:
ethernet backplane support on DPAA1". If I understand you correctly, you
are saying that because of the non-phylink models used to represent that
system comprised of a clause 49 PCS + clause 72 PMD + clause 73 AN/LT,
it is not worth pursuing this phylink-based representation of a clause
37 PCS.

> > Where I have some doubts is a

> > clause 49 PCS which uses a clause 73 auto-negotiation system, I would

> > like to understand your vision of how deep phylink is going to go into

> > the PMD layer, especially where it is not obvious that said layer is

> > integrated with the MAC.

> 

> I have only considered up to 10GBASE-R based systems as that is the

> limit of what I can practically test here.  I have one system that

> offers a QSFP+ cage, and I have a QSFP+ to 4x SFP+ (10G) splitter

> cable - so that's basically 4x 10GBASE-CR rather than 40GBASE-CR.

> 


Ok, no problem here, we can keep this discussion at principle level.

> I am anticipating that clause 73 will be handled in a very similar way

> to clause 37.


This is "Figure 37-1-Location of the Auto-Negotiation function" (clause
37 is "Auto-Negotiation function, type 1000BASE-X").

  OSI
  REFERENCE
  MODEL
  LAYERS

 +--------------+
 | APPLICATION  |
 +--------------+
 | PRESENTATION |
 +--------------+
 | SESSION      |
 +--------------+
 | TRANSPORT    |
 +--------------+
 | NETWORK      |
 +--------------+   +----------------------------------------------------+
 | DATA LINK    |   |    LLC (LOGICAL LINK CONTROL) OR OTHER MAC CLIENT  |
 |              |   +----------------------------------------------------+
 |              |   |              MAC CONTROL (OPTIONAL)                |
 |              |   +----------------------------------------------------+
 |              |   |            MAC - MEDIA ACCESS CONTROL              |
 +--------------+   +----------------------------------------------------+
 | PHYSICAL     |   |                  RECONCILIATION                    |
 |              |   +----------------------------------------------------+
 |              |                         |       |
 |              |                   GMII  |       |
 |              |   +----------------------------------------------------+\
 |              |   |          PCS, INCLUDING AUTO-NEGOTIATION           | \
 |              |   +----------------------------------------------------+ |
 |              |   |                        PMA                         | |
 |              |   +----------------------------------------------------+ /
 |              |    |   LX-PMD   |     |   SX-PMD   |    |   CX-PMD   |  / PHY
 |              |    +------------+     +------------+    +------------+ /
 |              |        |    | LX MDI      |    | SX MDI     |    | CX MDI
 +--------------+      +---------+        +---------+       +---------+
                       | MEDIUM  |        | MEDIUM  |       | MEDIUM  |

This is "Figure 28-2-Location of Auto-Negotiation function within the
ISO/IEC OSI reference model" (clause 28 is "Physical Layer link
signaling for Auto-Negotiation on twisted pair", aka BASE-T).

  OSI
  REFERENCE
  MODEL
  LAYERS

 +--------------+
 | APPLICATION  |
 +--------------+
 | PRESENTATION |
 +--------------+
 | SESSION      |
 +--------------+
 | TRANSPORT    |
 +--------------+
 | NETWORK      |
 +--------------+   +------------------------------------------------------+
 | DATA LINK    |   |               LLC (LOGICAL LINK CONTROL)             |
 |              |   +------------------------------------------------------+
 |              |   |               MAC - MEDIA ACCESS CONTROL             |
 +--------------+   +------------------------------------------------------+
 | PHYSICAL     |           |            RECONCILIATION            |
 |              |           +--------------------------------------+
 |              |                          |       |
 |              |                     MII  |       |
 |              |          /+--------------------------------------+
 |              |         / |                 PCS                  |
 |              |         | +--------------------------------------+
 |              |         | |                 PMA                  |
 |              |     PHY | +--------------------------------------+
 |              |         | |                 PMD                  |
 |              |         \ +--------------------------------------+
 |              |          \|               AUTONEG                |
 |              |           +--------------------------------------+
 |              |                           |    | MDI
 +--------------+                         +---------+
                                          | MEDIUM  |

This is "Figure 73-1-Location of Auto-Negotiation function within the
ISO/IEC OSI reference model" (clause 73 is "Auto-Negotiation for
backplane and copper cable assembly").

  OSI
  REFERENCE
  MODEL
  LAYERS

 +--------------+
 | APPLICATION  |
 +--------------+
 | PRESENTATION |
 +--------------+
 | SESSION      |
 +--------------+
 | TRANSPORT    |
 +--------------+
 | NETWORK      |
 +--------------+   +------------------------------------------------------+
 | DATA LINK    |   |  LLC (LOGICAL LINK CONTROL) OR ANY OTHER MAC CLIENT  |
 |              |   +------------------------------------------------------+
 |              |   |               MAC - MEDIA ACCESS CONTROL             |
 +--------------+   +------------------------------------------------------+
 | PHYSICAL     |           |            RECONCILIATION            |
 |              |           +--------------------------------------+
 |              |                          |       | GMII, XGMII, 25GMII,
 |              |                          |       | XLGMII or CGMII
 |              |          /+--------------------------------------+
 |              |         / |                 PCS                  |
 |              |         | +--------------------------------------+
 |              |         | |                 PMA                  |
 |              |     PHY | +--------------------------------------+
 |              |         | |                 PMD                  |
 |              |         \ +--------------------------------------+
 |              |          \|               AUTONEG                |
 |              |           +--------------------------------------+
 |              |                           |    | MDI
 +--------------+                         +---------+
                                          | MEDIUM  |

Identify the position of the auto-negotiation function in these 3
diagrams.

To me, the backplane auto-negotiation look closer to BASE-T than it does
to BASE-X. This auto-negotiation is performed by the PMD, not by the
PCS.

But you are right that clause 28 AN is treated very similarly to clause
37 AN... in phylib.

> The clause 73 "technology ability" field defines the

> capabilities of the link,


Yes.

> but as we are finding with 10GBASE-R based

> setups with copper PHYs, the capabilities of the link may not be what

> we want to report to the user, especially if the copper PHY is capable

> of rate adaption.


Explain?
By "copper PHY" I think you mean "compliant to 10GBASE-T". 10GBASE-KR
serves the same purpose, in the OSI model, as that. Do we not report the
capabilities of the 10GBASE-T link to the user?
Also, on a separate note, is rate adaptation supported by mainline
Linux? Last time I looked, for 2500BASE-X plus the Aquantia PHYs, it
wasn't.

> Hence, it may be possible to have a backplane link

> that connects to a copper PHY that does rate adaption:

> 

> MAC <--> Clause 73 PCS <--backplane--> PHY <--base-T--> remote system

> 

> This is entirely possible from what I've seen in one NBASE-T PHY

> datasheet.  The PHY is documented as being capable of negotiating a

> 10GBASE-KR link with the host system, while offering 10GBASE-R,

> 1000BASE-X, 10GBASE-T, 5GBASE-T, 2.5GBASE-T, 1GBASE-T, 100BASE-T, and

> 10BASE-T on the media side.  The follow-on question is whether that

> PHY is likely to be accessible to the system on the other end of the

> backplane link somehow - either through some kind of firmware link

> or direct access.  That is not possible to know without having

> experience with such systems.

> 


I have not seen said datasheet. That being said, I don't question the
existence of such a device. Because NGBASE-T and 10GBASE-KR are both
media-side protocols, such device would be called a "media converter".
To me this is no different than a 1000BASE-T to 1000BASE-X media
converter. How is that modeled in Linux? Is it?

> That said, with the splitting of the PCS from the MAC in phylink, there

> is the possibility for the PCS to be implemented as an entirely

> separate driver to the MAC driver, although there needs to be some

> infrastructure to make that work sanely.  Right now, it is the MAC

> responsibility to attach the PCS to phylink, which is the right way

> to handle it for setups where the PCS is closely tied to the MAC, such

> as Marvell NETA and PP2 where the PCS is indistinguishable from the

> MAC, and will likely remain so for such setups.  However, if we need

> to also support entirely separate PCS, I don't see any big issues

> with that now that we have this split.

> 


Absolutely.

There is code in phylink for managing a "MAC-side PCS", a concept
introduced by Cisco for SGMII and USXGMII. Because of the implementation
details of this "MAC-side PCS", support for 1000BASE-X comes basically
for free, because hardware speaking, the same state machines are
required for both, just different advertisement parsing logic. So it
makes sense for phylink to manage this 1000BASE-X PCS that comes "for
free" with the MAC-side requirements of SGMII/USXGMII.
But nonetheless, the exact same hardware state machine, i.e. the one
described in "Figure 37-6-Auto-Negotiation state diagram", is managed
twice in the Linux kernel: once in phylib, for any fiber PHY, and once
in phylink. This is fine to me. Yet, in another thread, you are bringing
the argument that:

	If we're not careful, we're going to end up with the Lynx PCS
	being implemented one way, and backplane PCS being implemented
	completely differently and preventing any hope of having a
	backplane PCS connected to a conventional copper PHY.

even though, for SGMII/USXGMII/BASE-X, the auto-negotiation function is
implemented by a completely different part of the OSI model than for
10GBASE-KR, so it is not immediately obvious how you are ok with one but
not the other.

But back to your argument: a backplane PHY could be described as a
phylink MAC-side PCS, and this would have the benefit that the phylib
slot would be free for a 10GBASE-T PHY, were that ever necessary.
That is true, but that is akin to saying that any media converter should
be modeled as a MAC-side phylink PCS and another, singular, phylib PHY
(the converter itself). Why is this the best solution? Because it works?
Sure it does, but is that not a layering violation of sorts? What are
the boundaries of phylink? I am asking because I genuinely don't know.
Is it supposed to manage a SERDES data eye?

To me, it is not obvious at all that a backplane PHY (an integrated one,
at that, but there are integrated BASE-T PHYs as well) shouldn't be
modeled using phylib. If you read carefully through the standard, you'll
notice that the most significant portion of backplane Ethernet is in
"72.6.10.4 State diagrams", for link training. My fear is that phylink
will need to acquire a lot of junk in order to support one
software-driven implementation of 10GBASE-KR AN/LT, just to prove a
point. We have zero other implementations to compare with, but we can
just imagine that others might either do it software-driven, like NXP,
or software-driven but hidden in firmware, or completely in hardware.
Either way, phylib provides just the necessary abstractions for this to
be hidden away. Whereas phylink might need some extra timers apart from
the existing .pcs_get_state() to get this link training job done.

As for that BASE-KR to BASE-T PHY: you've used the argument before that
we should avoid "over-designing for situations that we don't know". When
the time comes that such a system must be supported, the people who'll
need to support it will figure out what the options are. They can range
from:
- do nothing, if the MDIO side of the PHY is not accessible
- do nothing, even if the MDIO side of the PHY is accessible. Things
  might 'just work' even if we completely ignore the existence of this
  PHY.
- extend the phylib to do something about media converters, if things
  get really ugly
- move the backplane PCS/PMA/PMD to phylink, as a workaround more than
  anything else, in my opinion

Basically, the whole reason I've written this email is that I don't
understand why you see such a blocker in the fact that not everybody
warms up to the "phylink should control the backplane PCS/PMA/PMD" idea.
There are situations where phylink is appropriate, and it is not obvious
to me that this is one of them. When it will be, I'm sure people will
react differently too.
Last but not least, a phylink-based solution for non-backplane can
co-exist with a phylib-based solution for backplane. So I really, really
don't see where the problem is, maybe you could clarify.

Thanks,
-Vladimir
Russell King (Oracle) July 15, 2020, 9:53 a.m. UTC | #5
On Tue, Jul 14, 2020 at 02:22:08PM -0700, Florian Fainelli wrote:
> 

> 

> On 7/14/2020 6:18 AM, Russell King - ARM Linux admin wrote:

> > On Tue, Jul 14, 2020 at 11:49:58AM +0300, Vladimir Oltean wrote:

> >> Are you going to post a non-RFC version?

> > 

> > I'm waiting for the remaining patches to be reviewed; Florian reviewed

> > the first six patches (which are not the important ones in the series)

> > and that seems to be where things have stopped. There has been no

> > change, so I don't see there's much point to reposting the series.

> 

> Sorry for giving an impression that this had stalled, I reviewed the

> obvious changes and am now reviewing the not so obvious changes, would

> certainly appreciate if other NXP folks as well as Andrew and Heiner

> looked at those changes obviously.


Thanks Florian.  Yes, it would be useful to have reviewed-bys or
acked-bys from those who are using it, and would take some of the
load off yourself, Andrew and Heiner.

It also makes sense when some of the changes to phylink are made
in response to improving things for other use cases. For example,
splitting the PCS in phylink is not something I'm doing for my own
self-interest (apart from a desire to keep phylink maintainable),
but is to support NXP's and others platforms where the PCS would
logically be a separate block of code from the MAC. So, to me it
would make sense for NXP to get involved with the review of this
set.

If anything, splitting the PCS has meant that I've had to go back
and re-examine Marvell NETA and PP2 (in fact, several times) to
adapt them to solidly test this approach and make sure I haven't
missed anything - these two drivers are my main test-bed for
phylink at the moment, they're also hardware where the PCS is
completely indistinguishable from the MAC at register level, so
you can't say "this group of registers are definitely PCS only
functions".

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Russell King (Oracle) July 15, 2020, 11:21 a.m. UTC | #6
On Wed, Jul 15, 2020 at 02:46:52AM +0300, Vladimir Oltean wrote:
> On Tue, Jul 14, 2020 at 02:18:32PM +0100, Russell King - ARM Linux admin wrote:

> > I'd actually given up pushing this further; I've seen patches go by that

> > purpetuate the idea that the PCS is handled by phylib.  I feel like I'm

> > wasting my time with this.

> 

> By this I think you are aiming squarely at "[PATCH net-next v3 0/9] net:

> ethernet backplane support on DPAA1". If I understand you correctly, you

> are saying that because of the non-phylink models used to represent that

> system comprised of a clause 49 PCS + clause 72 PMD + clause 73 AN/LT,

> it is not worth pursuing this phylink-based representation of a clause

> 37 PCS.


Actually, that is not what I was aiming that comment at - that is not
something that has been posted recently.  I'm not going to explicitly
point at a patch set.

> > > Where I have some doubts is a

> > > clause 49 PCS which uses a clause 73 auto-negotiation system, I would

> > > like to understand your vision of how deep phylink is going to go into

> > > the PMD layer, especially where it is not obvious that said layer is

> > > integrated with the MAC.

> > 

> > I have only considered up to 10GBASE-R based systems as that is the

> > limit of what I can practically test here.  I have one system that

> > offers a QSFP+ cage, and I have a QSFP+ to 4x SFP+ (10G) splitter

> > cable - so that's basically 4x 10GBASE-CR rather than 40GBASE-CR.

> 

> Ok, no problem here, we can keep this discussion at principle level.

> 

> > I am anticipating that clause 73 will be handled in a very similar way

> > to clause 37.

> 

> This is "Figure 37-1-Location of the Auto-Negotiation function" (clause

> 37 is "Auto-Negotiation function, type 1000BASE-X").

> 

>   OSI

>   REFERENCE

>   MODEL

>   LAYERS

> 

>  +--------------+

>  | APPLICATION  |

>  +--------------+

>  | PRESENTATION |

>  +--------------+

>  | SESSION      |

>  +--------------+

>  | TRANSPORT    |

>  +--------------+

>  | NETWORK      |

>  +--------------+   +----------------------------------------------------+

>  | DATA LINK    |   |    LLC (LOGICAL LINK CONTROL) OR OTHER MAC CLIENT  |

>  |              |   +----------------------------------------------------+

>  |              |   |              MAC CONTROL (OPTIONAL)                |

>  |              |   +----------------------------------------------------+

>  |              |   |            MAC - MEDIA ACCESS CONTROL              |

>  +--------------+   +----------------------------------------------------+

>  | PHYSICAL     |   |                  RECONCILIATION                    |

>  |              |   +----------------------------------------------------+

>  |              |                         |       |

>  |              |                   GMII  |       |

>  |              |   +----------------------------------------------------+\

>  |              |   |          PCS, INCLUDING AUTO-NEGOTIATION           | \

>  |              |   +----------------------------------------------------+ |

>  |              |   |                        PMA                         | |

>  |              |   +----------------------------------------------------+ /

>  |              |    |   LX-PMD   |     |   SX-PMD   |    |   CX-PMD   |  / PHY

>  |              |    +------------+     +------------+    +------------+ /

>  |              |        |    | LX MDI      |    | SX MDI     |    | CX MDI

>  +--------------+      +---------+        +---------+       +---------+

>                        | MEDIUM  |        | MEDIUM  |       | MEDIUM  |

> 

> This is "Figure 28-2-Location of Auto-Negotiation function within the

> ISO/IEC OSI reference model" (clause 28 is "Physical Layer link

> signaling for Auto-Negotiation on twisted pair", aka BASE-T).

> 

>   OSI

>   REFERENCE

>   MODEL

>   LAYERS

> 

>  +--------------+

>  | APPLICATION  |

>  +--------------+

>  | PRESENTATION |

>  +--------------+

>  | SESSION      |

>  +--------------+

>  | TRANSPORT    |

>  +--------------+

>  | NETWORK      |

>  +--------------+   +------------------------------------------------------+

>  | DATA LINK    |   |               LLC (LOGICAL LINK CONTROL)             |

>  |              |   +------------------------------------------------------+

>  |              |   |               MAC - MEDIA ACCESS CONTROL             |

>  +--------------+   +------------------------------------------------------+

>  | PHYSICAL     |           |            RECONCILIATION            |

>  |              |           +--------------------------------------+

>  |              |                          |       |

>  |              |                     MII  |       |

>  |              |          /+--------------------------------------+

>  |              |         / |                 PCS                  |

>  |              |         | +--------------------------------------+

>  |              |         | |                 PMA                  |

>  |              |     PHY | +--------------------------------------+

>  |              |         | |                 PMD                  |

>  |              |         \ +--------------------------------------+

>  |              |          \|               AUTONEG                |

>  |              |           +--------------------------------------+

>  |              |                           |    | MDI

>  +--------------+                         +---------+

>                                           | MEDIUM  |

> 

> This is "Figure 73-1-Location of Auto-Negotiation function within the

> ISO/IEC OSI reference model" (clause 73 is "Auto-Negotiation for

> backplane and copper cable assembly").

> 

>   OSI

>   REFERENCE

>   MODEL

>   LAYERS

> 

>  +--------------+

>  | APPLICATION  |

>  +--------------+

>  | PRESENTATION |

>  +--------------+

>  | SESSION      |

>  +--------------+

>  | TRANSPORT    |

>  +--------------+

>  | NETWORK      |

>  +--------------+   +------------------------------------------------------+

>  | DATA LINK    |   |  LLC (LOGICAL LINK CONTROL) OR ANY OTHER MAC CLIENT  |

>  |              |   +------------------------------------------------------+

>  |              |   |               MAC - MEDIA ACCESS CONTROL             |

>  +--------------+   +------------------------------------------------------+

>  | PHYSICAL     |           |            RECONCILIATION            |

>  |              |           +--------------------------------------+

>  |              |                          |       | GMII, XGMII, 25GMII,

>  |              |                          |       | XLGMII or CGMII

>  |              |          /+--------------------------------------+

>  |              |         / |                 PCS                  |

>  |              |         | +--------------------------------------+

>  |              |         | |                 PMA                  |

>  |              |     PHY | +--------------------------------------+

>  |              |         | |                 PMD                  |

>  |              |         \ +--------------------------------------+

>  |              |          \|               AUTONEG                |

>  |              |           +--------------------------------------+

>  |              |                           |    | MDI

>  +--------------+                         +---------+

>                                           | MEDIUM  |

> 

> Identify the position of the auto-negotiation function in these 3

> diagrams.

> 

> To me, the backplane auto-negotiation look closer to BASE-T than it does

> to BASE-X. This auto-negotiation is performed by the PMD, not by the

> PCS.

> 

> But you are right that clause 28 AN is treated very similarly to clause

> 37 AN... in phylib.

> 

> > The clause 73 "technology ability" field defines the

> > capabilities of the link,

> 

> Yes.

> 

> > but as we are finding with 10GBASE-R based

> > setups with copper PHYs, the capabilities of the link may not be what

> > we want to report to the user, especially if the copper PHY is capable

> > of rate adaption.

> 

> Explain?

> By "copper PHY" I think you mean "compliant to 10GBASE-T". 10GBASE-KR

> serves the same purpose, in the OSI model, as that. Do we not report the

> capabilities of the 10GBASE-T link to the user?


The IEEE diagrams are good for describing the expected layers down to
the "media" but what IEEE calls the "media" may not be what the user
expects.

As an example, take a system where you have a backplane which causes
backplane ethernet, and a series of line cards that have PHYs on them
giving you a range of different connectivity options.  As I illustrated
below, one option would be to turn the backplane ethernet into
conventional twisted-pair ethernet - and there are PHYs on the market
that will do this.

So, in that case, the IEEE diagrams no longer represent the system as
far as the user is concerned - yes, we have figure 73-1 up to the
point of the backplane, and then we have a PHY converting the backplane
ethernet back to (conceptually) MII, and then everything from MII
downards in figure 28-2.

It believe it would be possible to setup such a test scenario on
Macchiatobin hardware if the firmware for training the COMPHY (the
serdes PHY) for a 10GBASE-KR link has not been withdrawn, and the
88x3310 PHY was switched to 10GBASE-KR mode on its host side.

> Also, on a separate note, is rate adaptation supported by mainline

> Linux? Last time I looked, for 2500BASE-X plus the Aquantia PHYs, it

> wasn't.


It depends - and I think I've mentioned this point before in response
to the Aquantia situation.  The 88x3310 PHY supports rate adaption when
in certain host-interface modes.  However, depending on what features
are available on the PHY (such as MACsec, and whether that is enabled)
controls whether pause frames are used, or whether the PHY expects the
egress rate from the upstream MAC to be adapted to the media speed.

In the case of a 88x3310 without MACsec, there are no support in the
PHY to send/receive and act on pause frames while it is performing
rate adaption, so force-enabling pause mode at the MAC is not useful
and is potentially harmful.

The 88x3310 driver used to assume that its host-side interface always
dynamically changed between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII,
but that is not the full story.  One configuration places the host side
in 10GBASE-R mode with rate adaption.  We now have a patch merged that
effectively disables the dynamic changing of the host-side interface
if the PHY is in this mode.  (It also does this for RXAUI mode, fwiw.)

phylib will provide the results of link negotiation as (for example):
	interface = 10GBASE-R (or RXAUI)
	speed = 100M
	duplex = full
	pause = tx | rx

where "speed", "duplex" and "pause" in this instance are the media side
results.  "interface" remains the operational interface between the
PHY and MAC.

I did attempt to start a discussion about how we should approach
rate adaption in this scenario, specifically because I have one board
where the 88x3310 is in RXAUI mode with rate adaption connected to a
DSA switch.  In that case, if we program the speed and duplex into the
MAC, despite the link being in RXAUI mode, the link between the PHY
and DSA switch fails to pass data.  Other MAC drivers don't bother
looking at "speed" and "duplex" when in RXAUI or 10GBASE-R mode,
because they know that the link can only operate at 10G speeds.  Of
course, what ends up being missed is spacing the transmitted packets
out to the media speed - but that is a per-MAC issue.

However, it still leaves the open question I've had for some time (in
fact, since your Aquantia PHY issue) about how we should be dealing
with rate adaption overall, which remains unaddressed - and the
question about how we should enable pause modes at the host MAC when
the PHY requires it is still unresolved.

But... I feel we're getting bogged down in another issue here.

> > Hence, it may be possible to have a backplane link

> > that connects to a copper PHY that does rate adaption:

> > 

> > MAC <--> Clause 73 PCS <--backplane--> PHY <--base-T--> remote system

> > 

> > This is entirely possible from what I've seen in one NBASE-T PHY

> > datasheet.  The PHY is documented as being capable of negotiating a

> > 10GBASE-KR link with the host system, while offering 10GBASE-R,

> > 1000BASE-X, 10GBASE-T, 5GBASE-T, 2.5GBASE-T, 1GBASE-T, 100BASE-T, and

> > 10BASE-T on the media side.  The follow-on question is whether that

> > PHY is likely to be accessible to the system on the other end of the

> > backplane link somehow - either through some kind of firmware link

> > or direct access.  That is not possible to know without having

> > experience with such systems.

> > 

> 

> I have not seen said datasheet. That being said, I don't question the

> existence of such a device. Because NGBASE-T and 10GBASE-KR are both

> media-side protocols, such device would be called a "media converter".

> To me this is no different than a 1000BASE-T to 1000BASE-X media

> converter. How is that modeled in Linux? Is it?


Why is it a "media converter" ?  What about:

MAC <--SGMII/RGMII/...--> PHY <--1000BASE-X-->

which is actually very common.

The difference I see with 10GBASE-KR is that it is intended in part to
for "backplanes".  Putting my electronic engineer hat on, a backplane
is a board that links several pluggable cards together inside a
chassis.  This is actually described at the beginning of clause 69.
Hence, the "media" that is referred to is the internal media to a
piece of equipment, rather than the media that is presented to a user
accessible socket.

At some point, that internal media will be connected either to another
MAC (e.g. another SoC on a different card) or to the external world,
where it will have to be converted to something more suitable for
transmission in the external world, such as 10GBASE-T.

So, merely focusing on what is in IEEE 802.3 misses out on the bigger
picture, which IMHO it is the bigger picture that matters for operating
systems that provide interfaces to users.

> > That said, with the splitting of the PCS from the MAC in phylink, there

> > is the possibility for the PCS to be implemented as an entirely

> > separate driver to the MAC driver, although there needs to be some

> > infrastructure to make that work sanely.  Right now, it is the MAC

> > responsibility to attach the PCS to phylink, which is the right way

> > to handle it for setups where the PCS is closely tied to the MAC, such

> > as Marvell NETA and PP2 where the PCS is indistinguishable from the

> > MAC, and will likely remain so for such setups.  However, if we need

> > to also support entirely separate PCS, I don't see any big issues

> > with that now that we have this split.

> 

> Absolutely.

> 

> There is code in phylink for managing a "MAC-side PCS", a concept

> introduced by Cisco for SGMII and USXGMII. Because of the implementation

> details of this "MAC-side PCS", support for 1000BASE-X comes basically

> for free, because hardware speaking, the same state machines are

> required for both, just different advertisement parsing logic.


You make it sound like phylink happened because of SGMII, and
1000BASE-X was an "accidental side effect".

The truth of the matter is that phylink came about due to the need to
support SFP modules on the SolidRun Clearfog platform (which supports
up to 2.5G on the serdes lane.)  At the time, network drivers that
supported SFP cages had that support tightly integrated into their
network drivers, which meant we were looking at having to implement
SFP support in the Marvell NETA driver and who knows how many other
drivers - which includes parsing the module EEPROMs, detecting the
insertion/removal and so on and so forth.

The problem was split up between a SFP socket driver, handling the
specifics of the socket, and phylink, handling the protocol side and
dealing with the connectivity issues between a MAC and whatever was
external to the MAC.

Due to the wide range of SFPs out there with their diverse
capabilities, some of which have a PHY on, we needed to support
hot-plugging PHYs - which phylink deals with, and when you have a
PHY, SGMII becomes necessary.

> But nonetheless, the exact same hardware state machine, i.e. the one

> described in "Figure 37-6-Auto-Negotiation state diagram", is managed

> twice in the Linux kernel: once in phylib, for any fiber PHY, and once

> in phylink. This is fine to me.


When you have a PHY in SGMII mode, then you have _two_ hardware state
machines in the link between the media and the MAC, both of which need
to indicate "link up" for the overall connection to pass data.  It so
happens that with SGMII mode, we can mostly ignore the PHY side of
things as the PHY conveys the results of the media side negotiation
back to the MAC, with the exception of pause modes - we still need to
query the PHY for that.

Now, in Marvell NETA and PP2, the quirks of hardware state machine is
not visible - all we have are the reported link parameters from the
buried PCS, and an indication whether the link is up or not (it's not
even latched-low.)  So, no, we are not "handling a state machine"
in phylink (and actually that slightly concerns me if polling mode is
not used for the PCS.)

> Yet, in another thread, you are bringing

> the argument that:

> 

> 	If we're not careful, we're going to end up with the Lynx PCS

> 	being implemented one way, and backplane PCS being implemented

> 	completely differently and preventing any hope of having a

> 	backplane PCS connected to a conventional copper PHY.


I see absolutely no conflict between these statements.

> But back to your argument: a backplane PHY could be described as a

> phylink MAC-side PCS, and this would have the benefit that the phylib

> slot would be free for a 10GBASE-T PHY, were that ever necessary.

> That is true, but that is akin to saying that any media converter should

> be modeled as a MAC-side phylink PCS and another, singular, phylib PHY

> (the converter itself). Why is this the best solution? Because it works?


It gives the greatest flexibility and gives options.

> Sure it does, but is that not a layering violation of sorts? What are

> the boundaries of phylink? I am asking because I genuinely don't know.

> Is it supposed to manage a SERDES data eye?


No.  Phylink only cares about the _protocol_ between the phylib PHY or
SFP cage and the MAC.  It doesn't care about the electrical properties
of that link - that is the domain of other parts of the code, and
already _is_ via use (in Marvell based systems) with the use of the
drivers/phy layer to control the COMPHY (serdes PHY).

As for "a layering violation of sorts" is that not what SGMII and
USXGMII fundamentally is?  You have the full OSI 1000BASE-X stack,
modified by some custom modifications, followed by something that
converts the 1000BASE-X modified stack back to MII and then a
BASE-T OSI stack.  Is the PHY in that case a "media converter" ?

> To me, it is not obvious at all that a backplane PHY (an integrated one,

> at that, but there are integrated BASE-T PHYs as well) shouldn't be

> modeled using phylib. If you read carefully through the standard, you'll

> notice that the most significant portion of backplane Ethernet is in

> "72.6.10.4 State diagrams", for link training. My fear is that phylink

> will need to acquire a lot of junk in order to support one

> software-driven implementation of 10GBASE-KR AN/LT, just to prove a

> point. We have zero other implementations to compare with, but we can

> just imagine that others might either do it software-driven, like NXP,

> or software-driven but hidden in firmware, or completely in hardware.

> Either way, phylib provides just the necessary abstractions for this to

> be hidden away. Whereas phylink might need some extra timers apart from

> the existing .pcs_get_state() to get this link training job done.


I don't get it.  The phylink interface is quite simple - if we are
using in-band mode, then we are asking the PCS block for its state.
We are also telling the PCS block what configuration we want it to
be in, and we also inform it when the link overall comes up.
What the PCS does internally is of no concern to phylink.  If it needs
to do some link training, that is a matter internal to the PCS driver.
If a PCS driver wants to use a library that implements the link
training, or call firmware to do link training, that is up to the PCS
driver and of no concern to phylink.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Russell King (Oracle) July 15, 2020, 11:34 a.m. UTC | #7
I'm sorry Vladimir, I can't cope with these replies that take hours to
write to your emails; it just takes up way too much time and interferes
way too much, I'm going to have to go back to the short sharp replies
out of necessity or just not reply.

Sorry.

On Wed, Jul 15, 2020 at 12:21:00PM +0100, Russell King - ARM Linux admin wrote:
> On Wed, Jul 15, 2020 at 02:46:52AM +0300, Vladimir Oltean wrote:

> > On Tue, Jul 14, 2020 at 02:18:32PM +0100, Russell King - ARM Linux admin wrote:

> > > I'd actually given up pushing this further; I've seen patches go by that

> > > purpetuate the idea that the PCS is handled by phylib.  I feel like I'm

> > > wasting my time with this.

> > 

> > By this I think you are aiming squarely at "[PATCH net-next v3 0/9] net:

> > ethernet backplane support on DPAA1". If I understand you correctly, you

> > are saying that because of the non-phylink models used to represent that

> > system comprised of a clause 49 PCS + clause 72 PMD + clause 73 AN/LT,

> > it is not worth pursuing this phylink-based representation of a clause

> > 37 PCS.

> 

> Actually, that is not what I was aiming that comment at - that is not

> something that has been posted recently.  I'm not going to explicitly

> point at a patch set.

> 

> > > > Where I have some doubts is a

> > > > clause 49 PCS which uses a clause 73 auto-negotiation system, I would

> > > > like to understand your vision of how deep phylink is going to go into

> > > > the PMD layer, especially where it is not obvious that said layer is

> > > > integrated with the MAC.

> > > 

> > > I have only considered up to 10GBASE-R based systems as that is the

> > > limit of what I can practically test here.  I have one system that

> > > offers a QSFP+ cage, and I have a QSFP+ to 4x SFP+ (10G) splitter

> > > cable - so that's basically 4x 10GBASE-CR rather than 40GBASE-CR.

> > 

> > Ok, no problem here, we can keep this discussion at principle level.

> > 

> > > I am anticipating that clause 73 will be handled in a very similar way

> > > to clause 37.

> > 

> > This is "Figure 37-1-Location of the Auto-Negotiation function" (clause

> > 37 is "Auto-Negotiation function, type 1000BASE-X").

> > 

> >   OSI

> >   REFERENCE

> >   MODEL

> >   LAYERS

> > 

> >  +--------------+

> >  | APPLICATION  |

> >  +--------------+

> >  | PRESENTATION |

> >  +--------------+

> >  | SESSION      |

> >  +--------------+

> >  | TRANSPORT    |

> >  +--------------+

> >  | NETWORK      |

> >  +--------------+   +----------------------------------------------------+

> >  | DATA LINK    |   |    LLC (LOGICAL LINK CONTROL) OR OTHER MAC CLIENT  |

> >  |              |   +----------------------------------------------------+

> >  |              |   |              MAC CONTROL (OPTIONAL)                |

> >  |              |   +----------------------------------------------------+

> >  |              |   |            MAC - MEDIA ACCESS CONTROL              |

> >  +--------------+   +----------------------------------------------------+

> >  | PHYSICAL     |   |                  RECONCILIATION                    |

> >  |              |   +----------------------------------------------------+

> >  |              |                         |       |

> >  |              |                   GMII  |       |

> >  |              |   +----------------------------------------------------+\

> >  |              |   |          PCS, INCLUDING AUTO-NEGOTIATION           | \

> >  |              |   +----------------------------------------------------+ |

> >  |              |   |                        PMA                         | |

> >  |              |   +----------------------------------------------------+ /

> >  |              |    |   LX-PMD   |     |   SX-PMD   |    |   CX-PMD   |  / PHY

> >  |              |    +------------+     +------------+    +------------+ /

> >  |              |        |    | LX MDI      |    | SX MDI     |    | CX MDI

> >  +--------------+      +---------+        +---------+       +---------+

> >                        | MEDIUM  |        | MEDIUM  |       | MEDIUM  |

> > 

> > This is "Figure 28-2-Location of Auto-Negotiation function within the

> > ISO/IEC OSI reference model" (clause 28 is "Physical Layer link

> > signaling for Auto-Negotiation on twisted pair", aka BASE-T).

> > 

> >   OSI

> >   REFERENCE

> >   MODEL

> >   LAYERS

> > 

> >  +--------------+

> >  | APPLICATION  |

> >  +--------------+

> >  | PRESENTATION |

> >  +--------------+

> >  | SESSION      |

> >  +--------------+

> >  | TRANSPORT    |

> >  +--------------+

> >  | NETWORK      |

> >  +--------------+   +------------------------------------------------------+

> >  | DATA LINK    |   |               LLC (LOGICAL LINK CONTROL)             |

> >  |              |   +------------------------------------------------------+

> >  |              |   |               MAC - MEDIA ACCESS CONTROL             |

> >  +--------------+   +------------------------------------------------------+

> >  | PHYSICAL     |           |            RECONCILIATION            |

> >  |              |           +--------------------------------------+

> >  |              |                          |       |

> >  |              |                     MII  |       |

> >  |              |          /+--------------------------------------+

> >  |              |         / |                 PCS                  |

> >  |              |         | +--------------------------------------+

> >  |              |         | |                 PMA                  |

> >  |              |     PHY | +--------------------------------------+

> >  |              |         | |                 PMD                  |

> >  |              |         \ +--------------------------------------+

> >  |              |          \|               AUTONEG                |

> >  |              |           +--------------------------------------+

> >  |              |                           |    | MDI

> >  +--------------+                         +---------+

> >                                           | MEDIUM  |

> > 

> > This is "Figure 73-1-Location of Auto-Negotiation function within the

> > ISO/IEC OSI reference model" (clause 73 is "Auto-Negotiation for

> > backplane and copper cable assembly").

> > 

> >   OSI

> >   REFERENCE

> >   MODEL

> >   LAYERS

> > 

> >  +--------------+

> >  | APPLICATION  |

> >  +--------------+

> >  | PRESENTATION |

> >  +--------------+

> >  | SESSION      |

> >  +--------------+

> >  | TRANSPORT    |

> >  +--------------+

> >  | NETWORK      |

> >  +--------------+   +------------------------------------------------------+

> >  | DATA LINK    |   |  LLC (LOGICAL LINK CONTROL) OR ANY OTHER MAC CLIENT  |

> >  |              |   +------------------------------------------------------+

> >  |              |   |               MAC - MEDIA ACCESS CONTROL             |

> >  +--------------+   +------------------------------------------------------+

> >  | PHYSICAL     |           |            RECONCILIATION            |

> >  |              |           +--------------------------------------+

> >  |              |                          |       | GMII, XGMII, 25GMII,

> >  |              |                          |       | XLGMII or CGMII

> >  |              |          /+--------------------------------------+

> >  |              |         / |                 PCS                  |

> >  |              |         | +--------------------------------------+

> >  |              |         | |                 PMA                  |

> >  |              |     PHY | +--------------------------------------+

> >  |              |         | |                 PMD                  |

> >  |              |         \ +--------------------------------------+

> >  |              |          \|               AUTONEG                |

> >  |              |           +--------------------------------------+

> >  |              |                           |    | MDI

> >  +--------------+                         +---------+

> >                                           | MEDIUM  |

> > 

> > Identify the position of the auto-negotiation function in these 3

> > diagrams.

> > 

> > To me, the backplane auto-negotiation look closer to BASE-T than it does

> > to BASE-X. This auto-negotiation is performed by the PMD, not by the

> > PCS.

> > 

> > But you are right that clause 28 AN is treated very similarly to clause

> > 37 AN... in phylib.

> > 

> > > The clause 73 "technology ability" field defines the

> > > capabilities of the link,

> > 

> > Yes.

> > 

> > > but as we are finding with 10GBASE-R based

> > > setups with copper PHYs, the capabilities of the link may not be what

> > > we want to report to the user, especially if the copper PHY is capable

> > > of rate adaption.

> > 

> > Explain?

> > By "copper PHY" I think you mean "compliant to 10GBASE-T". 10GBASE-KR

> > serves the same purpose, in the OSI model, as that. Do we not report the

> > capabilities of the 10GBASE-T link to the user?

> 

> The IEEE diagrams are good for describing the expected layers down to

> the "media" but what IEEE calls the "media" may not be what the user

> expects.

> 

> As an example, take a system where you have a backplane which causes

> backplane ethernet, and a series of line cards that have PHYs on them

> giving you a range of different connectivity options.  As I illustrated

> below, one option would be to turn the backplane ethernet into

> conventional twisted-pair ethernet - and there are PHYs on the market

> that will do this.

> 

> So, in that case, the IEEE diagrams no longer represent the system as

> far as the user is concerned - yes, we have figure 73-1 up to the

> point of the backplane, and then we have a PHY converting the backplane

> ethernet back to (conceptually) MII, and then everything from MII

> downards in figure 28-2.

> 

> It believe it would be possible to setup such a test scenario on

> Macchiatobin hardware if the firmware for training the COMPHY (the

> serdes PHY) for a 10GBASE-KR link has not been withdrawn, and the

> 88x3310 PHY was switched to 10GBASE-KR mode on its host side.

> 

> > Also, on a separate note, is rate adaptation supported by mainline

> > Linux? Last time I looked, for 2500BASE-X plus the Aquantia PHYs, it

> > wasn't.

> 

> It depends - and I think I've mentioned this point before in response

> to the Aquantia situation.  The 88x3310 PHY supports rate adaption when

> in certain host-interface modes.  However, depending on what features

> are available on the PHY (such as MACsec, and whether that is enabled)

> controls whether pause frames are used, or whether the PHY expects the

> egress rate from the upstream MAC to be adapted to the media speed.

> 

> In the case of a 88x3310 without MACsec, there are no support in the

> PHY to send/receive and act on pause frames while it is performing

> rate adaption, so force-enabling pause mode at the MAC is not useful

> and is potentially harmful.

> 

> The 88x3310 driver used to assume that its host-side interface always

> dynamically changed between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII,

> but that is not the full story.  One configuration places the host side

> in 10GBASE-R mode with rate adaption.  We now have a patch merged that

> effectively disables the dynamic changing of the host-side interface

> if the PHY is in this mode.  (It also does this for RXAUI mode, fwiw.)

> 

> phylib will provide the results of link negotiation as (for example):

> 	interface = 10GBASE-R (or RXAUI)

> 	speed = 100M

> 	duplex = full

> 	pause = tx | rx

> 

> where "speed", "duplex" and "pause" in this instance are the media side

> results.  "interface" remains the operational interface between the

> PHY and MAC.

> 

> I did attempt to start a discussion about how we should approach

> rate adaption in this scenario, specifically because I have one board

> where the 88x3310 is in RXAUI mode with rate adaption connected to a

> DSA switch.  In that case, if we program the speed and duplex into the

> MAC, despite the link being in RXAUI mode, the link between the PHY

> and DSA switch fails to pass data.  Other MAC drivers don't bother

> looking at "speed" and "duplex" when in RXAUI or 10GBASE-R mode,

> because they know that the link can only operate at 10G speeds.  Of

> course, what ends up being missed is spacing the transmitted packets

> out to the media speed - but that is a per-MAC issue.

> 

> However, it still leaves the open question I've had for some time (in

> fact, since your Aquantia PHY issue) about how we should be dealing

> with rate adaption overall, which remains unaddressed - and the

> question about how we should enable pause modes at the host MAC when

> the PHY requires it is still unresolved.

> 

> But... I feel we're getting bogged down in another issue here.

> 

> > > Hence, it may be possible to have a backplane link

> > > that connects to a copper PHY that does rate adaption:

> > > 

> > > MAC <--> Clause 73 PCS <--backplane--> PHY <--base-T--> remote system

> > > 

> > > This is entirely possible from what I've seen in one NBASE-T PHY

> > > datasheet.  The PHY is documented as being capable of negotiating a

> > > 10GBASE-KR link with the host system, while offering 10GBASE-R,

> > > 1000BASE-X, 10GBASE-T, 5GBASE-T, 2.5GBASE-T, 1GBASE-T, 100BASE-T, and

> > > 10BASE-T on the media side.  The follow-on question is whether that

> > > PHY is likely to be accessible to the system on the other end of the

> > > backplane link somehow - either through some kind of firmware link

> > > or direct access.  That is not possible to know without having

> > > experience with such systems.

> > > 

> > 

> > I have not seen said datasheet. That being said, I don't question the

> > existence of such a device. Because NGBASE-T and 10GBASE-KR are both

> > media-side protocols, such device would be called a "media converter".

> > To me this is no different than a 1000BASE-T to 1000BASE-X media

> > converter. How is that modeled in Linux? Is it?

> 

> Why is it a "media converter" ?  What about:

> 

> MAC <--SGMII/RGMII/...--> PHY <--1000BASE-X-->

> 

> which is actually very common.

> 

> The difference I see with 10GBASE-KR is that it is intended in part to

> for "backplanes".  Putting my electronic engineer hat on, a backplane

> is a board that links several pluggable cards together inside a

> chassis.  This is actually described at the beginning of clause 69.

> Hence, the "media" that is referred to is the internal media to a

> piece of equipment, rather than the media that is presented to a user

> accessible socket.

> 

> At some point, that internal media will be connected either to another

> MAC (e.g. another SoC on a different card) or to the external world,

> where it will have to be converted to something more suitable for

> transmission in the external world, such as 10GBASE-T.

> 

> So, merely focusing on what is in IEEE 802.3 misses out on the bigger

> picture, which IMHO it is the bigger picture that matters for operating

> systems that provide interfaces to users.

> 

> > > That said, with the splitting of the PCS from the MAC in phylink, there

> > > is the possibility for the PCS to be implemented as an entirely

> > > separate driver to the MAC driver, although there needs to be some

> > > infrastructure to make that work sanely.  Right now, it is the MAC

> > > responsibility to attach the PCS to phylink, which is the right way

> > > to handle it for setups where the PCS is closely tied to the MAC, such

> > > as Marvell NETA and PP2 where the PCS is indistinguishable from the

> > > MAC, and will likely remain so for such setups.  However, if we need

> > > to also support entirely separate PCS, I don't see any big issues

> > > with that now that we have this split.

> > 

> > Absolutely.

> > 

> > There is code in phylink for managing a "MAC-side PCS", a concept

> > introduced by Cisco for SGMII and USXGMII. Because of the implementation

> > details of this "MAC-side PCS", support for 1000BASE-X comes basically

> > for free, because hardware speaking, the same state machines are

> > required for both, just different advertisement parsing logic.

> 

> You make it sound like phylink happened because of SGMII, and

> 1000BASE-X was an "accidental side effect".

> 

> The truth of the matter is that phylink came about due to the need to

> support SFP modules on the SolidRun Clearfog platform (which supports

> up to 2.5G on the serdes lane.)  At the time, network drivers that

> supported SFP cages had that support tightly integrated into their

> network drivers, which meant we were looking at having to implement

> SFP support in the Marvell NETA driver and who knows how many other

> drivers - which includes parsing the module EEPROMs, detecting the

> insertion/removal and so on and so forth.

> 

> The problem was split up between a SFP socket driver, handling the

> specifics of the socket, and phylink, handling the protocol side and

> dealing with the connectivity issues between a MAC and whatever was

> external to the MAC.

> 

> Due to the wide range of SFPs out there with their diverse

> capabilities, some of which have a PHY on, we needed to support

> hot-plugging PHYs - which phylink deals with, and when you have a

> PHY, SGMII becomes necessary.

> 

> > But nonetheless, the exact same hardware state machine, i.e. the one

> > described in "Figure 37-6-Auto-Negotiation state diagram", is managed

> > twice in the Linux kernel: once in phylib, for any fiber PHY, and once

> > in phylink. This is fine to me.

> 

> When you have a PHY in SGMII mode, then you have _two_ hardware state

> machines in the link between the media and the MAC, both of which need

> to indicate "link up" for the overall connection to pass data.  It so

> happens that with SGMII mode, we can mostly ignore the PHY side of

> things as the PHY conveys the results of the media side negotiation

> back to the MAC, with the exception of pause modes - we still need to

> query the PHY for that.

> 

> Now, in Marvell NETA and PP2, the quirks of hardware state machine is

> not visible - all we have are the reported link parameters from the

> buried PCS, and an indication whether the link is up or not (it's not

> even latched-low.)  So, no, we are not "handling a state machine"

> in phylink (and actually that slightly concerns me if polling mode is

> not used for the PCS.)

> 

> > Yet, in another thread, you are bringing

> > the argument that:

> > 

> > 	If we're not careful, we're going to end up with the Lynx PCS

> > 	being implemented one way, and backplane PCS being implemented

> > 	completely differently and preventing any hope of having a

> > 	backplane PCS connected to a conventional copper PHY.

> 

> I see absolutely no conflict between these statements.

> 

> > But back to your argument: a backplane PHY could be described as a

> > phylink MAC-side PCS, and this would have the benefit that the phylib

> > slot would be free for a 10GBASE-T PHY, were that ever necessary.

> > That is true, but that is akin to saying that any media converter should

> > be modeled as a MAC-side phylink PCS and another, singular, phylib PHY

> > (the converter itself). Why is this the best solution? Because it works?

> 

> It gives the greatest flexibility and gives options.

> 

> > Sure it does, but is that not a layering violation of sorts? What are

> > the boundaries of phylink? I am asking because I genuinely don't know.

> > Is it supposed to manage a SERDES data eye?

> 

> No.  Phylink only cares about the _protocol_ between the phylib PHY or

> SFP cage and the MAC.  It doesn't care about the electrical properties

> of that link - that is the domain of other parts of the code, and

> already _is_ via use (in Marvell based systems) with the use of the

> drivers/phy layer to control the COMPHY (serdes PHY).

> 

> As for "a layering violation of sorts" is that not what SGMII and

> USXGMII fundamentally is?  You have the full OSI 1000BASE-X stack,

> modified by some custom modifications, followed by something that

> converts the 1000BASE-X modified stack back to MII and then a

> BASE-T OSI stack.  Is the PHY in that case a "media converter" ?

> 

> > To me, it is not obvious at all that a backplane PHY (an integrated one,

> > at that, but there are integrated BASE-T PHYs as well) shouldn't be

> > modeled using phylib. If you read carefully through the standard, you'll

> > notice that the most significant portion of backplane Ethernet is in

> > "72.6.10.4 State diagrams", for link training. My fear is that phylink

> > will need to acquire a lot of junk in order to support one

> > software-driven implementation of 10GBASE-KR AN/LT, just to prove a

> > point. We have zero other implementations to compare with, but we can

> > just imagine that others might either do it software-driven, like NXP,

> > or software-driven but hidden in firmware, or completely in hardware.

> > Either way, phylib provides just the necessary abstractions for this to

> > be hidden away. Whereas phylink might need some extra timers apart from

> > the existing .pcs_get_state() to get this link training job done.

> 

> I don't get it.  The phylink interface is quite simple - if we are

> using in-band mode, then we are asking the PCS block for its state.

> We are also telling the PCS block what configuration we want it to

> be in, and we also inform it when the link overall comes up.

> What the PCS does internally is of no concern to phylink.  If it needs

> to do some link training, that is a matter internal to the PCS driver.

> If a PCS driver wants to use a library that implements the link

> training, or call firmware to do link training, that is up to the PCS

> driver and of no concern to phylink.

> 

> -- 

> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/

> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Vladimir Oltean July 15, 2020, 12:31 p.m. UTC | #8
On Wed, Jul 15, 2020 at 12:21:01PM +0100, Russell King - ARM Linux admin wrote:
> On Wed, Jul 15, 2020 at 02:46:52AM +0300, Vladimir Oltean wrote:

> > By this I think you are aiming squarely at "[PATCH net-next v3 0/9] net:

> > ethernet backplane support on DPAA1". If I understand you correctly, you

> > are saying that because of the non-phylink models used to represent that

> > system comprised of a clause 49 PCS + clause 72 PMD + clause 73 AN/LT,

> > it is not worth pursuing this phylink-based representation of a clause

> > 37 PCS.

> 

> Actually, that is not what I was aiming that comment at - that is not

> something that has been posted recently.  I'm not going to explicitly

> point at a patch set.

> 


You are making it unnecessarily difficult to have a meaningful
conversation. I'm not going to guess about what patch set you were
talking about. I don't know of anything else that is using the phylib
state machine to drive a PCS than the backplane series. For the PCS
support in Felix/Seville/ENETC (and
drivers/net/ethernet/fman/fman_memac.c by the way), the struct
phy_device is just being used as a container, it is not driven by
phylib.

On Wed, Jul 15, 2020 at 12:34:41PM +0100, Russell King - ARM Linux admin wrote:
> I'm sorry Vladimir, I can't cope with these replies that take hours to

> write to your emails; it just takes up way too much time and interferes

> way too much, I'm going to have to go back to the short sharp replies

> out of necessity or just not reply.

> 

> Sorry.

> 


You know what the problem is here, I'm not given the option to "just not
reply" to you, and you don't seem to like my "short sharp replies"
either.

I'll be frank and state that the big problem I see with phylink is that
there's only one person who can ever be right about it, and that is you,
by definition. I have no mental image about what phylink really is
about, where does it start, where does it end, why does it even exist
and it's not just integrated with phylib. I used to be clear about the
part with SFPs, I'd read the documentation from a few years back when I
was trying to see whether the Lynx PCS is a good fit into phylink, and
it was so centered around MAC ops and a pluggable SFP, that to me, it
was absolutely clear that managing a standalone PCS was not something of
its concern. I would have felt like making "unauthorized modifications"
to it. Managing a standalone PCS could be shoehorned into the existing
ops, and that's exactly what I ended up doing, due to lack of the
greater vision that you have.

Yet, now not only are we talking about adding a standalone PCS layer to
phylink, but also about integrating some functionality which goes deeply
into PMA/PMD territory. I don't necessarily oppose (nor would I have the
power to, as mentioned in my preface), but you can't expect people to
sign up to something that they have no clear idea of what its role is,
not to mention the very volatile API which may not be to everybody's
taste when we're talking about old, stable drivers which support not
only new ARM parts, but also very old PowerPC parts.

The best you came with is that phylink gives you flexibility and
options, and sure it does, when you add a lot of stuff to it to make it
do that. But I don't want to know why phylink is an option, I want to
know why phylib isn't. Phylink is your creation, which as far as I'm
concerned stems out of the need to support more setups than phylib did,
and you took the route of working around phylib rather than extending
it. So, I would have expected an answer from you why phylib is not a
valid place for this.

Regards,
-Vladimir
Andrew Lunn July 15, 2020, 2:20 p.m. UTC | #9
> The best you came with is that phylink gives you flexibility and

> options, and sure it does, when you add a lot of stuff to it to make it

> do that. But I don't want to know why phylink is an option, I want to

> know why phylib isn't. Phylink is your creation, which as far as I'm

> concerned stems out of the need to support more setups than phylib did,

> and you took the route of working around phylib rather than extending

> it. So, I would have expected an answer from you why phylib is not a

> valid place for this.


phylib assumes it is the last thing. There is nothing after it, just
the copper media. And it assumes the world is copper, and not
hot-pluggable.  For a long time, this was sufficient. The MAC was
directly connected to the PHY via MII, or GMII, RGMII and everything
is static.  It does this job well.

But other the last few years, things have changed. We have
intermediary blocks. An SFP is such an intermediary block, when you
have a copper module. We have SERDES blocks which can need
configuration. We potentially have a backplane, with an SFP at the
other end, with copper PHY plugged into it. And all thus can
dynamically change.

The phylib architecture is not flexiable enough to handle this
chain. phylib is good for copper PHYs, but not much more. phylink is
there to help link together a number of blocks into a complete chain,
and declare the link is up when all blocks in the chain are up. If the
end block happens to be an copper PHY, phylink will use phylib to
control the PHY, because that is what phylib is for. I would not say
phylink works around phylib, it incorporates phylib. 

If you have a simple setup of a MAC directly connected to a copper PHY
in a simple static setup, feel free to use phylib. It is not going
away. But don't push the boundary, or we are likely to reject it.

     Andrew
Russell King (Oracle) July 15, 2020, 5:02 p.m. UTC | #10
On Wed, Jul 15, 2020 at 03:31:53PM +0300, Vladimir Oltean wrote:
> On Wed, Jul 15, 2020 at 12:21:01PM +0100, Russell King - ARM Linux admin wrote:

> > On Wed, Jul 15, 2020 at 02:46:52AM +0300, Vladimir Oltean wrote:

> > > By this I think you are aiming squarely at "[PATCH net-next v3 0/9] net:

> > > ethernet backplane support on DPAA1". If I understand you correctly, you

> > > are saying that because of the non-phylink models used to represent that

> > > system comprised of a clause 49 PCS + clause 72 PMD + clause 73 AN/LT,

> > > it is not worth pursuing this phylink-based representation of a clause

> > > 37 PCS.

> > 

> > Actually, that is not what I was aiming that comment at - that is not

> > something that has been posted recently.  I'm not going to explicitly

> > point at a patch set.

> > 

> 

> You are making it unnecessarily difficult to have a meaningful

> conversation.


Sorry, but no, I really don't have time to spend hours writing endless
replies to you explaining in great detail about every minute issue.
I spent an hour and a half on that email, and quite frankly, you've
used your quota of my time for today.

Life is already difficult enough during this time of Covid 19, I
really do not need extra pressure from people who need minute detail
in every email.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Ioana Ciornei July 20, 2020, 10:24 a.m. UTC | #11
On 6/30/20 5:29 PM, Russell King wrote:
> Simplify the ksettings_set() implementation to look more like phylib's
> implementation; use a switch() for validating the autoneg setting, and
> use the linkmode_modify() helper to set the autoneg bit in the
> advertisement mask.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>

> ---
>   drivers/net/phy/phylink.c | 25 ++++++++++++-------------
>   1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 424a927d7889..103d2a550415 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1314,25 +1314,24 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>   	__ETHTOOL_DECLARE_LINK_MODE_MASK(support);
>   	struct ethtool_link_ksettings our_kset;
>   	struct phylink_link_state config;
> +	const struct phy_setting *s;
>   	int ret;
>   
>   	ASSERT_RTNL();
>   
> -	if (kset->base.autoneg != AUTONEG_DISABLE &&
> -	    kset->base.autoneg != AUTONEG_ENABLE)
> -		return -EINVAL;
> -
>   	linkmode_copy(support, pl->supported);
>   	config = pl->link_config;
> +	config.an_enabled = kset->base.autoneg == AUTONEG_ENABLE;
>   
> -	/* Mask out unsupported advertisements */
> +	/* Mask out unsupported advertisements, and force the autoneg bit */
>   	linkmode_and(config.advertising, kset->link_modes.advertising,
>   		     support);
> +	linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, config.advertising,
> +			 config.an_enabled);
>   
>   	/* FIXME: should we reject autoneg if phy/mac does not support it? */
> -	if (kset->base.autoneg == AUTONEG_DISABLE) {
> -		const struct phy_setting *s;
> -
> +	switch (kset->base.autoneg) {
> +	case AUTONEG_DISABLE:
>   		/* Autonegotiation disabled, select a suitable speed and
>   		 * duplex.
>   		 */
> @@ -1351,19 +1350,19 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>   
>   		config.speed = s->speed;
>   		config.duplex = s->duplex;
> -		config.an_enabled = false;
> +		break;
>   
> -		__clear_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, config.advertising);
> -	} else {
> +	case AUTONEG_ENABLE:
>   		/* If we have a fixed link, refuse to enable autonegotiation */
>   		if (pl->cur_link_an_mode == MLO_AN_FIXED)
>   			return -EINVAL;
>   
>   		config.speed = SPEED_UNKNOWN;
>   		config.duplex = DUPLEX_UNKNOWN;
> -		config.an_enabled = true;
> +		break;
>   
> -		__set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, config.advertising);
> +	default:
> +		return -EINVAL;
>   	}
>   
>   	if (pl->phydev) {
>
Ioana Ciornei July 20, 2020, 10:52 a.m. UTC | #12
On 6/30/20 5:29 PM, Russell King wrote:
> For fixed links, we only allow the current settings, so this should be
> a matter of merely rejecting an attempt to change the settings.  If the
> settings agree, then there is nothing more we need to do.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>

> ---
>   drivers/net/phy/phylink.c | 31 ++++++++++++++++++++-----------
>   1 file changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 967c068d16c8..b91151062cdc 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -1360,22 +1360,31 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>   		if (!s)
>   			return -EINVAL;
>   
> -		/* If we have a fixed link (as specified by firmware), refuse
> -		 * to change link parameters.
> +		/* If we have a fixed link, refuse to change link parameters.
> +		 * If the link parameters match, accept them but do nothing.
>   		 */
> -		if (pl->cur_link_an_mode == MLO_AN_FIXED &&
> -		    (s->speed != pl->link_config.speed ||
> -		     s->duplex != pl->link_config.duplex))
> -			return -EINVAL;
> +		if (pl->cur_link_an_mode == MLO_AN_FIXED) {
> +			if (s->speed != pl->link_config.speed ||
> +			    s->duplex != pl->link_config.duplex)
> +				return -EINVAL;
> +			return 0;
> +		}
>   
>   		config.speed = s->speed;
>   		config.duplex = s->duplex;
>   		break;
>   
>   	case AUTONEG_ENABLE:
> -		/* If we have a fixed link, refuse to enable autonegotiation */
> -		if (pl->cur_link_an_mode == MLO_AN_FIXED)
> -			return -EINVAL;
> +		/* If we have a fixed link, allow autonegotiation (since that
> +		 * is our default case) but do not allow the advertisement to
> +		 * be changed. If the advertisement matches, simply return.
> +		 */
> +		if (pl->cur_link_an_mode == MLO_AN_FIXED) {
> +			if (!linkmode_equal(config.advertising,
> +					    pl->link_config.advertising))
> +				return -EINVAL;
> +			return 0;
> +		}
>   
>   		config.speed = SPEED_UNKNOWN;
>   		config.duplex = DUPLEX_UNKNOWN;
> @@ -1385,8 +1394,8 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>   		return -EINVAL;
>   	}
>   
> -	/* For a fixed link, this isn't able to change any parameters,
> -	 * which just leaves inband mode.
> +	/* We have ruled out the case with a PHY attached, and the
> +	 * fixed-link cases.  All that is left are in-band links.
>   	 */
>   	if (phylink_validate(pl, support, &config))
>   		return -EINVAL;
>
Ioana Ciornei July 20, 2020, 11:40 a.m. UTC | #13
On 6/30/20 5:29 PM, Russell King wrote:
> With PCS support, how we implement interface reconfiguration is not up
> to the job; we end up reconfiguring the PCS for an interface change
> while the link could potentially be up.  In order to solve this, add
> two additional MAC methods for interface configuration, one to prepare
> for the change, and one to finish the change.
> 
> This allows mvneta and mvpp2 to shutdown what they require prior to the
> MAC and PCS configuration calls, and then restart as appropriate.
> 
> This impacts ksettings_set(), which now needs to identify whether the
> change is a minor tweak to the advertisement masks or whether the
> interface mode has changed, and call the appropriate function for that
> update.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
>   drivers/net/phy/phylink.c | 80 ++++++++++++++++++++++++++-------------
>   include/linux/phylink.h   | 48 +++++++++++++++++++++++
>   2 files changed, 102 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 09f4aeef15c7..a31a00fb4974 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -433,23 +433,47 @@ static void phylink_mac_pcs_an_restart(struct phylink *pl)
>   	}
>   }
>   
> -static void phylink_pcs_config(struct phylink *pl, bool force_restart,
> -			       const struct phylink_link_state *state)
> +static void phylink_change_interface(struct phylink *pl, bool restart,
> +				     const struct phylink_link_state *state)
>   {
> -	bool restart = force_restart;
> +	int err;
> +
> +	phylink_dbg(pl, "change interface %s\n", phy_modes(state->interface));
>   
> -	if (pl->pcs_ops && pl->pcs_ops->pcs_config(pl->config,
> -						   pl->cur_link_an_mode,
> -						   state->interface,
> -						   state->advertising,
> -						   !!(pl->link_config.pause &
> -						      MLO_PAUSE_AN)))
> -		restart = true;
> +	if (pl->mac_ops->mac_prepare) {
> +		err = pl->mac_ops->mac_prepare(pl->config, pl->cur_link_an_mode,
> +					       state->interface);
> +		if (err < 0) {
> +			phylink_err(pl, "mac_prepare failed: %pe\n",
> +				    ERR_PTR(err));
> +			return;
> +		}
> +	}
>   
>   	phylink_mac_config(pl, state);
>   
> +	if (pl->pcs_ops) {
> +		err = pl->pcs_ops->pcs_config(pl->config, pl->cur_link_an_mode,
> +					      state->interface,
> +					      state->advertising,
> +					      !!(pl->link_config.pause &
> +						 MLO_PAUSE_AN));
> +		if (err < 0)
> +			phylink_err(pl, "pcs_config failed: %pe\n",
> +				    ERR_PTR(err));
> +		if (err > 0)
> +			restart = true;
> +	}
>   	if (restart)
>   		phylink_mac_pcs_an_restart(pl);
> +
> +	if (pl->mac_ops->mac_finish) {
> +		err = pl->mac_ops->mac_finish(pl->config, pl->cur_link_an_mode,
> +					      state->interface);
> +		if (err < 0)
> +			phylink_err(pl, "mac_prepare failed: %pe\n",
> +				    ERR_PTR(err));
> +	}
>   }
>   
>   /*
> @@ -555,7 +579,7 @@ static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
>   	link_state.link = false;
>   
>   	phylink_apply_manual_flow(pl, &link_state);
> -	phylink_pcs_config(pl, force_restart, &link_state);
> +	phylink_change_interface(pl, force_restart, &link_state);
>   }
>   
>   static const char *phylink_pause_to_str(int pause)
> @@ -674,7 +698,7 @@ static void phylink_resolve(struct work_struct *w)
>   				phylink_link_down(pl);
>   				cur_link_state = false;
>   			}
> -			phylink_pcs_config(pl, false, &link_state);
> +			phylink_change_interface(pl, false, &link_state);
>   			pl->link_config.interface = link_state.interface;
>   		} else if (!pl->pcs_ops) {
>   			/* The interface remains unchanged, only the speed,
> @@ -1450,22 +1474,26 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>   		return -EINVAL;
>   
>   	mutex_lock(&pl->state_mutex);
> -	linkmode_copy(pl->link_config.advertising, config.advertising);
> -	pl->link_config.interface = config.interface;
>   	pl->link_config.speed = config.speed;
>   	pl->link_config.duplex = config.duplex;
> -	pl->link_config.an_enabled = kset->base.autoneg !=
> -				     AUTONEG_DISABLE;
> -
> -	if (pl->cur_link_an_mode == MLO_AN_INBAND &&
> -	    !test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state)) {
> -		/* If in 802.3z mode, this updates the advertisement.
> -		 *
> -		 * If we are in SGMII mode without a PHY, there is no
> -		 * advertisement; the only thing we have is the pause
> -		 * modes which can only come from a PHY.
> -		 */
> -		phylink_pcs_config(pl, true, &pl->link_config);
> +	pl->link_config.an_enabled = kset->base.autoneg != AUTONEG_DISABLE;
> +
> +	if (pl->link_config.interface != config.interface) {


I cannot seem to understand where in this function config.interface 
could become different from pl->link_config.interface.

Is there something obvious that I am missing?

Ioana

> +		/* The interface changed, e.g. 1000base-X <-> 2500base-X */
> +		/* We need to force the link down, then change the interface */
> +		if (pl->old_link_state) {
> +			phylink_link_down(pl);
> +			pl->old_link_state = false;
> +		}
> +		if (!test_bit(PHYLINK_DISABLE_STOPPED,
> +			      &pl->phylink_disable_state))
> +			phylink_change_interface(pl, false, &config);
> +		pl->link_config.interface = config.interface;
> +		linkmode_copy(pl->link_config.advertising, config.advertising);
> +	} else if (!linkmode_equal(pl->link_config.advertising,
> +				   config.advertising)) {
> +		linkmode_copy(pl->link_config.advertising, config.advertising);
> +		phylink_change_inband_advert(pl);
>   	}
>   	mutex_unlock(&pl->state_mutex);
>   
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index d9913d8e6b91..2f1315f32113 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -76,7 +76,9 @@ struct phylink_config {
>    * struct phylink_mac_ops - MAC operations structure.
>    * @validate: Validate and update the link configuration.
>    * @mac_pcs_get_state: Read the current link state from the hardware.
> + * @mac_prepare: prepare for a major reconfiguration of the interface.
>    * @mac_config: configure the MAC for the selected mode and state.
> + * @mac_finish: finish a major reconfiguration of the interface.
>    * @mac_an_restart: restart 802.3z BaseX autonegotiation.
>    * @mac_link_down: take the link down.
>    * @mac_link_up: allow the link to come up.
> @@ -89,8 +91,12 @@ struct phylink_mac_ops {
>   			 struct phylink_link_state *state);
>   	void (*mac_pcs_get_state)(struct phylink_config *config,
>   				  struct phylink_link_state *state);
> +	int (*mac_prepare)(struct phylink_config *config, unsigned int mode,
> +			   phy_interface_t iface);
>   	void (*mac_config)(struct phylink_config *config, unsigned int mode,
>   			   const struct phylink_link_state *state);
> +	int (*mac_finish)(struct phylink_config *config, unsigned int mode,
> +			  phy_interface_t iface);
>   	void (*mac_an_restart)(struct phylink_config *config);
>   	void (*mac_link_down)(struct phylink_config *config, unsigned int mode,
>   			      phy_interface_t interface);
> @@ -145,6 +151,31 @@ void validate(struct phylink_config *config, unsigned long *supported,
>   void mac_pcs_get_state(struct phylink_config *config,
>   		       struct phylink_link_state *state);
>   
> +/**
> + * mac_prepare() - prepare to change the PHY interface mode
> + * @config: a pointer to a &struct phylink_config.
> + * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND.
> + * @iface: interface mode to switch to
> + *
> + * phylink will call this method at the beginning of a full initialisation
> + * of the link, which includes changing the interface mode or at initial
> + * startup time. It may be called for the current mode. The MAC driver
> + * should perform whatever actions are required, e.g. disabling the
> + * Serdes PHY.
> + *
> + * This will be the first call in the sequence:
> + * - mac_prepare()
> + * - mac_config()
> + * - pcs_config()
> + * - possible pcs_an_restart()
> + * - mac_finish()
> + *
> + * Returns zero on success, or negative errno on failure which will be
> + * reported to the kernel log.
> + */
> +int mac_prepare(struct phylink_config *config, unsigned int mode,
> +		phy_interface_t iface);
> +
>   /**
>    * mac_config() - configure the MAC for the selected mode and state
>    * @config: a pointer to a &struct phylink_config.
> @@ -220,6 +251,23 @@ void mac_pcs_get_state(struct phylink_config *config,
>   void mac_config(struct phylink_config *config, unsigned int mode,
>   		const struct phylink_link_state *state);
>   
> +/**
> + * mac_finish() - finish a to change the PHY interface mode
> + * @config: a pointer to a &struct phylink_config.
> + * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND.
> + * @iface: interface mode to switch to
> + *
> + * phylink will call this if it called mac_prepare() to allow the MAC to
> + * complete any necessary steps after the MAC and PCS have been configured
> + * for the @mode and @iface. E.g. a MAC driver may wish to re-enable the
> + * Serdes PHY here if it was previously disabled by mac_prepare().
> + *
> + * Returns zero on success, or negative errno on failure which will be
> + * reported to the kernel log.
> + */
> +int mac_finish(struct phylink_config *config, unsigned int mode,
> +		phy_interface_t iface);
> +
>   /**
>    * mac_an_restart() - restart 802.3z BaseX autonegotiation
>    * @config: a pointer to a &struct phylink_config.
>
Russell King (Oracle) July 20, 2020, 12:44 p.m. UTC | #14
On Mon, Jul 20, 2020 at 11:40:44AM +0000, Ioana Ciornei wrote:
> On 6/30/20 5:29 PM, Russell King wrote:

> > With PCS support, how we implement interface reconfiguration is not up

> > to the job; we end up reconfiguring the PCS for an interface change

> > while the link could potentially be up.  In order to solve this, add

> > two additional MAC methods for interface configuration, one to prepare

> > for the change, and one to finish the change.

> > 

> > This allows mvneta and mvpp2 to shutdown what they require prior to the

> > MAC and PCS configuration calls, and then restart as appropriate.

> > 

> > This impacts ksettings_set(), which now needs to identify whether the

> > change is a minor tweak to the advertisement masks or whether the

> > interface mode has changed, and call the appropriate function for that

> > update.

> > 

> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

> > ---

> >   drivers/net/phy/phylink.c | 80 ++++++++++++++++++++++++++-------------

> >   include/linux/phylink.h   | 48 +++++++++++++++++++++++

> >   2 files changed, 102 insertions(+), 26 deletions(-)

> > 

> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c

> > index 09f4aeef15c7..a31a00fb4974 100644

> > --- a/drivers/net/phy/phylink.c

> > +++ b/drivers/net/phy/phylink.c

> > @@ -433,23 +433,47 @@ static void phylink_mac_pcs_an_restart(struct phylink *pl)

> >   	}

> >   }

> >   

> > -static void phylink_pcs_config(struct phylink *pl, bool force_restart,

> > -			       const struct phylink_link_state *state)

> > +static void phylink_change_interface(struct phylink *pl, bool restart,

> > +				     const struct phylink_link_state *state)

> >   {

> > -	bool restart = force_restart;

> > +	int err;

> > +

> > +	phylink_dbg(pl, "change interface %s\n", phy_modes(state->interface));

> >   

> > -	if (pl->pcs_ops && pl->pcs_ops->pcs_config(pl->config,

> > -						   pl->cur_link_an_mode,

> > -						   state->interface,

> > -						   state->advertising,

> > -						   !!(pl->link_config.pause &

> > -						      MLO_PAUSE_AN)))

> > -		restart = true;

> > +	if (pl->mac_ops->mac_prepare) {

> > +		err = pl->mac_ops->mac_prepare(pl->config, pl->cur_link_an_mode,

> > +					       state->interface);

> > +		if (err < 0) {

> > +			phylink_err(pl, "mac_prepare failed: %pe\n",

> > +				    ERR_PTR(err));

> > +			return;

> > +		}

> > +	}

> >   

> >   	phylink_mac_config(pl, state);

> >   

> > +	if (pl->pcs_ops) {

> > +		err = pl->pcs_ops->pcs_config(pl->config, pl->cur_link_an_mode,

> > +					      state->interface,

> > +					      state->advertising,

> > +					      !!(pl->link_config.pause &

> > +						 MLO_PAUSE_AN));

> > +		if (err < 0)

> > +			phylink_err(pl, "pcs_config failed: %pe\n",

> > +				    ERR_PTR(err));

> > +		if (err > 0)

> > +			restart = true;

> > +	}

> >   	if (restart)

> >   		phylink_mac_pcs_an_restart(pl);

> > +

> > +	if (pl->mac_ops->mac_finish) {

> > +		err = pl->mac_ops->mac_finish(pl->config, pl->cur_link_an_mode,

> > +					      state->interface);

> > +		if (err < 0)

> > +			phylink_err(pl, "mac_prepare failed: %pe\n",

> > +				    ERR_PTR(err));

> > +	}

> >   }

> >   

> >   /*

> > @@ -555,7 +579,7 @@ static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)

> >   	link_state.link = false;

> >   

> >   	phylink_apply_manual_flow(pl, &link_state);

> > -	phylink_pcs_config(pl, force_restart, &link_state);

> > +	phylink_change_interface(pl, force_restart, &link_state);

> >   }

> >   

> >   static const char *phylink_pause_to_str(int pause)

> > @@ -674,7 +698,7 @@ static void phylink_resolve(struct work_struct *w)

> >   				phylink_link_down(pl);

> >   				cur_link_state = false;

> >   			}

> > -			phylink_pcs_config(pl, false, &link_state);

> > +			phylink_change_interface(pl, false, &link_state);

> >   			pl->link_config.interface = link_state.interface;

> >   		} else if (!pl->pcs_ops) {

> >   			/* The interface remains unchanged, only the speed,

> > @@ -1450,22 +1474,26 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,

> >   		return -EINVAL;

> >   

> >   	mutex_lock(&pl->state_mutex);

> > -	linkmode_copy(pl->link_config.advertising, config.advertising);

> > -	pl->link_config.interface = config.interface;

> >   	pl->link_config.speed = config.speed;

> >   	pl->link_config.duplex = config.duplex;

> > -	pl->link_config.an_enabled = kset->base.autoneg !=

> > -				     AUTONEG_DISABLE;

> > -

> > -	if (pl->cur_link_an_mode == MLO_AN_INBAND &&

> > -	    !test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state)) {

> > -		/* If in 802.3z mode, this updates the advertisement.

> > -		 *

> > -		 * If we are in SGMII mode without a PHY, there is no

> > -		 * advertisement; the only thing we have is the pause

> > -		 * modes which can only come from a PHY.

> > -		 */

> > -		phylink_pcs_config(pl, true, &pl->link_config);

> > +	pl->link_config.an_enabled = kset->base.autoneg != AUTONEG_DISABLE;

> > +

> > +	if (pl->link_config.interface != config.interface) {

> 

> 

> I cannot seem to understand where in this function config.interface 

> could become different from pl->link_config.interface.

> 

> Is there something obvious that I am missing?


The validate() method is free to change the interface if required -
there's a helper that MACs can use to achieve that for switching
between 1000base-X and 2500base-X.  Useful if you have a FC SFP
plugged in capable of 2500base-X, but want to communicate with a
1000base-X link partner.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Ioana Ciornei July 20, 2020, 1:02 p.m. UTC | #15
On 7/20/20 3:44 PM, Russell King - ARM Linux admin wrote:
> On Mon, Jul 20, 2020 at 11:40:44AM +0000, Ioana Ciornei wrote:
>> On 6/30/20 5:29 PM, Russell King wrote:
>>> With PCS support, how we implement interface reconfiguration is not up
>>> to the job; we end up reconfiguring the PCS for an interface change
>>> while the link could potentially be up.  In order to solve this, add
>>> two additional MAC methods for interface configuration, one to prepare
>>> for the change, and one to finish the change.
>>>
>>> This allows mvneta and mvpp2 to shutdown what they require prior to the
>>> MAC and PCS configuration calls, and then restart as appropriate.
>>>
>>> This impacts ksettings_set(), which now needs to identify whether the
>>> change is a minor tweak to the advertisement masks or whether the
>>> interface mode has changed, and call the appropriate function for that
>>> update.
>>>
>>> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
>>> ---
>>>    drivers/net/phy/phylink.c | 80 ++++++++++++++++++++++++++-------------
>>>    include/linux/phylink.h   | 48 +++++++++++++++++++++++
>>>    2 files changed, 102 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
>>> index 09f4aeef15c7..a31a00fb4974 100644
>>> --- a/drivers/net/phy/phylink.c
>>> +++ b/drivers/net/phy/phylink.c
>>> @@ -433,23 +433,47 @@ static void phylink_mac_pcs_an_restart(struct phylink *pl)
>>>    	}
>>>    }
>>>    
>>> -static void phylink_pcs_config(struct phylink *pl, bool force_restart,
>>> -			       const struct phylink_link_state *state)
>>> +static void phylink_change_interface(struct phylink *pl, bool restart,
>>> +				     const struct phylink_link_state *state)
>>>    {
>>> -	bool restart = force_restart;
>>> +	int err;
>>> +
>>> +	phylink_dbg(pl, "change interface %s\n", phy_modes(state->interface));
>>>    
>>> -	if (pl->pcs_ops && pl->pcs_ops->pcs_config(pl->config,
>>> -						   pl->cur_link_an_mode,
>>> -						   state->interface,
>>> -						   state->advertising,
>>> -						   !!(pl->link_config.pause &
>>> -						      MLO_PAUSE_AN)))
>>> -		restart = true;
>>> +	if (pl->mac_ops->mac_prepare) {
>>> +		err = pl->mac_ops->mac_prepare(pl->config, pl->cur_link_an_mode,
>>> +					       state->interface);
>>> +		if (err < 0) {
>>> +			phylink_err(pl, "mac_prepare failed: %pe\n",
>>> +				    ERR_PTR(err));
>>> +			return;
>>> +		}
>>> +	}
>>>    
>>>    	phylink_mac_config(pl, state);
>>>    
>>> +	if (pl->pcs_ops) {
>>> +		err = pl->pcs_ops->pcs_config(pl->config, pl->cur_link_an_mode,
>>> +					      state->interface,
>>> +					      state->advertising,
>>> +					      !!(pl->link_config.pause &
>>> +						 MLO_PAUSE_AN));
>>> +		if (err < 0)
>>> +			phylink_err(pl, "pcs_config failed: %pe\n",
>>> +				    ERR_PTR(err));
>>> +		if (err > 0)
>>> +			restart = true;
>>> +	}
>>>    	if (restart)
>>>    		phylink_mac_pcs_an_restart(pl);
>>> +
>>> +	if (pl->mac_ops->mac_finish) {
>>> +		err = pl->mac_ops->mac_finish(pl->config, pl->cur_link_an_mode,
>>> +					      state->interface);
>>> +		if (err < 0)
>>> +			phylink_err(pl, "mac_prepare failed: %pe\n",
>>> +				    ERR_PTR(err));
>>> +	}
>>>    }
>>>    
>>>    /*
>>> @@ -555,7 +579,7 @@ static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
>>>    	link_state.link = false;
>>>    
>>>    	phylink_apply_manual_flow(pl, &link_state);
>>> -	phylink_pcs_config(pl, force_restart, &link_state);
>>> +	phylink_change_interface(pl, force_restart, &link_state);
>>>    }
>>>    
>>>    static const char *phylink_pause_to_str(int pause)
>>> @@ -674,7 +698,7 @@ static void phylink_resolve(struct work_struct *w)
>>>    				phylink_link_down(pl);
>>>    				cur_link_state = false;
>>>    			}
>>> -			phylink_pcs_config(pl, false, &link_state);
>>> +			phylink_change_interface(pl, false, &link_state);
>>>    			pl->link_config.interface = link_state.interface;
>>>    		} else if (!pl->pcs_ops) {
>>>    			/* The interface remains unchanged, only the speed,
>>> @@ -1450,22 +1474,26 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
>>>    		return -EINVAL;
>>>    
>>>    	mutex_lock(&pl->state_mutex);
>>> -	linkmode_copy(pl->link_config.advertising, config.advertising);
>>> -	pl->link_config.interface = config.interface;
>>>    	pl->link_config.speed = config.speed;
>>>    	pl->link_config.duplex = config.duplex;
>>> -	pl->link_config.an_enabled = kset->base.autoneg !=
>>> -				     AUTONEG_DISABLE;
>>> -
>>> -	if (pl->cur_link_an_mode == MLO_AN_INBAND &&
>>> -	    !test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state)) {
>>> -		/* If in 802.3z mode, this updates the advertisement.
>>> -		 *
>>> -		 * If we are in SGMII mode without a PHY, there is no
>>> -		 * advertisement; the only thing we have is the pause
>>> -		 * modes which can only come from a PHY.
>>> -		 */
>>> -		phylink_pcs_config(pl, true, &pl->link_config);
>>> +	pl->link_config.an_enabled = kset->base.autoneg != AUTONEG_DISABLE;
>>> +
>>> +	if (pl->link_config.interface != config.interface) {
>>
>>
>> I cannot seem to understand where in this function config.interface
>> could become different from pl->link_config.interface.
>>
>> Is there something obvious that I am missing?
> 
> The validate() method is free to change the interface if required -
> there's a helper that MACs can use to achieve that for switching
> between 1000base-X and 2500base-X.  Useful if you have a FC SFP
> plugged in capable of 2500base-X, but want to communicate with a
> 1000base-X link partner.
> 

Ok, I was not aware of this possibility. Now that I looked, it's even 
documented in phylink's header file.

My confusion stems from the fact that I expected this to come from the 
SFP itself requesting an interface type or the other. So the usage here 
is to know what is at the other end of the line and configure the 
appropriate advertisement with ethtool?

Ioana
Russell King (Oracle) July 20, 2020, 2:19 p.m. UTC | #16
On Mon, Jul 20, 2020 at 01:02:21PM +0000, Ioana Ciornei wrote:
> On 7/20/20 3:44 PM, Russell King - ARM Linux admin wrote:

> > On Mon, Jul 20, 2020 at 11:40:44AM +0000, Ioana Ciornei wrote:

> >> On 6/30/20 5:29 PM, Russell King wrote:

> >>> With PCS support, how we implement interface reconfiguration is not up

> >>> to the job; we end up reconfiguring the PCS for an interface change

> >>> while the link could potentially be up.  In order to solve this, add

> >>> two additional MAC methods for interface configuration, one to prepare

> >>> for the change, and one to finish the change.

> >>>

> >>> This allows mvneta and mvpp2 to shutdown what they require prior to the

> >>> MAC and PCS configuration calls, and then restart as appropriate.

> >>>

> >>> This impacts ksettings_set(), which now needs to identify whether the

> >>> change is a minor tweak to the advertisement masks or whether the

> >>> interface mode has changed, and call the appropriate function for that

> >>> update.

> >>>

> >>> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

> >>> ---

> >>>    drivers/net/phy/phylink.c | 80 ++++++++++++++++++++++++++-------------

> >>>    include/linux/phylink.h   | 48 +++++++++++++++++++++++

> >>>    2 files changed, 102 insertions(+), 26 deletions(-)

> >>>

> >>> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c

> >>> index 09f4aeef15c7..a31a00fb4974 100644

> >>> --- a/drivers/net/phy/phylink.c

> >>> +++ b/drivers/net/phy/phylink.c

> >>> @@ -433,23 +433,47 @@ static void phylink_mac_pcs_an_restart(struct phylink *pl)

> >>>    	}

> >>>    }

> >>>    

> >>> -static void phylink_pcs_config(struct phylink *pl, bool force_restart,

> >>> -			       const struct phylink_link_state *state)

> >>> +static void phylink_change_interface(struct phylink *pl, bool restart,

> >>> +				     const struct phylink_link_state *state)

> >>>    {

> >>> -	bool restart = force_restart;

> >>> +	int err;

> >>> +

> >>> +	phylink_dbg(pl, "change interface %s\n", phy_modes(state->interface));

> >>>    

> >>> -	if (pl->pcs_ops && pl->pcs_ops->pcs_config(pl->config,

> >>> -						   pl->cur_link_an_mode,

> >>> -						   state->interface,

> >>> -						   state->advertising,

> >>> -						   !!(pl->link_config.pause &

> >>> -						      MLO_PAUSE_AN)))

> >>> -		restart = true;

> >>> +	if (pl->mac_ops->mac_prepare) {

> >>> +		err = pl->mac_ops->mac_prepare(pl->config, pl->cur_link_an_mode,

> >>> +					       state->interface);

> >>> +		if (err < 0) {

> >>> +			phylink_err(pl, "mac_prepare failed: %pe\n",

> >>> +				    ERR_PTR(err));

> >>> +			return;

> >>> +		}

> >>> +	}

> >>>    

> >>>    	phylink_mac_config(pl, state);

> >>>    

> >>> +	if (pl->pcs_ops) {

> >>> +		err = pl->pcs_ops->pcs_config(pl->config, pl->cur_link_an_mode,

> >>> +					      state->interface,

> >>> +					      state->advertising,

> >>> +					      !!(pl->link_config.pause &

> >>> +						 MLO_PAUSE_AN));

> >>> +		if (err < 0)

> >>> +			phylink_err(pl, "pcs_config failed: %pe\n",

> >>> +				    ERR_PTR(err));

> >>> +		if (err > 0)

> >>> +			restart = true;

> >>> +	}

> >>>    	if (restart)

> >>>    		phylink_mac_pcs_an_restart(pl);

> >>> +

> >>> +	if (pl->mac_ops->mac_finish) {

> >>> +		err = pl->mac_ops->mac_finish(pl->config, pl->cur_link_an_mode,

> >>> +					      state->interface);

> >>> +		if (err < 0)

> >>> +			phylink_err(pl, "mac_prepare failed: %pe\n",

> >>> +				    ERR_PTR(err));

> >>> +	}

> >>>    }

> >>>    

> >>>    /*

> >>> @@ -555,7 +579,7 @@ static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)

> >>>    	link_state.link = false;

> >>>    

> >>>    	phylink_apply_manual_flow(pl, &link_state);

> >>> -	phylink_pcs_config(pl, force_restart, &link_state);

> >>> +	phylink_change_interface(pl, force_restart, &link_state);

> >>>    }

> >>>    

> >>>    static const char *phylink_pause_to_str(int pause)

> >>> @@ -674,7 +698,7 @@ static void phylink_resolve(struct work_struct *w)

> >>>    				phylink_link_down(pl);

> >>>    				cur_link_state = false;

> >>>    			}

> >>> -			phylink_pcs_config(pl, false, &link_state);

> >>> +			phylink_change_interface(pl, false, &link_state);

> >>>    			pl->link_config.interface = link_state.interface;

> >>>    		} else if (!pl->pcs_ops) {

> >>>    			/* The interface remains unchanged, only the speed,

> >>> @@ -1450,22 +1474,26 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,

> >>>    		return -EINVAL;

> >>>    

> >>>    	mutex_lock(&pl->state_mutex);

> >>> -	linkmode_copy(pl->link_config.advertising, config.advertising);

> >>> -	pl->link_config.interface = config.interface;

> >>>    	pl->link_config.speed = config.speed;

> >>>    	pl->link_config.duplex = config.duplex;

> >>> -	pl->link_config.an_enabled = kset->base.autoneg !=

> >>> -				     AUTONEG_DISABLE;

> >>> -

> >>> -	if (pl->cur_link_an_mode == MLO_AN_INBAND &&

> >>> -	    !test_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state)) {

> >>> -		/* If in 802.3z mode, this updates the advertisement.

> >>> -		 *

> >>> -		 * If we are in SGMII mode without a PHY, there is no

> >>> -		 * advertisement; the only thing we have is the pause

> >>> -		 * modes which can only come from a PHY.

> >>> -		 */

> >>> -		phylink_pcs_config(pl, true, &pl->link_config);

> >>> +	pl->link_config.an_enabled = kset->base.autoneg != AUTONEG_DISABLE;

> >>> +

> >>> +	if (pl->link_config.interface != config.interface) {

> >>

> >>

> >> I cannot seem to understand where in this function config.interface

> >> could become different from pl->link_config.interface.

> >>

> >> Is there something obvious that I am missing?

> > 

> > The validate() method is free to change the interface if required -

> > there's a helper that MACs can use to achieve that for switching

> > between 1000base-X and 2500base-X.  Useful if you have a FC SFP

> > plugged in capable of 2500base-X, but want to communicate with a

> > 1000base-X link partner.

> > 

> 

> Ok, I was not aware of this possibility. Now that I looked, it's even 

> documented in phylink's header file.

> 

> My confusion stems from the fact that I expected this to come from the 

> SFP itself requesting an interface type or the other. So the usage here 

> is to know what is at the other end of the line and configure the 

> appropriate advertisement with ethtool?


SFPs do not specify their host interface - they specify a range of
parameters which describe the media side.  Even then, it is not
unusual for various parameters in the EEPROM to be incorrect.

Also, 2500BASE-X at one end and 1000BASE-X at the other end is not
compatible - you won't even get synchronisation between the two ends,
so there's no possibility for passing any data about capabilities.

The idea is that if you have a module that can support 2500BASE-X and
1000BASE-X (such as a fibrechannel SFP) then, rather than being solidly
tied by the kernel to one or other of the speeds, you can request the
kernel use the other speed via ethtool - and switching between the two
requires changing the phy_interface_t setting.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!