Message ID | 20210903151436.529478-2-maciej.machnikowski@intel.com |
---|---|
State | New |
Headers | show |
Series | Add RTNL interface for SyncE | expand |
On Mon, 6 Sep 2021 18:30:40 +0000 Machnikowski, Maciej wrote: > > -----Original Message----- > > From: Jakub Kicinski <kuba@kernel.org> > > Sent: Saturday, September 4, 2021 12:14 AM > > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > > message to get SyncE status > > > > On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote: > > > This patch series introduces basic interface for reading the Ethernet > > > Equipment Clock (EEC) state on a SyncE capable device. This state gives > > > information about the state of EEC. This interface is required to > > > implement Synchronization Status Messaging on upper layers. > > > > > > Initial implementation returns SyncE EEC state and flags attributes. > > > The only flag currently implemented is the EEC_SRC_PORT. When it's set > > > the EEC is synchronized to the recovered clock recovered from the > > > current port. > > > > > > SyncE EEC state read needs to be implemented as a ndo_get_eec_state > > > function. > > > > > > Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> > > > > Since we're talking SyncE-only now my intuition would be to put this > > op in ethtool. Was there a reason ethtool was not chosen? If not what > > do others think / if yes can the reason be included in the commit > > message? > > Hmm. Main reason for netlink is that linuxptp already supports it, > and it was suggested by Richard. > Having an NDO would also make it easier to add a SyncE-related > files to the sysfs for easier operation (following the ideas from the ptp > pins subsystem). > But I'm open for suggestions. I think linuxptp will need support for ethtool netlink sockets sooner rather than later. Moving this to ethtool makes sense to me since it's very much a Ethernet-oriented API at this point. > > > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the > > port is > > > + * currently the source for the EEC > > > + */ > > > > Why include it then? Just leave the value out and if the attr is not > > present user space should assume the source is port. > > This bit has a different meaning. If it's set the port in question > is a frequency source for the multiport device, if it's cleared - some other > source is used as a source. This is needed to prevent setting invalid > configurations in the PHY (like setting the frequency source as a Master > in AN) or sending invalid messages. If the port is a frequency source > it must always send back QL-DNU messages to prevent synchronization > loops. Ah! I see. Is being the "source" negotiated somehow? Don't we need to give the user / linuxptp to select the source based on whatever info it has about topology? > > or don't check the ifindex at all and let dev_get_by.. fail. > > > > > > Thanks for pushing this API forward! > > Addressed all other comments - and thanks for giving a lot of helpful > suggestions! Thanks, BTW I think I forgot to ask for documentation, dumping info about the API and context under Documentation/ would be great!
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Tuesday, September 7, 2021 3:01 AM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Mon, 6 Sep 2021 19:01:54 +0000 Machnikowski, Maciej wrote: > > > > Hmm. Main reason for netlink is that linuxptp already supports it, > > > > and it was suggested by Richard. > > > > Having an NDO would also make it easier to add a SyncE-related > > > > files to the sysfs for easier operation (following the ideas from the ptp > > > > pins subsystem). > > > > But I'm open for suggestions. > > > > > > I think linuxptp will need support for ethtool netlink sockets sooner > > > rather than later. Moving this to ethtool makes sense to me since it's > > > very much a Ethernet-oriented API at this point. > > > > Ethtool also makes a lot of sense, but will it be possible to still make sysfs, > > and it makes sense to add it for some deployments (more on that below) > > It should not make much difference whether ndo or ethtool op is used. Will look into it then. > > > > > This bit has a different meaning. If it's set the port in question > > > > is a frequency source for the multiport device, if it's cleared - some > other > > > > source is used as a source. This is needed to prevent setting invalid > > > > configurations in the PHY (like setting the frequency source as a Master > > > > in AN) or sending invalid messages. If the port is a frequency source > > > > it must always send back QL-DNU messages to prevent synchronization > > > > loops. > > > > > > Ah! I see. Is being the "source" negotiated somehow? Don't we need to > > > give the user / linuxptp to select the source based on whatever info > > > it has about topology? > > > > The frequency source can be either pre-set statically, negotiated using > > ESMC QL-levels (if working in QL-Enabled mode), or follow automatic > > fallback inside the device. This flag gives feedback about the validity > > of recovered clock coming from a given port and is useful when you > > enable multiple recovered clocks on more than one port in > > active-passive model. In that case the "driving" port may change > > dynamically, so it's a good idea to have some interface to reflect that. > > The ESMC messages are handled by Linux or some form of firmware? > I don't see how you can implement any selection policy with a read-only > API. It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that. EEC state will be read-only, but the recovered clock management part will allow changes for QL-disabled SyncE deployments that only need to see if the clock they receive on a given port is valid or not. > In general it would be more natural to place a "source id" at the > DPLL/clock, the "source" flag seems to mark the wrong end of the > relationship. If there ever are multiple consumers we won't be able > to tell which "target" the "source" is referring to. Hard to judge > how much of a problem that could be by looking at a small slice of > the system. The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to. We can come up with a better name, but think of it like: You have multiport device (switch/NIC). One port is recovering the clock, the PHY/MAC outputs that clock through the pin to the EEC (DPLL). The DPLL knows if it locked to the signal coming from the multiport PHY/MAC, but it doesn't know which port is the one that generates that clock signal. All other ports can also present the "locked" state, but they are following the clock that was received in the chosen port. If we drop this flag we won't be able to easily tell which port/lane drives the recovered clock. In short: the port with that flag on is following the network clock and leading clock of other ports of the multiport device. In the most basic SyncE deployment you can put the passive DPLL that will only give you the lock/holdover/unlocked info and just use this flag to know who currently drives the DPLL. > > That's where sysfs file be useful. When I add the implementation for > > recovered clock configuration, the sysfs may be used as standalone > > interface for configuring them when no dynamic change is needed. > > I didn't get that. Do you mean using a sysfs file to configure > the parameters of the DPLL? Only the PHY/MAC side of thing which is recovered clock configuration and the ECC state. > If the DPLL has its own set of concerns we should go ahead and create > explicit object / configuration channel for it. > > Somehow I got it into my head that you care mostly about transmitting > the clock, IOW recovering it from one port and using on another but > that's probably not even a strong use case for you or NICs in general :S This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem. > > > > Addressed all other comments - and thanks for giving a lot of helpful > > > > suggestions! > > > > > > Thanks, BTW I think I forgot to ask for documentation, dumping info > > > about the API and context under Documentation/ would be great! > > > > Could you suggest where to add that? Grepping for ndo_ don't give much. > > I can add a new synce.rst file if it makes sense. > > New networking/synce.rst file makes perfect sense to me. And perhaps > link to it from driver-api/ptp.rst. OK will try to come up with something there
On Tue, 7 Sep 2021 08:50:55 +0000 Machnikowski, Maciej wrote: > > > The frequency source can be either pre-set statically, negotiated using > > > ESMC QL-levels (if working in QL-Enabled mode), or follow automatic > > > fallback inside the device. This flag gives feedback about the validity > > > of recovered clock coming from a given port and is useful when you > > > enable multiple recovered clocks on more than one port in > > > active-passive model. In that case the "driving" port may change > > > dynamically, so it's a good idea to have some interface to reflect that. > > > > The ESMC messages are handled by Linux or some form of firmware? > > I don't see how you can implement any selection policy with a read-only > > API. > > It can be either in FW or in Linux - depending on the deployment. > We try to define the API that would enable Linux to manage that. We should implement the API for Linux to manage things from the get go. > EEC state will be read-only, but the recovered clock management part > will allow changes for QL-disabled SyncE deployments that only need > to see if the clock they receive on a given port is valid or not. > > > In general it would be more natural to place a "source id" at the > > DPLL/clock, the "source" flag seems to mark the wrong end of the > > relationship. If there ever are multiple consumers we won't be able > > to tell which "target" the "source" is referring to. Hard to judge > > how much of a problem that could be by looking at a small slice of > > the system. > > The DPLL will operate on pins, so it will have a pin connected from the > MAC/PHY that will have the recovered clock, but the recovered clock > can be enabled from any port/lane. That information is kept in the > MAC/PHY and the DPLL side will not be aware who it belongs to. So the clock outputs are muxed to a single pin at the Ethernet IP level, in your design. I wonder if this is the common implementation and therefore if it's safe to bake that into the API. Input from other vendors would be great... Also do I understand correctly that the output of the Ethernet IP is just the raw Rx clock once receiver is locked and the DPLL which enum if_synce_state refers to is in the time IP, that DPLL could be driven by GNSS etc? > We can come up with a better name, but think of it like: > You have multiport device (switch/NIC). One port is recovering > the clock, the PHY/MAC outputs that clock through the pin > to the EEC (DPLL). The DPLL knows if it locked to the signal coming > from the multiport PHY/MAC, but it doesn't know which port is the one > that generates that clock signal. All other ports can also present the > "locked" state, but they are following the clock that was received > in the chosen port. If we drop this flag we won't be able to easily tell > which port/lane drives the recovered clock. > In short: the port with that flag on is following the network clock > and leading clock of other ports of the multiport device. > > In the most basic SyncE deployment you can put the passive DPLL that > will only give you the lock/holdover/unlocked info and just use this flag > to know who currently drives the DPLL. > > > > That's where sysfs file be useful. When I add the implementation for > > > recovered clock configuration, the sysfs may be used as standalone > > > interface for configuring them when no dynamic change is needed. > > > > I didn't get that. Do you mean using a sysfs file to configure > > the parameters of the DPLL? > > Only the PHY/MAC side of thing which is recovered clock configuration > and the ECC state. > > > If the DPLL has its own set of concerns we should go ahead and create > > explicit object / configuration channel for it. > > > > Somehow I got it into my head that you care mostly about transmitting > > the clock, IOW recovering it from one port and using on another but > > that's probably not even a strong use case for you or NICs in general :S > > This is the right thinking. The DPLL can also have different external sources, > like the GNSS, and can also drive different output clocks. But for the most > basic SyncE implementation, which only runs on a recovered clock, we won't > need the DPLL subsystem. The GNSS pulse would come in over an external pin, tho, right? Your earlier version of the patchset had GNSS as an enum value, how would the driver / FW know that a given pin means GNSS? > > > Could you suggest where to add that? Grepping for ndo_ don't give much. > > > I can add a new synce.rst file if it makes sense. > > > > New networking/synce.rst file makes perfect sense to me. And perhaps > > link to it from driver-api/ptp.rst. > > OK will try to come up with something there
On Tue, 7 Sep 2021 15:47:05 +0000 Machnikowski, Maciej wrote: > > > It can be either in FW or in Linux - depending on the deployment. > > > We try to define the API that would enable Linux to manage that. > > > > We should implement the API for Linux to manage things from the get go. > > Yep! Yet let's go one step at a time. I believe once we have the basics (EEC > monitoring and recovered clock configuration) we'll be able to implement > a basic functionality - and add bells and whistles later on, as there are more > capabilities that we could support in SW. The set API may shape how the get API looks. We need a minimal viable API where the whole control part of it is not "firmware or proprietary tools take care of that". Do you have public docs on how the whole solution works? > > > The DPLL will operate on pins, so it will have a pin connected from the > > > MAC/PHY that will have the recovered clock, but the recovered clock > > > can be enabled from any port/lane. That information is kept in the > > > MAC/PHY and the DPLL side will not be aware who it belongs to. > > > > So the clock outputs are muxed to a single pin at the Ethernet IP > > level, in your design. I wonder if this is the common implementation > > and therefore if it's safe to bake that into the API. Input from other > > vendors would be great... > > I believe this is the state-of-art: here's the Broadcom public one > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > has similar solution. But would also be happy to hear others. Interesting. That reveals the need for also marking the backup (/secondary) clock. Have you seen any docs on how systems with discreet PHY ASICs mux the clocks? > > Also do I understand correctly that the output of the Ethernet IP > > is just the raw Rx clock once receiver is locked and the DPLL which > > enum if_synce_state refers to is in the time IP, that DPLL could be > > driven by GNSS etc? > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > easier to route) derived from the RX clock. > The DPLL connectivity is vendor-specific, as you can use it to connect > some external signals, but you can as well just care about relying > the SyncE clock and only allow recovering it and passing along > the QL info when your EEC is locked. That's why I backed up from > a full DPLL implementation in favor of a more generic EEC clock. What is an ECC clock? To me the PLL state in the Ethernet port is the state of the recovered clock. enum if_eec_state has values like holdover which seem to be more applicable to the "system wide" PLL. Let me ask this - if one port is training the link and the other one has the lock and is the source - what state will be reported for each port? > The Time IP is again relative and vendor-specific. If SyncE is deployed > alongside PTP it will most likely be tightly coupled, but if you only > care about having a frequency source - it's not mandatory and it can be > as well in the PHY IP. I would not think having just the freq is very useful. > Also I think I will strip the reported states to the bare minimum defined > in the ITU-T G.781 instead of reusing the states that were already defined > for a specific DPLL. > > > > This is the right thinking. The DPLL can also have different external sources, > > > like the GNSS, and can also drive different output clocks. But for the most > > > basic SyncE implementation, which only runs on a recovered clock, we won't > > > need the DPLL subsystem. > > > > The GNSS pulse would come in over an external pin, tho, right? Your > > earlier version of the patchset had GNSS as an enum value, how would > > the driver / FW know that a given pin means GNSS? > > The GNSS 1PPS will more likely go directly to the "full" DPLL. > The pin topology can be derived from FW or any vendor-specific way of mapping > pins to their sources. And, in "worst" case can just be hardcoded for a specific > device.
On Wed, 8 Sep 2021 08:03:35 +0000 Machnikowski, Maciej wrote: > > > Yep! Yet let's go one step at a time. I believe once we have the basics (EEC > > > monitoring and recovered clock configuration) we'll be able to implement > > > a basic functionality - and add bells and whistles later on, as there are more > > > capabilities that we could support in SW. > > > > The set API may shape how the get API looks. We need a minimal viable > > API where the whole control part of it is not "firmware or proprietary > > tools take care of that". > > > > Do you have public docs on how the whole solution works? > > The best reference would be my netdev 0x15 tutorial: > https://netdevconf.info/0x15/session.html?Introduction-to-time-synchronization-over-Ethernet > The SyncE API considerations starts ~54:00, but basically we need API for: > - Controlling the lane to pin mapping for clock recovery > - Check the EEC/DPLL state and see what's the source of reference frequency > (in more advanced deployments) > - control additional input and output pins (GNSS input, external inputs, recovered > frequency reference) Thanks, good presentation! I haven't seen much in terms of system design, at the level similar to the Broadcom whitepaper you shared. > > > I believe this is the state-of-art: here's the Broadcom public one > > > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > > > has similar solution. But would also be happy to hear others. > > > > Interesting. That reveals the need for also marking the backup > > (/secondary) clock. > > That's optional, but useful. And here's where we need a feedback > on which port/lane is currently used, as the switch may be automatic What do you mean by optional? How does the user know if there is fallback or not? Is it that Intel is intending to support only devices with up to 2 ports and the second port is always the backup (apologies for speculating). > > Have you seen any docs on how systems with discreet PHY ASICs mux > > the clocks? > > Yes - unfortunately they are not public :( Mumble, mumble. Ido, Florian - any devices within your purview which would support SyncE? > > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > > > easier to route) derived from the RX clock. > > > The DPLL connectivity is vendor-specific, as you can use it to connect > > > some external signals, but you can as well just care about relying > > > the SyncE clock and only allow recovering it and passing along > > > the QL info when your EEC is locked. That's why I backed up from > > > a full DPLL implementation in favor of a more generic EEC clock. > > > > What is an ECC clock? To me the PLL state in the Ethernet port is the > > state of the recovered clock. enum if_eec_state has values like > > holdover which seem to be more applicable to the "system wide" PLL. > > EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's > not mandatory and I believe it may be different is switches where > you only need to drive all ports TX from a single frequency source. In this > case the DPLL can be embedded in the multiport PHY, > > > Let me ask this - if one port is training the link and the other one has > > the lock and is the source - what state will be reported for each port? > > In this case the port that has the lock source will report the lock and > the EEC_SRC_PORT flag. The port that trains the link will show the > lock without the flag and once it completes the training sequence it will > use the EEC's frequency to transmit the data so that the next hop is able > to synchronize its EEC to the incoming RX frequency Alright, I don't like that. It feels like you're attaching one object's information (ECC) to other objects (ports), and repeating it. Prof Goczyła and dr Landowska would not be proud. > > > The Time IP is again relative and vendor-specific. If SyncE is deployed > > > alongside PTP it will most likely be tightly coupled, but if you only > > > care about having a frequency source - it's not mandatory and it can be > > > as well in the PHY IP. > > > > I would not think having just the freq is very useful. > > This depends on the deployment. There are couple popular frequencies > Most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many > deployments that only require frequency sync without the phase > and/or time. I.e. if you deploy frequency division duplex you only need the > frequency reference, and the higher frequency you have - the faster you can > lock to it.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, September 8, 2021 6:21 PM > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Wed, 8 Sep 2021 08:03:35 +0000 Machnikowski, Maciej wrote: > > > > Yep! Yet let's go one step at a time. I believe once we have the basics > (EEC > > > > monitoring and recovered clock configuration) we'll be able to > implement > > > > a basic functionality - and add bells and whistles later on, as there are > more > > > > capabilities that we could support in SW. > > > > > > The set API may shape how the get API looks. We need a minimal viable > > > API where the whole control part of it is not "firmware or proprietary > > > tools take care of that". > > > > > > Do you have public docs on how the whole solution works? > > > > The best reference would be my netdev 0x15 tutorial: > > https://netdevconf.info/0x15/session.html?Introduction-to-time- > synchronization-over-Ethernet > > The SyncE API considerations starts ~54:00, but basically we need API for: > > - Controlling the lane to pin mapping for clock recovery > > - Check the EEC/DPLL state and see what's the source of reference > frequency > > (in more advanced deployments) > > - control additional input and output pins (GNSS input, external inputs, > recovered > > frequency reference) > > Thanks, good presentation! I haven't seen much in terms of system > design, at the level similar to the Broadcom whitepaper you shared. See the "drawing" below. > > > > I believe this is the state-of-art: here's the Broadcom public one > > > > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > > > > has similar solution. But would also be happy to hear others. > > > > > > Interesting. That reveals the need for also marking the backup > > > (/secondary) clock. > > > > That's optional, but useful. And here's where we need a feedback > > on which port/lane is currently used, as the switch may be automatic > > What do you mean by optional? How does the user know if there is > fallback or not? Is it that Intel is intending to support only > devices with up to 2 ports and the second port is always the > backup (apologies for speculating). That will be a part of pin config API. It needs to give info about the number of supported pins that the PHY/MAC can use as recovered clock outputs. Once the pin is assigned to the lane the recovered clock (divided or not) will appear on the configured PHY/MAC pin and EEC will be able to use it. If there is more than 1 available - they will have some priority assigned to them that will be known to the EEC/board designer. And we plan to support devices that only comes with 1 recovered clock output as well. > > > Have you seen any docs on how systems with discreet PHY ASICs mux > > > the clocks? > > > > Yes - unfortunately they are not public :( > > Mumble, mumble. Ido, Florian - any devices within your purview which > would support SyncE? OK Found one that's public: https://www.marvell.com/content/dam/marvell/en/public-collateral/transceivers/marvell-phys-transceivers-alaska-c-88x5113-datasheet.pdf see Fig. 23 and chapter 3.11 for details, but conceptually it's similar. > > > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > > > > easier to route) derived from the RX clock. > > > > The DPLL connectivity is vendor-specific, as you can use it to connect > > > > some external signals, but you can as well just care about relying > > > > the SyncE clock and only allow recovering it and passing along > > > > the QL info when your EEC is locked. That's why I backed up from > > > > a full DPLL implementation in favor of a more generic EEC clock. > > > > > > What is an ECC clock? To me the PLL state in the Ethernet port is the > > > state of the recovered clock. enum if_eec_state has values like > > > holdover which seem to be more applicable to the "system wide" PLL. > > > > EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's > > not mandatory and I believe it may be different is switches where > > you only need to drive all ports TX from a single frequency source. In this > > case the DPLL can be embedded in the multiport PHY, > > > > > Let me ask this - if one port is training the link and the other one has > > > the lock and is the source - what state will be reported for each port? > > > > In this case the port that has the lock source will report the lock and > > the EEC_SRC_PORT flag. The port that trains the link will show the > > lock without the flag and once it completes the training sequence it will > > use the EEC's frequency to transmit the data so that the next hop is able > > to synchronize its EEC to the incoming RX frequency > > Alright, I don't like that. It feels like you're attaching one object's > information (ECC) to other objects (ports), and repeating it. Prof > Goczyła and dr Landowska would not be proud. Hahaha - did not expect them to pop up here :) It's true, but the problem is that they depend on each other. The path is: Lane0 ------------- |\ Pin0 RefN ____ ------------- | |-----------------| | synced clk ... | |-----------------| EEC |------------------ ------------- |/ PinN RefM|____ | Lane N MUX To get the full info a port needs to know the EEC state and which lane is used as a source (or rather - my lane or any other). The lane -> Pin mapping is buried in the PHY/MAC, but the source of frequency is in the EEC. What's even more - the Pin->Ref mapping is board specific. The viable solutions are: - Limit to the proposed "I drive the clock" vs "Someone drives it" and assume the Driver returns all info - return the EEC Ref index, figure out which pin is connected to it and then check which MAC/PHY lane that drives it. I assume option one is easy to implement and keep in the future even if we finally move to option 2 once we define EEC/DPLL subsystem. In future #1 can take the lock information from the DPLL subsystem, but will also enable simple deployments that won't expose the whole DPLL, like a filter PLL embedded in a multiport PHY that will only work for SyncE in which case this API will only touch a single component. > > > > The Time IP is again relative and vendor-specific. If SyncE is deployed > > > > alongside PTP it will most likely be tightly coupled, but if you only > > > > care about having a frequency source - it's not mandatory and it can be > > > > as well in the PHY IP. > > > > > > I would not think having just the freq is very useful. > > > > This depends on the deployment. There are couple popular frequencies > > Most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many > > deployments that only require frequency sync without the phase > > and/or time. I.e. if you deploy frequency division duplex you only need the > > frequency reference, and the higher frequency you have - the faster you > can > > lock to it.
> > > The SyncE API considerations starts ~54:00, but basically we need API for: > > > - Controlling the lane to pin mapping for clock recovery > > > - Check the EEC/DPLL state and see what's the source of reference > > frequency > > > (in more advanced deployments) > > > - control additional input and output pins (GNSS input, external inputs, > > recovered > > > frequency reference) Now that you have pointed to a datasheet... > - Controlling the lane to pin mapping for clock recovery So this is a PHY property. That could be Linux driving the PHY, via phylib, drivers/net/phy, or there could be firmware in the MAC driver which hides the PHY and gives you some sort of API to access it. > Check the EEC/DPLL state and see what's the source of reference > frequency Where is the EEC/DPLL implemented? Is it typically also in the PHY? Or some other hardware block? I just want to make sure we have an API which we can easily delegate to different subsystems, some of it in the PHY driver, maybe some of it somewhere else. Also, looking at the Marvell datasheet, it appears these registers are in the MDIO_MMD_VEND2 range. Has any of this been specified? Can we expect to be able to write a generic implementation sometime in the future which PHY drivers can share? I just looked at a 1G Marvell PHY. It uses RGMII or SGMII towards the host. But there is no indication you can take the clock from the SGMII SERDES, it is only the recovered clock from the line. And the recovered clock always goes out the CLK125 pin, which can either be 125MHz or 25MHz. So in this case, you have no need to control the lane to pin mapping, it is fixed, but do we want to be able to control the divider? Do we need a mechanism to actually enumerate what the hardware can do? Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new? Andrew
On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote: > On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > > Since we are talking about clocks and dividers, and multiplexors, > > should all this be using the common clock framework, which already > > supports most of this? Do we actually need something new? > > Does the common clock framework expose any user space API? Ah, good point. No, i don't think it does, apart from debugfs, which is not really a user space API, and it contains read only descriptions of the clock tree, current status, mux settings, dividers etc. So probably anybody implemented the proposed API the Linux way will use the common clock framework and its internal API, and debug the implementation via debugfs. PHY drivers already do use it, e.g. when the PHY is using a clock provided by the MAC, it uses the API to enable/disable and set ifs frequency as needed. Andrew
On Thu, Sep 09, 2021 at 12:59:27AM +0200, Andrew Lunn wrote: > On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote: > > On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > > > Since we are talking about clocks and dividers, and multiplexors, > > > should all this be using the common clock framework, which already > > > supports most of this? Do we actually need something new? > > > > Does the common clock framework expose any user space API? > > Ah, good point. No, i don't think it does, apart from debugfs, which > is not really a user space API, and it contains read only descriptions > of the clock tree, current status, mux settings, dividers etc. Wouldn't it make sense to develop some kind of user land API to manipulate the common clock framework at run time? Just dreaming... Richard
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, September 9, 2021 12:19 AM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Wed, 8 Sep 2021 17:30:24 +0000 Machnikowski, Maciej wrote: > > Lane0 > > ------------- |\ Pin0 RefN ____ > > ------------- | |-----------------| | synced clk > > | |-----------------| EEC |------------------ > > ------------- |/ PinN RefM|____ | > > Lane N MUX > > > > To get the full info a port needs to know the EEC state and which lane is > used > > as a source (or rather - my lane or any other). > > EEC here is what the PHY documentation calls "Cleanup PLL" right? It's a generic term for an internal or external PLL that takes the reference frequency from the certain port/lane and drives its TX part. In a specific case it can be the internal cleanup PLL. > > The lane -> Pin mapping is buried in the PHY/MAC, but the source of > frequency > > is in the EEC. > > Not sure what "source of frequency" means here. There's a lot of > frequencies here. It's the source lane/port that drives the DPLL reference. In other words, the DPLL will tune its frequency to match the one recovered from a certain PHY lane/port. > > What's even more - the Pin->Ref mapping is board specific. > > Breaking down the system into components we have: > > Port > A.1 Rx lanes > A.2 Rx pins (outputs) > A.3 Rx clk divider > B.1 Tx lanes > B.2 Tx pins (inputs) > > ECC > C.1 Inputs > C.2 Outputs > C.3 PLL state > > In the most general case we want to be able to: > map recovered clocks to PHY output pins (A.1 <> A.2) > set freq div on the recovered clock (A.2 <> A.3) Technically the pin (if exists) will be the last thing, so a better way would be to map lane to the divider (A.1<>A.3) and then a divider to pin (A.3<>A.2), but the idea is the same > set the priorities of inputs on ECC (C.1) > read the ECC state (C.3) > control outputs of the ECC (C.2) Yes! > select the clock source for port Tx (B.2 <> B.1) And here we usually don't allow control over this. The DPLL is preconfigured to output the frequency that's expected on the PHY/MAC clock input and we shouldn't mess with it in runtime. > As you said, pin -> ref mapping is board specific, so the API should > not assume knowledge of routing between Port and ECC. If it does just > give the pins matching names. I believe referring to a board user guide is enough to cover that one, just like with PTP pins. There may be 1000 different ways of connecting those signals. > We don't have to implement the entire design but the pieces we do create > must be right for the larger context. With the current code the > ECC/Cleanup PLL is not represented as a separate entity, and mapping of > what source means is on the wrong "end" of the A.3 <> C.1 relationship. That's why I initially started with the EEC state + pin idx of the driving source. I believe it's a cleaner solution, as then we can implement the same pin indexes in the recovered clock part of the interface and the user will be able to see the state and driving pin from the EEC_STATE (both belongs to the DPLL) and use the PHY pins subsystem to see if the driving pin index matches the index that I drive. In that case we keep all C-thingies in the C subsystem and A stuff in A subsystem. The only "mix" is in the pin indexes that would use numbering from C. If it's an attribute - it can as well be optional for the deployments that don't need/support it. > > The viable solutions are: > > - Limit to the proposed "I drive the clock" vs "Someone drives it" and > assume the > > Driver returns all info > > - return the EEC Ref index, figure out which pin is connected to it and then > check > > which MAC/PHY lane that drives it. > > > > I assume option one is easy to implement and keep in the future even if > we > > finally move to option 2 once we define EEC/DPLL subsystem. > > > > In future #1 can take the lock information from the DPLL subsystem, but > > will also enable simple deployments that won't expose the whole DPLL, > > like a filter PLL embedded in a multiport PHY that will only work for > > SyncE in which case this API will only touch a single component. > > Imagine a system with two cascaded switch ASICs and a bunch of PHYs. > How do you express that by pure extensions to the proposed API? > Here either the cleanup PLLs would be cascaded (subordinate one needs > to express that its "source" is another PLL) or single lane can be > designated as a source for both PLLs (but then there is only one > "source" bit and multiple "enum if_eec_state"s). In that case - once we have pins- we'll see that the leader DPLL is synced to the pin that comes from the PHY/MAC and be able to find the corresponding lane, and on the follower side we'll see that it's locked to the pin that corresponds to the master DPLL. The logic to "do something" with that knowledge needs to be in the userspace app, as there may be some external connections needed that are unknown at the board level (think of 2 separate adapters connected with an external cable). > I think we can't avoid having a separate object for ECC/Cleanup PLL. > You can add it as a subobject to devlink but new genetlink family seems > much preferable given the devlink instances themselves have unclear > semantics at this point. Or you can try to convince Richard that ECC > belongs as part of PTP :) > In fact I don't think you care about any of the PHY / port stuff > currently. All you need is the ECC side of the API. IIUC you have > relatively simple setup where there is only one pin per port, and > you don't care about syncing the Tx clock. I actually do. There's (almost) always less recovered clock resources (aka pins) than ports/lanes in the system. The TX clock will be synchronized once the EEC reports the lock state, as it's the part that generates clocks for the TX part of the PHY.
> -----Original Message----- > From: Richard Cochran <richardcochran@gmail.com> > Sent: Thursday, September 9, 2021 4:09 AM > To: Andrew Lunn <andrew@lunn.ch> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Thu, Sep 09, 2021 at 12:59:27AM +0200, Andrew Lunn wrote: > > On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote: > > > On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > > > > Since we are talking about clocks and dividers, and multiplexors, > > > > should all this be using the common clock framework, which already > > > > supports most of this? Do we actually need something new? > > > > > > Does the common clock framework expose any user space API? > > > > Ah, good point. No, i don't think it does, apart from debugfs, which > > is not really a user space API, and it contains read only descriptions > > of the clock tree, current status, mux settings, dividers etc. > > Wouldn't it make sense to develop some kind of user land API to > manipulate the common clock framework at run time? > > Just dreaming... > > Richard That may be dangerous. In SyncE world we control clocks that are only references to the EEC block. The worst that can happen is that the DPLL will ignore incoming clock as it has invalid frequency, or it will drift off to the edge of allowed range. Controlling the clock that actually drives any components (PHY/MAC) in runtime can be a good way to brick the part. I feel that, while the reuse of structures may be a good idea, the userspace API for clocks is not. They are usually set up once at the board init level and stay like that "forever". The outputs we need to control are only a subset of all of them and they rather fall in the PTP pins level of details, rather than clock ones.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, September 9, 2021 1:58 AM > To: Andrew Lunn <andrew@lunn.ch> > Cc: Machnikowski, Maciej <maciej.machnikowski@intel.com>; > netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; > richardcochran@gmail.com; abyagowi@fb.com; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; davem@davemloft.net; linux- > kselftest@vger.kernel.org; Michal Kubecek <mkubecek@suse.cz>; Saeed > Mahameed <saeed@kernel.org>; Michael Chan > <michael.chan@broadcom.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Thu, 9 Sep 2021 01:14:36 +0200 Andrew Lunn wrote: > > > As you said, pin -> ref mapping is board specific, so the API should > > > not assume knowledge of routing between Port and ECC. > > > > That information will probably end up in device tree. And X different > > implementations of ACPI, unless somebody puts there foot down and > > stops the snow flakes. > > > > > Imagine a system with two cascaded switch ASICs and a bunch of PHYs. > > > How do you express that by pure extensions to the proposed API? > > > > Device tree is good at that. ACPI might eventually catch up. > > I could well be wrong but some of those connectors could well be just > SMAs on the back plate, no? User could cable those up to their heart > content. > > https://engineering.fb.com/2021/08/11/open-source/time-appliance/ Yep! We should base on the abstract indexes, rather than the device tree. The user needs to make sense of the indexes, just like with PTP pins. > > How complex a setup do we actually expect? Can there be multiple > > disjoint SyncE trees within an Ethernet switch cluster? Or would it be > > reasonable to say all you need to configure is the clock source, and > > all other ports of the switches are slaves if SyncE is enabled for the > > port? I've never see any SOHO switch hardware which allows you to have > > disjoint PTP trees, so it does not sound too unreasonable to only > > allow a single SyncE tree per switch cluster. > > Not sure. I get the (perhaps unfounded) feeling that just forwarding > the clock from one port to the rest is simpler. Maciej cares primarily > about exposing the clock to other non-Ethernet things, the device would > be an endpoint here, using the clock for LTE or whatnot. Also multiple PTP domain switches starts appearing on the market. I know Cisco makes them: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus3548/sw/system_mgmt/602/b_N3548_System_Management_Config_602A11/b_N3548_Sysetm_Management_Config_602A11_chapter_011.html Disjoint SyncE trees will be harder to implement due to the physical nature of it. > > Also, if you are cascading switches, you generally don't put PHYs in > > the middle, you just connect the SERDES lanes together. > > My concern was a case where PHY connected to one switch exposes the > refclock on an aux pin which is then muxed to more than one switch ASIC. > IOW the "source port" is not actually under the same switch. > > Again, IDK if that's realistic. It can be done, but the userspace app should make sense out of this config. Coding all scenarios in kernel would make 1000000 different possible configurations in the driver - which probably no one wants to keep/maintain. We can return info about hardwired pins (like GNSS, PHY recovered clks, PTP clock and so on) but should also give some generic ones.
Dave, Are there any free slots on Plumbers to discuss and close on SyncE interfaces (or can we add an extra one). I can reuse the slides from the Netdev to give background and a live discussion may help closing opens around it, and I'd be happy to co-present with anyone who wants to also join this effort. Regards Maciek
On Wed, Sep 08, 2021 at 09:21:15AM -0700, Jakub Kicinski wrote: > Mumble, mumble. Ido, Florian - any devices within your purview which > would support SyncE? Sorry for the delay, was AFK. I remember SyncE being mentioned a few times in the past, so it is likely that we will be requested to add SyncE support in mlxsw in the future. I will try to solicit feedback from colleagues more familiar with the subject and provide it here next week.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7c41593c1d6a..de78b8ed903c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1344,6 +1344,8 @@ struct netdev_net_notifier { * The caller must be under RCU read context. * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); * Get the forwarding path to reach the real device from the HW destination address + * int (*ndo_get_synce_state)(struct net_device *dev, struct if_eec_state_msg *state) + * Get state of physical layer frequency syntonization (SyncE) */ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1563,6 +1565,10 @@ struct net_device_ops { struct net_device * (*ndo_get_peer_dev)(struct net_device *dev); int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); + int (*ndo_get_eec_state)(struct net_device *dev, + enum if_eec_state *state, + u32 *eec_flags, + struct netlink_ext_ack *extack); }; /** diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index eebd3894fe89..78a8a5af8cd8 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1273,4 +1273,35 @@ enum { #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1) +/* SyncE section */ + +enum if_eec_state { + IF_EEC_STATE_INVALID = 0, + IF_EEC_STATE_FREERUN, + IF_EEC_STATE_LOCKACQ, + IF_EEC_STATE_LOCKREC, + IF_EEC_STATE_LOCKED, + IF_EEC_STATE_HOLDOVER, + IF_EEC_STATE_OPEN_LOOP, + __IF_EEC_STATE_MAX, +}; + +#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1) +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is + * currently the source for the EEC + */ + +struct if_eec_state_msg { + __u32 ifindex; +}; + +enum { + IFLA_EEC_UNSPEC, + IFLA_EEC_STATE, + IFLA_EEC_FLAGS, + __IFLA_EEC_MAX, +}; + +#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1) + #endif /* _UAPI_LINUX_IF_LINK_H */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 5888492a5257..642ac18a09d9 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -185,6 +185,9 @@ enum { RTM_GETNEXTHOPBUCKET, #define RTM_GETNEXTHOPBUCKET RTM_GETNEXTHOPBUCKET + RTM_GETEECSTATE = 120, +#define RTM_GETEECSTATE RTM_GETEECSTATE + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 972c8cb303a5..11d4e96d3371 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -5468,6 +5468,75 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } +static int rtnl_fill_eec_state(struct sk_buff *skb, struct net_device *dev, + u32 portid, u32 seq, struct netlink_callback *cb, + int flags, struct netlink_ext_ack *extack) +{ + const struct net_device_ops *ops = dev->netdev_ops; + struct if_eec_state_msg *state_msg; + enum if_eec_state state; + struct nlmsghdr *nlh; + u32 eec_flags; + int err; + + ASSERT_RTNL(); + + if (!ops->ndo_get_eec_state) + return -EOPNOTSUPP; + + err = ops->ndo_get_eec_state(dev, &state, &eec_flags, extack); + if (err) + return err; + + nlh = nlmsg_put(skb, portid, seq, RTM_GETEECSTATE, sizeof(*state_msg), + flags); + if (!nlh) + return -EMSGSIZE; + + state_msg = nlmsg_data(nlh); + state_msg->ifindex = dev->ifindex; + + if (nla_put(skb, IFLA_EEC_STATE, sizeof(state), &state) || + nla_put_u32(skb, IFLA_EEC_FLAGS, eec_flags)) + return -EMSGSIZE; + + nlmsg_end(skb, nlh); + return 0; +} + +static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct net *net = sock_net(skb->sk); + struct if_eec_state_msg *state; + struct net_device *dev; + struct sk_buff *nskb; + int err; + + state = nlmsg_data(nlh); + if (state->ifindex > 0) + dev = __dev_get_by_index(net, state->ifindex); + else + return -EINVAL; + + if (!dev) + return -ENODEV; + + nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!nskb) + return -ENOBUFS; + + err = rtnl_fill_eec_state(nskb, dev, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, NULL, nlh->nlmsg_flags, + extack); + if (err < 0) + kfree_skb(nskb); + else + err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid); + + return err; +} + /* Process one rtnetlink message. */ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, @@ -5693,4 +5762,6 @@ void __init rtnetlink_init(void) rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump, 0); + + rtnl_register(PF_UNSPEC, RTM_GETEECSTATE, rtnl_eec_state_get, NULL, 0); } diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c index d59276f48d4f..0e2d84d6045f 100644 --- a/security/selinux/nlmsgtab.c +++ b/security/selinux/nlmsgtab.c @@ -91,6 +91,7 @@ static const struct nlmsg_perm nlmsg_route_perms[] = { RTM_NEWNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_DELNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_GETNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_GETEECSTATE, NETLINK_ROUTE_SOCKET__NLMSG_READ }, }; static const struct nlmsg_perm nlmsg_tcpdiag_perms[] = @@ -174,7 +175,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm) * structures at the top of this file with the new mappings * before updating the BUILD_BUG_ON() macro! */ - BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3)); + BUILD_BUG_ON(RTM_MAX != (RTM_GETEECSTATE + 3)); err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms, sizeof(nlmsg_route_perms)); break;
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers. Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port. SyncE EEC state read needs to be implemented as a ndo_get_eec_state function. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> --- include/linux/netdevice.h | 6 +++ include/uapi/linux/if_link.h | 31 +++++++++++++++ include/uapi/linux/rtnetlink.h | 3 ++ net/core/rtnetlink.c | 71 ++++++++++++++++++++++++++++++++++ security/selinux/nlmsgtab.c | 3 +- 5 files changed, 113 insertions(+), 1 deletion(-)