Message ID | 20210903151436.529478-2-maciej.machnikowski@intel.com |
---|---|
State | New |
Headers | show |
Series | Add RTNL interface for SyncE | expand |
> -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Friday, September 3, 2021 6:19 PM > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Fri, 3 Sep 2021 17:14:35 +0200 > Maciej Machnikowski <maciej.machnikowski@intel.com> wrote: > > > This patch series introduces basic interface for reading the Ethernet > > Equipment Clock (EEC) state on a SyncE capable device. This state gives > > information about the state of EEC. This interface is required to > > implement Synchronization Status Messaging on upper layers. > > > > Initial implementation returns SyncE EEC state and flags attributes. > > The only flag currently implemented is the EEC_SRC_PORT. When it's set > > the EEC is synchronized to the recovered clock recovered from the > > current port. > > > > SyncE EEC state read needs to be implemented as a ndo_get_eec_state > > function. > > > > Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> > > Is there a simpler way to do this? Seems like you are adding > a lot for a use case specific to a small class of devices. > For example adding a new network device operation adds small > amount of bloat to every other network device in the kernel. Hi! I couldn't find any simpler way. Do you have something specific in mind? A function pointer is only U64. I can hardly think of anything smaller. Regards Maciek
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Saturday, September 4, 2021 12:14 AM > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote: > > This patch series introduces basic interface for reading the Ethernet > > Equipment Clock (EEC) state on a SyncE capable device. This state gives > > information about the state of EEC. This interface is required to > > implement Synchronization Status Messaging on upper layers. > > > > Initial implementation returns SyncE EEC state and flags attributes. > > The only flag currently implemented is the EEC_SRC_PORT. When it's set > > the EEC is synchronized to the recovered clock recovered from the > > current port. > > > > SyncE EEC state read needs to be implemented as a ndo_get_eec_state > > function. > > > > Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> > > Since we're talking SyncE-only now my intuition would be to put this > op in ethtool. Was there a reason ethtool was not chosen? If not what > do others think / if yes can the reason be included in the commit > message? Hmm. Main reason for netlink is that linuxptp already supports it, and it was suggested by Richard. Having an NDO would also make it easier to add a SyncE-related files to the sysfs for easier operation (following the ideas from the ptp pins subsystem). But I'm open for suggestions. > > > > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the > port is > > + * currently the source for the EEC > > + */ > > Why include it then? Just leave the value out and if the attr is not > present user space should assume the source is port. This bit has a different meaning. If it's set the port in question is a frequency source for the multiport device, if it's cleared - some other source is used as a source. This is needed to prevent setting invalid configurations in the PHY (like setting the frequency source as a Master in AN) or sending invalid messages. If the port is a frequency source it must always send back QL-DNU messages to prevent synchronization loops. > > or don't check the ifindex at all and let dev_get_by.. fail. > > > Thanks for pushing this API forward! Addressed all other comments - and thanks for giving a lot of helpful suggestions! Regards Maciek
On Mon, 6 Sep 2021 18:30:40 +0000 Machnikowski, Maciej wrote: > > -----Original Message----- > > From: Jakub Kicinski <kuba@kernel.org> > > Sent: Saturday, September 4, 2021 12:14 AM > > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > > message to get SyncE status > > > > On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote: > > > This patch series introduces basic interface for reading the Ethernet > > > Equipment Clock (EEC) state on a SyncE capable device. This state gives > > > information about the state of EEC. This interface is required to > > > implement Synchronization Status Messaging on upper layers. > > > > > > Initial implementation returns SyncE EEC state and flags attributes. > > > The only flag currently implemented is the EEC_SRC_PORT. When it's set > > > the EEC is synchronized to the recovered clock recovered from the > > > current port. > > > > > > SyncE EEC state read needs to be implemented as a ndo_get_eec_state > > > function. > > > > > > Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> > > > > Since we're talking SyncE-only now my intuition would be to put this > > op in ethtool. Was there a reason ethtool was not chosen? If not what > > do others think / if yes can the reason be included in the commit > > message? > > Hmm. Main reason for netlink is that linuxptp already supports it, > and it was suggested by Richard. > Having an NDO would also make it easier to add a SyncE-related > files to the sysfs for easier operation (following the ideas from the ptp > pins subsystem). > But I'm open for suggestions. I think linuxptp will need support for ethtool netlink sockets sooner rather than later. Moving this to ethtool makes sense to me since it's very much a Ethernet-oriented API at this point. > > > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the > > port is > > > + * currently the source for the EEC > > > + */ > > > > Why include it then? Just leave the value out and if the attr is not > > present user space should assume the source is port. > > This bit has a different meaning. If it's set the port in question > is a frequency source for the multiport device, if it's cleared - some other > source is used as a source. This is needed to prevent setting invalid > configurations in the PHY (like setting the frequency source as a Master > in AN) or sending invalid messages. If the port is a frequency source > it must always send back QL-DNU messages to prevent synchronization > loops. Ah! I see. Is being the "source" negotiated somehow? Don't we need to give the user / linuxptp to select the source based on whatever info it has about topology? > > or don't check the ifindex at all and let dev_get_by.. fail. > > > > > > Thanks for pushing this API forward! > > Addressed all other comments - and thanks for giving a lot of helpful > suggestions! Thanks, BTW I think I forgot to ask for documentation, dumping info about the API and context under Documentation/ would be great!
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Monday, September 6, 2021 8:39 PM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Mon, 6 Sep 2021 18:30:40 +0000 Machnikowski, Maciej wrote: > > > -----Original Message----- > > > From: Jakub Kicinski <kuba@kernel.org> > > > Sent: Saturday, September 4, 2021 12:14 AM > > > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > > > message to get SyncE status > > > > > > On Fri, 3 Sep 2021 17:14:35 +0200 Maciej Machnikowski wrote: > > > > This patch series introduces basic interface for reading the Ethernet > > > > Equipment Clock (EEC) state on a SyncE capable device. This state gives > > > > information about the state of EEC. This interface is required to > > > > implement Synchronization Status Messaging on upper layers. > > > > > > > > Initial implementation returns SyncE EEC state and flags attributes. > > > > The only flag currently implemented is the EEC_SRC_PORT. When it's > set > > > > the EEC is synchronized to the recovered clock recovered from the > > > > current port. > > > > > > > > SyncE EEC state read needs to be implemented as a ndo_get_eec_state > > > > function. > > > > > > > > Signed-off-by: Maciej Machnikowski > <maciej.machnikowski@intel.com> > > > > > > Since we're talking SyncE-only now my intuition would be to put this > > > op in ethtool. Was there a reason ethtool was not chosen? If not what > > > do others think / if yes can the reason be included in the commit > > > message? > > > > Hmm. Main reason for netlink is that linuxptp already supports it, > > and it was suggested by Richard. > > Having an NDO would also make it easier to add a SyncE-related > > files to the sysfs for easier operation (following the ideas from the ptp > > pins subsystem). > > But I'm open for suggestions. > > I think linuxptp will need support for ethtool netlink sockets sooner > rather than later. Moving this to ethtool makes sense to me since it's > very much a Ethernet-oriented API at this point. Ethtool also makes a lot of sense, but will it be possible to still make sysfs, and it makes sense to add it for some deployments (more on that below) > > > > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the > > > port is > > > > + * currently the source for the EEC > > > > + */ > > > > > > Why include it then? Just leave the value out and if the attr is not > > > present user space should assume the source is port. > > > > This bit has a different meaning. If it's set the port in question > > is a frequency source for the multiport device, if it's cleared - some other > > source is used as a source. This is needed to prevent setting invalid > > configurations in the PHY (like setting the frequency source as a Master > > in AN) or sending invalid messages. If the port is a frequency source > > it must always send back QL-DNU messages to prevent synchronization > > loops. > > Ah! I see. Is being the "source" negotiated somehow? Don't we need to > give the user / linuxptp to select the source based on whatever info > it has about topology? The frequency source can be either pre-set statically, negotiated using ESMC QL-levels (if working in QL-Enabled mode), or follow automatic fallback inside the device. This flag gives feedback about the validity of recovered clock coming from a given port and is useful when you enable multiple recovered clocks on more than one port in active-passive model. In that case the "driving" port may change dynamically, so it's a good idea to have some interface to reflect that. That's where sysfs file be useful. When I add the implementation for recovered clock configuration, the sysfs may be used as standalone interface for configuring them when no dynamic change is needed. > > > or don't check the ifindex at all and let dev_get_by.. fail. > > > > > > > > > Thanks for pushing this API forward! > > > > Addressed all other comments - and thanks for giving a lot of helpful > > suggestions! > > Thanks, BTW I think I forgot to ask for documentation, dumping info > about the API and context under Documentation/ would be great! Could you suggest where to add that? Grepping for ndo_ don't give much. I can add a new synce.rst file if it makes sense.
On Mon, 6 Sep 2021 19:01:54 +0000 Machnikowski, Maciej wrote: > > > Hmm. Main reason for netlink is that linuxptp already supports it, > > > and it was suggested by Richard. > > > Having an NDO would also make it easier to add a SyncE-related > > > files to the sysfs for easier operation (following the ideas from the ptp > > > pins subsystem). > > > But I'm open for suggestions. > > > > I think linuxptp will need support for ethtool netlink sockets sooner > > rather than later. Moving this to ethtool makes sense to me since it's > > very much a Ethernet-oriented API at this point. > > Ethtool also makes a lot of sense, but will it be possible to still make sysfs, > and it makes sense to add it for some deployments (more on that below) It should not make much difference whether ndo or ethtool op is used. > > > This bit has a different meaning. If it's set the port in question > > > is a frequency source for the multiport device, if it's cleared - some other > > > source is used as a source. This is needed to prevent setting invalid > > > configurations in the PHY (like setting the frequency source as a Master > > > in AN) or sending invalid messages. If the port is a frequency source > > > it must always send back QL-DNU messages to prevent synchronization > > > loops. > > > > Ah! I see. Is being the "source" negotiated somehow? Don't we need to > > give the user / linuxptp to select the source based on whatever info > > it has about topology? > > The frequency source can be either pre-set statically, negotiated using > ESMC QL-levels (if working in QL-Enabled mode), or follow automatic > fallback inside the device. This flag gives feedback about the validity > of recovered clock coming from a given port and is useful when you > enable multiple recovered clocks on more than one port in > active-passive model. In that case the "driving" port may change > dynamically, so it's a good idea to have some interface to reflect that. The ESMC messages are handled by Linux or some form of firmware? I don't see how you can implement any selection policy with a read-only API. In general it would be more natural to place a "source id" at the DPLL/clock, the "source" flag seems to mark the wrong end of the relationship. If there ever are multiple consumers we won't be able to tell which "target" the "source" is referring to. Hard to judge how much of a problem that could be by looking at a small slice of the system. > That's where sysfs file be useful. When I add the implementation for > recovered clock configuration, the sysfs may be used as standalone > interface for configuring them when no dynamic change is needed. I didn't get that. Do you mean using a sysfs file to configure the parameters of the DPLL? If the DPLL has its own set of concerns we should go ahead and create explicit object / configuration channel for it. Somehow I got it into my head that you care mostly about transmitting the clock, IOW recovering it from one port and using on another but that's probably not even a strong use case for you or NICs in general :S > > > Addressed all other comments - and thanks for giving a lot of helpful > > > suggestions! > > > > Thanks, BTW I think I forgot to ask for documentation, dumping info > > about the API and context under Documentation/ would be great! > > Could you suggest where to add that? Grepping for ndo_ don't give much. > I can add a new synce.rst file if it makes sense. New networking/synce.rst file makes perfect sense to me. And perhaps link to it from driver-api/ptp.rst.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Tuesday, September 7, 2021 3:01 AM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Mon, 6 Sep 2021 19:01:54 +0000 Machnikowski, Maciej wrote: > > > > Hmm. Main reason for netlink is that linuxptp already supports it, > > > > and it was suggested by Richard. > > > > Having an NDO would also make it easier to add a SyncE-related > > > > files to the sysfs for easier operation (following the ideas from the ptp > > > > pins subsystem). > > > > But I'm open for suggestions. > > > > > > I think linuxptp will need support for ethtool netlink sockets sooner > > > rather than later. Moving this to ethtool makes sense to me since it's > > > very much a Ethernet-oriented API at this point. > > > > Ethtool also makes a lot of sense, but will it be possible to still make sysfs, > > and it makes sense to add it for some deployments (more on that below) > > It should not make much difference whether ndo or ethtool op is used. Will look into it then. > > > > > This bit has a different meaning. If it's set the port in question > > > > is a frequency source for the multiport device, if it's cleared - some > other > > > > source is used as a source. This is needed to prevent setting invalid > > > > configurations in the PHY (like setting the frequency source as a Master > > > > in AN) or sending invalid messages. If the port is a frequency source > > > > it must always send back QL-DNU messages to prevent synchronization > > > > loops. > > > > > > Ah! I see. Is being the "source" negotiated somehow? Don't we need to > > > give the user / linuxptp to select the source based on whatever info > > > it has about topology? > > > > The frequency source can be either pre-set statically, negotiated using > > ESMC QL-levels (if working in QL-Enabled mode), or follow automatic > > fallback inside the device. This flag gives feedback about the validity > > of recovered clock coming from a given port and is useful when you > > enable multiple recovered clocks on more than one port in > > active-passive model. In that case the "driving" port may change > > dynamically, so it's a good idea to have some interface to reflect that. > > The ESMC messages are handled by Linux or some form of firmware? > I don't see how you can implement any selection policy with a read-only > API. It can be either in FW or in Linux - depending on the deployment. We try to define the API that would enable Linux to manage that. EEC state will be read-only, but the recovered clock management part will allow changes for QL-disabled SyncE deployments that only need to see if the clock they receive on a given port is valid or not. > In general it would be more natural to place a "source id" at the > DPLL/clock, the "source" flag seems to mark the wrong end of the > relationship. If there ever are multiple consumers we won't be able > to tell which "target" the "source" is referring to. Hard to judge > how much of a problem that could be by looking at a small slice of > the system. The DPLL will operate on pins, so it will have a pin connected from the MAC/PHY that will have the recovered clock, but the recovered clock can be enabled from any port/lane. That information is kept in the MAC/PHY and the DPLL side will not be aware who it belongs to. We can come up with a better name, but think of it like: You have multiport device (switch/NIC). One port is recovering the clock, the PHY/MAC outputs that clock through the pin to the EEC (DPLL). The DPLL knows if it locked to the signal coming from the multiport PHY/MAC, but it doesn't know which port is the one that generates that clock signal. All other ports can also present the "locked" state, but they are following the clock that was received in the chosen port. If we drop this flag we won't be able to easily tell which port/lane drives the recovered clock. In short: the port with that flag on is following the network clock and leading clock of other ports of the multiport device. In the most basic SyncE deployment you can put the passive DPLL that will only give you the lock/holdover/unlocked info and just use this flag to know who currently drives the DPLL. > > That's where sysfs file be useful. When I add the implementation for > > recovered clock configuration, the sysfs may be used as standalone > > interface for configuring them when no dynamic change is needed. > > I didn't get that. Do you mean using a sysfs file to configure > the parameters of the DPLL? Only the PHY/MAC side of thing which is recovered clock configuration and the ECC state. > If the DPLL has its own set of concerns we should go ahead and create > explicit object / configuration channel for it. > > Somehow I got it into my head that you care mostly about transmitting > the clock, IOW recovering it from one port and using on another but > that's probably not even a strong use case for you or NICs in general :S This is the right thinking. The DPLL can also have different external sources, like the GNSS, and can also drive different output clocks. But for the most basic SyncE implementation, which only runs on a recovered clock, we won't need the DPLL subsystem. > > > > Addressed all other comments - and thanks for giving a lot of helpful > > > > suggestions! > > > > > > Thanks, BTW I think I forgot to ask for documentation, dumping info > > > about the API and context under Documentation/ would be great! > > > > Could you suggest where to add that? Grepping for ndo_ don't give much. > > I can add a new synce.rst file if it makes sense. > > New networking/synce.rst file makes perfect sense to me. And perhaps > link to it from driver-api/ptp.rst. OK will try to come up with something there
On Tue, 7 Sep 2021 08:50:55 +0000 Machnikowski, Maciej wrote: > > > The frequency source can be either pre-set statically, negotiated using > > > ESMC QL-levels (if working in QL-Enabled mode), or follow automatic > > > fallback inside the device. This flag gives feedback about the validity > > > of recovered clock coming from a given port and is useful when you > > > enable multiple recovered clocks on more than one port in > > > active-passive model. In that case the "driving" port may change > > > dynamically, so it's a good idea to have some interface to reflect that. > > > > The ESMC messages are handled by Linux or some form of firmware? > > I don't see how you can implement any selection policy with a read-only > > API. > > It can be either in FW or in Linux - depending on the deployment. > We try to define the API that would enable Linux to manage that. We should implement the API for Linux to manage things from the get go. > EEC state will be read-only, but the recovered clock management part > will allow changes for QL-disabled SyncE deployments that only need > to see if the clock they receive on a given port is valid or not. > > > In general it would be more natural to place a "source id" at the > > DPLL/clock, the "source" flag seems to mark the wrong end of the > > relationship. If there ever are multiple consumers we won't be able > > to tell which "target" the "source" is referring to. Hard to judge > > how much of a problem that could be by looking at a small slice of > > the system. > > The DPLL will operate on pins, so it will have a pin connected from the > MAC/PHY that will have the recovered clock, but the recovered clock > can be enabled from any port/lane. That information is kept in the > MAC/PHY and the DPLL side will not be aware who it belongs to. So the clock outputs are muxed to a single pin at the Ethernet IP level, in your design. I wonder if this is the common implementation and therefore if it's safe to bake that into the API. Input from other vendors would be great... Also do I understand correctly that the output of the Ethernet IP is just the raw Rx clock once receiver is locked and the DPLL which enum if_synce_state refers to is in the time IP, that DPLL could be driven by GNSS etc? > We can come up with a better name, but think of it like: > You have multiport device (switch/NIC). One port is recovering > the clock, the PHY/MAC outputs that clock through the pin > to the EEC (DPLL). The DPLL knows if it locked to the signal coming > from the multiport PHY/MAC, but it doesn't know which port is the one > that generates that clock signal. All other ports can also present the > "locked" state, but they are following the clock that was received > in the chosen port. If we drop this flag we won't be able to easily tell > which port/lane drives the recovered clock. > In short: the port with that flag on is following the network clock > and leading clock of other ports of the multiport device. > > In the most basic SyncE deployment you can put the passive DPLL that > will only give you the lock/holdover/unlocked info and just use this flag > to know who currently drives the DPLL. > > > > That's where sysfs file be useful. When I add the implementation for > > > recovered clock configuration, the sysfs may be used as standalone > > > interface for configuring them when no dynamic change is needed. > > > > I didn't get that. Do you mean using a sysfs file to configure > > the parameters of the DPLL? > > Only the PHY/MAC side of thing which is recovered clock configuration > and the ECC state. > > > If the DPLL has its own set of concerns we should go ahead and create > > explicit object / configuration channel for it. > > > > Somehow I got it into my head that you care mostly about transmitting > > the clock, IOW recovering it from one port and using on another but > > that's probably not even a strong use case for you or NICs in general :S > > This is the right thinking. The DPLL can also have different external sources, > like the GNSS, and can also drive different output clocks. But for the most > basic SyncE implementation, which only runs on a recovered clock, we won't > need the DPLL subsystem. The GNSS pulse would come in over an external pin, tho, right? Your earlier version of the patchset had GNSS as an enum value, how would the driver / FW know that a given pin means GNSS? > > > Could you suggest where to add that? Grepping for ndo_ don't give much. > > > I can add a new synce.rst file if it makes sense. > > > > New networking/synce.rst file makes perfect sense to me. And perhaps > > link to it from driver-api/ptp.rst. > > OK will try to come up with something there
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Tuesday, September 7, 2021 4:55 PM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; > richardcochran@gmail.com; abyagowi@fb.com; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; davem@davemloft.net; linux- > kselftest@vger.kernel.org; Andrew Lunn <andrew@lunn.ch>; Michal > Kubecek <mkubecek@suse.cz>; Saeed Mahameed <saeed@kernel.org>; > Michael Chan <michael.chan@broadcom.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Tue, 7 Sep 2021 08:50:55 +0000 Machnikowski, Maciej wrote: > > > > The frequency source can be either pre-set statically, negotiated using > > > > ESMC QL-levels (if working in QL-Enabled mode), or follow automatic > > > > fallback inside the device. This flag gives feedback about the validity > > > > of recovered clock coming from a given port and is useful when you > > > > enable multiple recovered clocks on more than one port in > > > > active-passive model. In that case the "driving" port may change > > > > dynamically, so it's a good idea to have some interface to reflect that. > > > > > > The ESMC messages are handled by Linux or some form of firmware? > > > I don't see how you can implement any selection policy with a read-only > > > API. > > > > It can be either in FW or in Linux - depending on the deployment. > > We try to define the API that would enable Linux to manage that. > > We should implement the API for Linux to manage things from the get go. Yep! Yet let's go one step at a time. I believe once we have the basics (EEC monitoring and recovered clock configuration) we'll be able to implement a basic functionality - and add bells and whistles later on, as there are more capabilities that we could support in SW. > > EEC state will be read-only, but the recovered clock management part > > will allow changes for QL-disabled SyncE deployments that only need > > to see if the clock they receive on a given port is valid or not. > > > > > In general it would be more natural to place a "source id" at the > > > DPLL/clock, the "source" flag seems to mark the wrong end of the > > > relationship. If there ever are multiple consumers we won't be able > > > to tell which "target" the "source" is referring to. Hard to judge > > > how much of a problem that could be by looking at a small slice of > > > the system. > > > > The DPLL will operate on pins, so it will have a pin connected from the > > MAC/PHY that will have the recovered clock, but the recovered clock > > can be enabled from any port/lane. That information is kept in the > > MAC/PHY and the DPLL side will not be aware who it belongs to. > > So the clock outputs are muxed to a single pin at the Ethernet IP > level, in your design. I wonder if this is the common implementation > and therefore if it's safe to bake that into the API. Input from other > vendors would be great... I believe this is the state-of-art: here's the Broadcom public one https://docs.broadcom.com/doc/1211168567832, I believe Marvel has similar solution. But would also be happy to hear others. > Also do I understand correctly that the output of the Ethernet IP > is just the raw Rx clock once receiver is locked and the DPLL which > enum if_synce_state refers to is in the time IP, that DPLL could be > driven by GNSS etc? Ethernet IP/PHY usually outputs a divided clock signal (since it's easier to route) derived from the RX clock. The DPLL connectivity is vendor-specific, as you can use it to connect some external signals, but you can as well just care about relying the SyncE clock and only allow recovering it and passing along the QL info when your EEC is locked. That's why I backed up from a full DPLL implementation in favor of a more generic EEC clock. The Time IP is again relative and vendor-specific. If SyncE is deployed alongside PTP it will most likely be tightly coupled, but if you only care about having a frequency source - it's not mandatory and it can be as well in the PHY IP. Also I think I will strip the reported states to the bare minimum defined in the ITU-T G.781 instead of reusing the states that were already defined for a specific DPLL. > > We can come up with a better name, but think of it like: > > You have multiport device (switch/NIC). One port is recovering > > the clock, the PHY/MAC outputs that clock through the pin > > to the EEC (DPLL). The DPLL knows if it locked to the signal coming > > from the multiport PHY/MAC, but it doesn't know which port is the one > > that generates that clock signal. All other ports can also present the > > "locked" state, but they are following the clock that was received > > in the chosen port. If we drop this flag we won't be able to easily tell > > which port/lane drives the recovered clock. > > In short: the port with that flag on is following the network clock > > and leading clock of other ports of the multiport device. > > > > In the most basic SyncE deployment you can put the passive DPLL that > > will only give you the lock/holdover/unlocked info and just use this flag > > to know who currently drives the DPLL. > > > > > > That's where sysfs file be useful. When I add the implementation for > > > > recovered clock configuration, the sysfs may be used as standalone > > > > interface for configuring them when no dynamic change is needed. > > > > > > I didn't get that. Do you mean using a sysfs file to configure > > > the parameters of the DPLL? > > > > Only the PHY/MAC side of thing which is recovered clock configuration > > and the ECC state. > > > > > If the DPLL has its own set of concerns we should go ahead and create > > > explicit object / configuration channel for it. > > > > > > Somehow I got it into my head that you care mostly about transmitting > > > the clock, IOW recovering it from one port and using on another but > > > that's probably not even a strong use case for you or NICs in general :S > > > > This is the right thinking. The DPLL can also have different external sources, > > like the GNSS, and can also drive different output clocks. But for the most > > basic SyncE implementation, which only runs on a recovered clock, we > won't > > need the DPLL subsystem. > > The GNSS pulse would come in over an external pin, tho, right? Your > earlier version of the patchset had GNSS as an enum value, how would > the driver / FW know that a given pin means GNSS? The GNSS 1PPS will more likely go directly to the "full" DPLL. The pin topology can be derived from FW or any vendor-specific way of mapping pins to their sources. And, in "worst" case can just be hardcoded for a specific device. > > > > Could you suggest where to add that? Grepping for ndo_ don't give > much. > > > > I can add a new synce.rst file if it makes sense. > > > > > > New networking/synce.rst file makes perfect sense to me. And perhaps > > > link to it from driver-api/ptp.rst. > > > > OK will try to come up with something there
On Tue, 7 Sep 2021 15:47:05 +0000 Machnikowski, Maciej wrote: > > > It can be either in FW or in Linux - depending on the deployment. > > > We try to define the API that would enable Linux to manage that. > > > > We should implement the API for Linux to manage things from the get go. > > Yep! Yet let's go one step at a time. I believe once we have the basics (EEC > monitoring and recovered clock configuration) we'll be able to implement > a basic functionality - and add bells and whistles later on, as there are more > capabilities that we could support in SW. The set API may shape how the get API looks. We need a minimal viable API where the whole control part of it is not "firmware or proprietary tools take care of that". Do you have public docs on how the whole solution works? > > > The DPLL will operate on pins, so it will have a pin connected from the > > > MAC/PHY that will have the recovered clock, but the recovered clock > > > can be enabled from any port/lane. That information is kept in the > > > MAC/PHY and the DPLL side will not be aware who it belongs to. > > > > So the clock outputs are muxed to a single pin at the Ethernet IP > > level, in your design. I wonder if this is the common implementation > > and therefore if it's safe to bake that into the API. Input from other > > vendors would be great... > > I believe this is the state-of-art: here's the Broadcom public one > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > has similar solution. But would also be happy to hear others. Interesting. That reveals the need for also marking the backup (/secondary) clock. Have you seen any docs on how systems with discreet PHY ASICs mux the clocks? > > Also do I understand correctly that the output of the Ethernet IP > > is just the raw Rx clock once receiver is locked and the DPLL which > > enum if_synce_state refers to is in the time IP, that DPLL could be > > driven by GNSS etc? > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > easier to route) derived from the RX clock. > The DPLL connectivity is vendor-specific, as you can use it to connect > some external signals, but you can as well just care about relying > the SyncE clock and only allow recovering it and passing along > the QL info when your EEC is locked. That's why I backed up from > a full DPLL implementation in favor of a more generic EEC clock. What is an ECC clock? To me the PLL state in the Ethernet port is the state of the recovered clock. enum if_eec_state has values like holdover which seem to be more applicable to the "system wide" PLL. Let me ask this - if one port is training the link and the other one has the lock and is the source - what state will be reported for each port? > The Time IP is again relative and vendor-specific. If SyncE is deployed > alongside PTP it will most likely be tightly coupled, but if you only > care about having a frequency source - it's not mandatory and it can be > as well in the PHY IP. I would not think having just the freq is very useful. > Also I think I will strip the reported states to the bare minimum defined > in the ITU-T G.781 instead of reusing the states that were already defined > for a specific DPLL. > > > > This is the right thinking. The DPLL can also have different external sources, > > > like the GNSS, and can also drive different output clocks. But for the most > > > basic SyncE implementation, which only runs on a recovered clock, we won't > > > need the DPLL subsystem. > > > > The GNSS pulse would come in over an external pin, tho, right? Your > > earlier version of the patchset had GNSS as an enum value, how would > > the driver / FW know that a given pin means GNSS? > > The GNSS 1PPS will more likely go directly to the "full" DPLL. > The pin topology can be derived from FW or any vendor-specific way of mapping > pins to their sources. And, in "worst" case can just be hardcoded for a specific > device.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Tuesday, September 7, 2021 9:48 PM > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Tue, 7 Sep 2021 15:47:05 +0000 Machnikowski, Maciej wrote: > > > > It can be either in FW or in Linux - depending on the deployment. > > > > We try to define the API that would enable Linux to manage that. > > > > > > We should implement the API for Linux to manage things from the get > go. > > > > Yep! Yet let's go one step at a time. I believe once we have the basics (EEC > > monitoring and recovered clock configuration) we'll be able to implement > > a basic functionality - and add bells and whistles later on, as there are more > > capabilities that we could support in SW. > > The set API may shape how the get API looks. We need a minimal viable > API where the whole control part of it is not "firmware or proprietary > tools take care of that". > > Do you have public docs on how the whole solution works? The best reference would be my netdev 0x15 tutorial: https://netdevconf.info/0x15/session.html?Introduction-to-time-synchronization-over-Ethernet The SyncE API considerations starts ~54:00, but basically we need API for: - Controlling the lane to pin mapping for clock recovery - Check the EEC/DPLL state and see what's the source of reference frequency (in more advanced deployments) - control additional input and output pins (GNSS input, external inputs, recovered frequency reference) > > > > The DPLL will operate on pins, so it will have a pin connected from the > > > > MAC/PHY that will have the recovered clock, but the recovered clock > > > > can be enabled from any port/lane. That information is kept in the > > > > MAC/PHY and the DPLL side will not be aware who it belongs to. > > > > > > So the clock outputs are muxed to a single pin at the Ethernet IP > > > level, in your design. I wonder if this is the common implementation > > > and therefore if it's safe to bake that into the API. Input from other > > > vendors would be great... > > > > I believe this is the state-of-art: here's the Broadcom public one > > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > > has similar solution. But would also be happy to hear others. > > Interesting. That reveals the need for also marking the backup > (/secondary) clock. That's optional, but useful. And here's where we need a feedback on which port/lane is currently used, as the switch may be automatic > Have you seen any docs on how systems with discreet PHY ASICs mux > the clocks? Yes - unfortunately they are not public :( > > > Also do I understand correctly that the output of the Ethernet IP > > > is just the raw Rx clock once receiver is locked and the DPLL which > > > enum if_synce_state refers to is in the time IP, that DPLL could be > > > driven by GNSS etc? > > > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > > easier to route) derived from the RX clock. > > The DPLL connectivity is vendor-specific, as you can use it to connect > > some external signals, but you can as well just care about relying > > the SyncE clock and only allow recovering it and passing along > > the QL info when your EEC is locked. That's why I backed up from > > a full DPLL implementation in favor of a more generic EEC clock. > > What is an ECC clock? To me the PLL state in the Ethernet port is the > state of the recovered clock. enum if_eec_state has values like > holdover which seem to be more applicable to the "system wide" PLL. EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's not mandatory and I believe it may be different is switches where you only need to drive all ports TX from a single frequency source. In this case the DPLL can be embedded in the multiport PHY, > Let me ask this - if one port is training the link and the other one has > the lock and is the source - what state will be reported for each port? In this case the port that has the lock source will report the lock and the EEC_SRC_PORT flag. The port that trains the link will show the lock without the flag and once it completes the training sequence it will use the EEC's frequency to transmit the data so that the next hop is able to synchronize its EEC to the incoming RX frequency > > The Time IP is again relative and vendor-specific. If SyncE is deployed > > alongside PTP it will most likely be tightly coupled, but if you only > > care about having a frequency source - it's not mandatory and it can be > > as well in the PHY IP. > > I would not think having just the freq is very useful. This depends on the deployment. There are couple popular frequencies Most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many deployments that only require frequency sync without the phase and/or time. I.e. if you deploy frequency division duplex you only need the frequency reference, and the higher frequency you have - the faster you can lock to it. > > Also I think I will strip the reported states to the bare minimum defined > > in the ITU-T G.781 instead of reusing the states that were already defined > > for a specific DPLL. > > > > > > This is the right thinking. The DPLL can also have different external > sources, > > > > like the GNSS, and can also drive different output clocks. But for the > most > > > > basic SyncE implementation, which only runs on a recovered clock, we > won't > > > > need the DPLL subsystem. > > > > > > The GNSS pulse would come in over an external pin, tho, right? Your > > > earlier version of the patchset had GNSS as an enum value, how would > > > the driver / FW know that a given pin means GNSS? > > > > The GNSS 1PPS will more likely go directly to the "full" DPLL. > > The pin topology can be derived from FW or any vendor-specific way of > mapping > > pins to their sources. And, in "worst" case can just be hardcoded for a > specific > > device.
On Wed, 8 Sep 2021 08:03:35 +0000 Machnikowski, Maciej wrote: > > > Yep! Yet let's go one step at a time. I believe once we have the basics (EEC > > > monitoring and recovered clock configuration) we'll be able to implement > > > a basic functionality - and add bells and whistles later on, as there are more > > > capabilities that we could support in SW. > > > > The set API may shape how the get API looks. We need a minimal viable > > API where the whole control part of it is not "firmware or proprietary > > tools take care of that". > > > > Do you have public docs on how the whole solution works? > > The best reference would be my netdev 0x15 tutorial: > https://netdevconf.info/0x15/session.html?Introduction-to-time-synchronization-over-Ethernet > The SyncE API considerations starts ~54:00, but basically we need API for: > - Controlling the lane to pin mapping for clock recovery > - Check the EEC/DPLL state and see what's the source of reference frequency > (in more advanced deployments) > - control additional input and output pins (GNSS input, external inputs, recovered > frequency reference) Thanks, good presentation! I haven't seen much in terms of system design, at the level similar to the Broadcom whitepaper you shared. > > > I believe this is the state-of-art: here's the Broadcom public one > > > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > > > has similar solution. But would also be happy to hear others. > > > > Interesting. That reveals the need for also marking the backup > > (/secondary) clock. > > That's optional, but useful. And here's where we need a feedback > on which port/lane is currently used, as the switch may be automatic What do you mean by optional? How does the user know if there is fallback or not? Is it that Intel is intending to support only devices with up to 2 ports and the second port is always the backup (apologies for speculating). > > Have you seen any docs on how systems with discreet PHY ASICs mux > > the clocks? > > Yes - unfortunately they are not public :( Mumble, mumble. Ido, Florian - any devices within your purview which would support SyncE? > > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > > > easier to route) derived from the RX clock. > > > The DPLL connectivity is vendor-specific, as you can use it to connect > > > some external signals, but you can as well just care about relying > > > the SyncE clock and only allow recovering it and passing along > > > the QL info when your EEC is locked. That's why I backed up from > > > a full DPLL implementation in favor of a more generic EEC clock. > > > > What is an ECC clock? To me the PLL state in the Ethernet port is the > > state of the recovered clock. enum if_eec_state has values like > > holdover which seem to be more applicable to the "system wide" PLL. > > EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's > not mandatory and I believe it may be different is switches where > you only need to drive all ports TX from a single frequency source. In this > case the DPLL can be embedded in the multiport PHY, > > > Let me ask this - if one port is training the link and the other one has > > the lock and is the source - what state will be reported for each port? > > In this case the port that has the lock source will report the lock and > the EEC_SRC_PORT flag. The port that trains the link will show the > lock without the flag and once it completes the training sequence it will > use the EEC's frequency to transmit the data so that the next hop is able > to synchronize its EEC to the incoming RX frequency Alright, I don't like that. It feels like you're attaching one object's information (ECC) to other objects (ports), and repeating it. Prof Goczyła and dr Landowska would not be proud. > > > The Time IP is again relative and vendor-specific. If SyncE is deployed > > > alongside PTP it will most likely be tightly coupled, but if you only > > > care about having a frequency source - it's not mandatory and it can be > > > as well in the PHY IP. > > > > I would not think having just the freq is very useful. > > This depends on the deployment. There are couple popular frequencies > Most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many > deployments that only require frequency sync without the phase > and/or time. I.e. if you deploy frequency division duplex you only need the > frequency reference, and the higher frequency you have - the faster you can > lock to it.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, September 8, 2021 6:21 PM > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Wed, 8 Sep 2021 08:03:35 +0000 Machnikowski, Maciej wrote: > > > > Yep! Yet let's go one step at a time. I believe once we have the basics > (EEC > > > > monitoring and recovered clock configuration) we'll be able to > implement > > > > a basic functionality - and add bells and whistles later on, as there are > more > > > > capabilities that we could support in SW. > > > > > > The set API may shape how the get API looks. We need a minimal viable > > > API where the whole control part of it is not "firmware or proprietary > > > tools take care of that". > > > > > > Do you have public docs on how the whole solution works? > > > > The best reference would be my netdev 0x15 tutorial: > > https://netdevconf.info/0x15/session.html?Introduction-to-time- > synchronization-over-Ethernet > > The SyncE API considerations starts ~54:00, but basically we need API for: > > - Controlling the lane to pin mapping for clock recovery > > - Check the EEC/DPLL state and see what's the source of reference > frequency > > (in more advanced deployments) > > - control additional input and output pins (GNSS input, external inputs, > recovered > > frequency reference) > > Thanks, good presentation! I haven't seen much in terms of system > design, at the level similar to the Broadcom whitepaper you shared. See the "drawing" below. > > > > I believe this is the state-of-art: here's the Broadcom public one > > > > https://docs.broadcom.com/doc/1211168567832, I believe Marvel > > > > has similar solution. But would also be happy to hear others. > > > > > > Interesting. That reveals the need for also marking the backup > > > (/secondary) clock. > > > > That's optional, but useful. And here's where we need a feedback > > on which port/lane is currently used, as the switch may be automatic > > What do you mean by optional? How does the user know if there is > fallback or not? Is it that Intel is intending to support only > devices with up to 2 ports and the second port is always the > backup (apologies for speculating). That will be a part of pin config API. It needs to give info about the number of supported pins that the PHY/MAC can use as recovered clock outputs. Once the pin is assigned to the lane the recovered clock (divided or not) will appear on the configured PHY/MAC pin and EEC will be able to use it. If there is more than 1 available - they will have some priority assigned to them that will be known to the EEC/board designer. And we plan to support devices that only comes with 1 recovered clock output as well. > > > Have you seen any docs on how systems with discreet PHY ASICs mux > > > the clocks? > > > > Yes - unfortunately they are not public :( > > Mumble, mumble. Ido, Florian - any devices within your purview which > would support SyncE? OK Found one that's public: https://www.marvell.com/content/dam/marvell/en/public-collateral/transceivers/marvell-phys-transceivers-alaska-c-88x5113-datasheet.pdf see Fig. 23 and chapter 3.11 for details, but conceptually it's similar. > > > > Ethernet IP/PHY usually outputs a divided clock signal (since it's > > > > easier to route) derived from the RX clock. > > > > The DPLL connectivity is vendor-specific, as you can use it to connect > > > > some external signals, but you can as well just care about relying > > > > the SyncE clock and only allow recovering it and passing along > > > > the QL info when your EEC is locked. That's why I backed up from > > > > a full DPLL implementation in favor of a more generic EEC clock. > > > > > > What is an ECC clock? To me the PLL state in the Ethernet port is the > > > state of the recovered clock. enum if_eec_state has values like > > > holdover which seem to be more applicable to the "system wide" PLL. > > > > EEC is Ethernet Equipment Clock. In most cases this will be a DPLL, but that's > > not mandatory and I believe it may be different is switches where > > you only need to drive all ports TX from a single frequency source. In this > > case the DPLL can be embedded in the multiport PHY, > > > > > Let me ask this - if one port is training the link and the other one has > > > the lock and is the source - what state will be reported for each port? > > > > In this case the port that has the lock source will report the lock and > > the EEC_SRC_PORT flag. The port that trains the link will show the > > lock without the flag and once it completes the training sequence it will > > use the EEC's frequency to transmit the data so that the next hop is able > > to synchronize its EEC to the incoming RX frequency > > Alright, I don't like that. It feels like you're attaching one object's > information (ECC) to other objects (ports), and repeating it. Prof > Goczyła and dr Landowska would not be proud. Hahaha - did not expect them to pop up here :) It's true, but the problem is that they depend on each other. The path is: Lane0 ------------- |\ Pin0 RefN ____ ------------- | |-----------------| | synced clk ... | |-----------------| EEC |------------------ ------------- |/ PinN RefM|____ | Lane N MUX To get the full info a port needs to know the EEC state and which lane is used as a source (or rather - my lane or any other). The lane -> Pin mapping is buried in the PHY/MAC, but the source of frequency is in the EEC. What's even more - the Pin->Ref mapping is board specific. The viable solutions are: - Limit to the proposed "I drive the clock" vs "Someone drives it" and assume the Driver returns all info - return the EEC Ref index, figure out which pin is connected to it and then check which MAC/PHY lane that drives it. I assume option one is easy to implement and keep in the future even if we finally move to option 2 once we define EEC/DPLL subsystem. In future #1 can take the lock information from the DPLL subsystem, but will also enable simple deployments that won't expose the whole DPLL, like a filter PLL embedded in a multiport PHY that will only work for SyncE in which case this API will only touch a single component. > > > > The Time IP is again relative and vendor-specific. If SyncE is deployed > > > > alongside PTP it will most likely be tightly coupled, but if you only > > > > care about having a frequency source - it's not mandatory and it can be > > > > as well in the PHY IP. > > > > > > I would not think having just the freq is very useful. > > > > This depends on the deployment. There are couple popular frequencies > > Most popular are 2,048 kHz, 10 MHz and 64 kHz. There are many > > deployments that only require frequency sync without the phase > > and/or time. I.e. if you deploy frequency division duplex you only need the > > frequency reference, and the higher frequency you have - the faster you > can > > lock to it.
> > > The SyncE API considerations starts ~54:00, but basically we need API for: > > > - Controlling the lane to pin mapping for clock recovery > > > - Check the EEC/DPLL state and see what's the source of reference > > frequency > > > (in more advanced deployments) > > > - control additional input and output pins (GNSS input, external inputs, > > recovered > > > frequency reference) Now that you have pointed to a datasheet... > - Controlling the lane to pin mapping for clock recovery So this is a PHY property. That could be Linux driving the PHY, via phylib, drivers/net/phy, or there could be firmware in the MAC driver which hides the PHY and gives you some sort of API to access it. > Check the EEC/DPLL state and see what's the source of reference > frequency Where is the EEC/DPLL implemented? Is it typically also in the PHY? Or some other hardware block? I just want to make sure we have an API which we can easily delegate to different subsystems, some of it in the PHY driver, maybe some of it somewhere else. Also, looking at the Marvell datasheet, it appears these registers are in the MDIO_MMD_VEND2 range. Has any of this been specified? Can we expect to be able to write a generic implementation sometime in the future which PHY drivers can share? I just looked at a 1G Marvell PHY. It uses RGMII or SGMII towards the host. But there is no indication you can take the clock from the SGMII SERDES, it is only the recovered clock from the line. And the recovered clock always goes out the CLK125 pin, which can either be 125MHz or 25MHz. So in this case, you have no need to control the lane to pin mapping, it is fixed, but do we want to be able to control the divider? Do we need a mechanism to actually enumerate what the hardware can do? Since we are talking about clocks and dividers, and multiplexors, should all this be using the common clock framework, which already supports most of this? Do we actually need something new? Andrew
> -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: Wednesday, September 8, 2021 9:35 PM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > > > > The SyncE API considerations starts ~54:00, but basically we need API > for: > > > > - Controlling the lane to pin mapping for clock recovery > > > > - Check the EEC/DPLL state and see what's the source of reference > > > frequency > > > > (in more advanced deployments) > > > > - control additional input and output pins (GNSS input, external inputs, > > > recovered > > > > frequency reference) > > Now that you have pointed to a datasheet... > > > - Controlling the lane to pin mapping for clock recovery > > So this is a PHY property. That could be Linux driving the PHY, via > phylib, drivers/net/phy, or there could be firmware in the MAC driver > which hides the PHY and gives you some sort of API to access it. Yes, that's deployment dependent - in our case we use MAC driver that proxies that. > > Check the EEC/DPLL state and see what's the source of reference > > frequency > > Where is the EEC/DPLL implemented? Is it typically also in the PHY? Or > some other hardware block? In most cases it will be an external device, but there are implementations That have a PLL/DPLL inside the PHY (like a Broadcom example). And I don't know how switches implements that. > I just want to make sure we have an API which we can easily delegate > to different subsystems, some of it in the PHY driver, maybe some of > it somewhere else. The reasoning behind putting it in the ndo subsystem is because ultimately the netdev receives the ESMC message and is the entity that should know how it is connected to the PHY. That's because it will be very complex to move the mapping between netdevs and PHY lanes/ports to the SW running in the userspace. I believe the right way is to let the netdev driver manage that complexity and forward the request to the PHY subsystem if needed. The logic is: - Receive ESMC message on a netdev - send the message to enable recovered clock to the netdev that received it - ask the netdev to give the status of EEC that drives it. > Also, looking at the Marvell datasheet, it appears these registers are > in the MDIO_MMD_VEND2 range. Has any of this been specified? Can we > expect to be able to write a generic implementation sometime in the > future which PHY drivers can share? I believe it will be vendor-specific, as there are different implementations of it. It can be an internal PLL that gets all inputs and syncs to a one of them or it can be a MUX with divder(s) > I just looked at a 1G Marvell PHY. It uses RGMII or SGMII towards the > host. But there is no indication you can take the clock from the SGMII > SERDES, it is only the recovered clock from the line. And the > recovered clock always goes out the CLK125 pin, which can either be > 125MHz or 25MHz. > > So in this case, you have no need to control the lane to pin mapping, > it is fixed, but do we want to be able to control the divider? That's a complex question. Controlling the divider would make sense when we have control over the DPLL, and it still needs to be optional, as it's not always available. I assume that in the initial implementation we can rely on MAC driver to set up the dividers to output the expected frequency to the DPLL. On faster interfaces the RCLK speed is also speed-dependent and differs across the link speeds. > Do we need a mechanism to actually enumerate what the hardware can do? For recovered clocks I think we need to start with the number of PHY outputs that go towards the DPLL. We can add additional attributes like a frequency and others once we need them. I want to avoid creating a swiss-knife with too many options here :) > Since we are talking about clocks and dividers, and multiplexors, > should all this be using the common clock framework, which already > supports most of this? Do we actually need something new? I believe that's a specific enough case that deserves a separate one Regards Maciek
On Wed, 8 Sep 2021 17:30:24 +0000 Machnikowski, Maciej wrote: > Lane0 > ------------- |\ Pin0 RefN ____ > ------------- | |-----------------| | synced clk > | |-----------------| EEC |------------------ > ------------- |/ PinN RefM|____ | > Lane N MUX > > To get the full info a port needs to know the EEC state and which lane is used > as a source (or rather - my lane or any other). EEC here is what the PHY documentation calls "Cleanup PLL" right? > The lane -> Pin mapping is buried in the PHY/MAC, but the source of frequency > is in the EEC. Not sure what "source of frequency" means here. There's a lot of frequencies here. > What's even more - the Pin->Ref mapping is board specific. Breaking down the system into components we have: Port A.1 Rx lanes A.2 Rx pins (outputs) A.3 Rx clk divider B.1 Tx lanes B.2 Tx pins (inputs) ECC C.1 Inputs C.2 Outputs C.3 PLL state In the most general case we want to be able to: map recovered clocks to PHY output pins (A.1 <> A.2) set freq div on the recovered clock (A.2 <> A.3) set the priorities of inputs on ECC (C.1) read the ECC state (C.3) control outputs of the ECC (C.2) select the clock source for port Tx (B.2 <> B.1) As you said, pin -> ref mapping is board specific, so the API should not assume knowledge of routing between Port and ECC. If it does just give the pins matching names. We don't have to implement the entire design but the pieces we do create must be right for the larger context. With the current code the ECC/Cleanup PLL is not represented as a separate entity, and mapping of what source means is on the wrong "end" of the A.3 <> C.1 relationship. > The viable solutions are: > - Limit to the proposed "I drive the clock" vs "Someone drives it" and assume the > Driver returns all info > - return the EEC Ref index, figure out which pin is connected to it and then check > which MAC/PHY lane that drives it. > > I assume option one is easy to implement and keep in the future even if we > finally move to option 2 once we define EEC/DPLL subsystem. > > In future #1 can take the lock information from the DPLL subsystem, but > will also enable simple deployments that won't expose the whole DPLL, > like a filter PLL embedded in a multiport PHY that will only work for > SyncE in which case this API will only touch a single component. Imagine a system with two cascaded switch ASICs and a bunch of PHYs. How do you express that by pure extensions to the proposed API? Here either the cleanup PLLs would be cascaded (subordinate one needs to express that its "source" is another PLL) or single lane can be designated as a source for both PLLs (but then there is only one "source" bit and multiple "enum if_eec_state"s). I think we can't avoid having a separate object for ECC/Cleanup PLL. You can add it as a subobject to devlink but new genetlink family seems much preferable given the devlink instances themselves have unclear semantics at this point. Or you can try to convince Richard that ECC belongs as part of PTP :) In fact I don't think you care about any of the PHY / port stuff currently. All you need is the ECC side of the API. IIUC you have relatively simple setup where there is only one pin per port, and you don't care about syncing the Tx clock.
On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > Since we are talking about clocks and dividers, and multiplexors, > should all this be using the common clock framework, which already > supports most of this? Do we actually need something new? Does the common clock framework expose any user space API?
On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote: > On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > > Since we are talking about clocks and dividers, and multiplexors, > > should all this be using the common clock framework, which already > > supports most of this? Do we actually need something new? > > Does the common clock framework expose any user space API? Ah, good point. No, i don't think it does, apart from debugfs, which is not really a user space API, and it contains read only descriptions of the clock tree, current status, mux settings, dividers etc. So probably anybody implemented the proposed API the Linux way will use the common clock framework and its internal API, and debug the implementation via debugfs. PHY drivers already do use it, e.g. when the PHY is using a clock provided by the MAC, it uses the API to enable/disable and set ifs frequency as needed. Andrew
> As you said, pin -> ref mapping is board specific, so the API should > not assume knowledge of routing between Port and ECC. That information will probably end up in device tree. And X different implementations of ACPI, unless somebody puts there foot down and stops the snow flakes. > Imagine a system with two cascaded switch ASICs and a bunch of PHYs. > How do you express that by pure extensions to the proposed API? Device tree is good at that. ACPI might eventually catch up. How complex a setup do we actually expect? Can there be multiple disjoint SyncE trees within an Ethernet switch cluster? Or would it be reasonable to say all you need to configure is the clock source, and all other ports of the switches are slaves if SyncE is enabled for the port? I've never see any SOHO switch hardware which allows you to have disjoint PTP trees, so it does not sound too unreasonable to only allow a single SyncE tree per switch cluster. Also, if you are cascading switches, you generally don't put PHYs in the middle, you just connect the SERDES lanes together. Andrew
On Thu, 9 Sep 2021 01:14:36 +0200 Andrew Lunn wrote: > > As you said, pin -> ref mapping is board specific, so the API should > > not assume knowledge of routing between Port and ECC. > > That information will probably end up in device tree. And X different > implementations of ACPI, unless somebody puts there foot down and > stops the snow flakes. > > > Imagine a system with two cascaded switch ASICs and a bunch of PHYs. > > How do you express that by pure extensions to the proposed API? > > Device tree is good at that. ACPI might eventually catch up. I could well be wrong but some of those connectors could well be just SMAs on the back plate, no? User could cable those up to their heart content. https://engineering.fb.com/2021/08/11/open-source/time-appliance/ > How complex a setup do we actually expect? Can there be multiple > disjoint SyncE trees within an Ethernet switch cluster? Or would it be > reasonable to say all you need to configure is the clock source, and > all other ports of the switches are slaves if SyncE is enabled for the > port? I've never see any SOHO switch hardware which allows you to have > disjoint PTP trees, so it does not sound too unreasonable to only > allow a single SyncE tree per switch cluster. Not sure. I get the (perhaps unfounded) feeling that just forwarding the clock from one port to the rest is simpler. Maciej cares primarily about exposing the clock to other non-Ethernet things, the device would be an endpoint here, using the clock for LTE or whatnot. > Also, if you are cascading switches, you generally don't put PHYs in > the middle, you just connect the SERDES lanes together. My concern was a case where PHY connected to one switch exposes the refclock on an aux pin which is then muxed to more than one switch ASIC. IOW the "source port" is not actually under the same switch. Again, IDK if that's realistic.
On Thu, Sep 09, 2021 at 12:59:27AM +0200, Andrew Lunn wrote: > On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote: > > On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > > > Since we are talking about clocks and dividers, and multiplexors, > > > should all this be using the common clock framework, which already > > > supports most of this? Do we actually need something new? > > > > Does the common clock framework expose any user space API? > > Ah, good point. No, i don't think it does, apart from debugfs, which > is not really a user space API, and it contains read only descriptions > of the clock tree, current status, mux settings, dividers etc. Wouldn't it make sense to develop some kind of user land API to manipulate the common clock framework at run time? Just dreaming... Richard
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, September 9, 2021 12:19 AM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Wed, 8 Sep 2021 17:30:24 +0000 Machnikowski, Maciej wrote: > > Lane0 > > ------------- |\ Pin0 RefN ____ > > ------------- | |-----------------| | synced clk > > | |-----------------| EEC |------------------ > > ------------- |/ PinN RefM|____ | > > Lane N MUX > > > > To get the full info a port needs to know the EEC state and which lane is > used > > as a source (or rather - my lane or any other). > > EEC here is what the PHY documentation calls "Cleanup PLL" right? It's a generic term for an internal or external PLL that takes the reference frequency from the certain port/lane and drives its TX part. In a specific case it can be the internal cleanup PLL. > > The lane -> Pin mapping is buried in the PHY/MAC, but the source of > frequency > > is in the EEC. > > Not sure what "source of frequency" means here. There's a lot of > frequencies here. It's the source lane/port that drives the DPLL reference. In other words, the DPLL will tune its frequency to match the one recovered from a certain PHY lane/port. > > What's even more - the Pin->Ref mapping is board specific. > > Breaking down the system into components we have: > > Port > A.1 Rx lanes > A.2 Rx pins (outputs) > A.3 Rx clk divider > B.1 Tx lanes > B.2 Tx pins (inputs) > > ECC > C.1 Inputs > C.2 Outputs > C.3 PLL state > > In the most general case we want to be able to: > map recovered clocks to PHY output pins (A.1 <> A.2) > set freq div on the recovered clock (A.2 <> A.3) Technically the pin (if exists) will be the last thing, so a better way would be to map lane to the divider (A.1<>A.3) and then a divider to pin (A.3<>A.2), but the idea is the same > set the priorities of inputs on ECC (C.1) > read the ECC state (C.3) > control outputs of the ECC (C.2) Yes! > select the clock source for port Tx (B.2 <> B.1) And here we usually don't allow control over this. The DPLL is preconfigured to output the frequency that's expected on the PHY/MAC clock input and we shouldn't mess with it in runtime. > As you said, pin -> ref mapping is board specific, so the API should > not assume knowledge of routing between Port and ECC. If it does just > give the pins matching names. I believe referring to a board user guide is enough to cover that one, just like with PTP pins. There may be 1000 different ways of connecting those signals. > We don't have to implement the entire design but the pieces we do create > must be right for the larger context. With the current code the > ECC/Cleanup PLL is not represented as a separate entity, and mapping of > what source means is on the wrong "end" of the A.3 <> C.1 relationship. That's why I initially started with the EEC state + pin idx of the driving source. I believe it's a cleaner solution, as then we can implement the same pin indexes in the recovered clock part of the interface and the user will be able to see the state and driving pin from the EEC_STATE (both belongs to the DPLL) and use the PHY pins subsystem to see if the driving pin index matches the index that I drive. In that case we keep all C-thingies in the C subsystem and A stuff in A subsystem. The only "mix" is in the pin indexes that would use numbering from C. If it's an attribute - it can as well be optional for the deployments that don't need/support it. > > The viable solutions are: > > - Limit to the proposed "I drive the clock" vs "Someone drives it" and > assume the > > Driver returns all info > > - return the EEC Ref index, figure out which pin is connected to it and then > check > > which MAC/PHY lane that drives it. > > > > I assume option one is easy to implement and keep in the future even if > we > > finally move to option 2 once we define EEC/DPLL subsystem. > > > > In future #1 can take the lock information from the DPLL subsystem, but > > will also enable simple deployments that won't expose the whole DPLL, > > like a filter PLL embedded in a multiport PHY that will only work for > > SyncE in which case this API will only touch a single component. > > Imagine a system with two cascaded switch ASICs and a bunch of PHYs. > How do you express that by pure extensions to the proposed API? > Here either the cleanup PLLs would be cascaded (subordinate one needs > to express that its "source" is another PLL) or single lane can be > designated as a source for both PLLs (but then there is only one > "source" bit and multiple "enum if_eec_state"s). In that case - once we have pins- we'll see that the leader DPLL is synced to the pin that comes from the PHY/MAC and be able to find the corresponding lane, and on the follower side we'll see that it's locked to the pin that corresponds to the master DPLL. The logic to "do something" with that knowledge needs to be in the userspace app, as there may be some external connections needed that are unknown at the board level (think of 2 separate adapters connected with an external cable). > I think we can't avoid having a separate object for ECC/Cleanup PLL. > You can add it as a subobject to devlink but new genetlink family seems > much preferable given the devlink instances themselves have unclear > semantics at this point. Or you can try to convince Richard that ECC > belongs as part of PTP :) > In fact I don't think you care about any of the PHY / port stuff > currently. All you need is the ECC side of the API. IIUC you have > relatively simple setup where there is only one pin per port, and > you don't care about syncing the Tx clock. I actually do. There's (almost) always less recovered clock resources (aka pins) than ports/lanes in the system. The TX clock will be synchronized once the EEC reports the lock state, as it's the part that generates clocks for the TX part of the PHY.
> -----Original Message----- > From: Richard Cochran <richardcochran@gmail.com> > Sent: Thursday, September 9, 2021 4:09 AM > To: Andrew Lunn <andrew@lunn.ch> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Thu, Sep 09, 2021 at 12:59:27AM +0200, Andrew Lunn wrote: > > On Wed, Sep 08, 2021 at 03:20:27PM -0700, Jakub Kicinski wrote: > > > On Wed, 8 Sep 2021 21:34:37 +0200 Andrew Lunn wrote: > > > > Since we are talking about clocks and dividers, and multiplexors, > > > > should all this be using the common clock framework, which already > > > > supports most of this? Do we actually need something new? > > > > > > Does the common clock framework expose any user space API? > > > > Ah, good point. No, i don't think it does, apart from debugfs, which > > is not really a user space API, and it contains read only descriptions > > of the clock tree, current status, mux settings, dividers etc. > > Wouldn't it make sense to develop some kind of user land API to > manipulate the common clock framework at run time? > > Just dreaming... > > Richard That may be dangerous. In SyncE world we control clocks that are only references to the EEC block. The worst that can happen is that the DPLL will ignore incoming clock as it has invalid frequency, or it will drift off to the edge of allowed range. Controlling the clock that actually drives any components (PHY/MAC) in runtime can be a good way to brick the part. I feel that, while the reuse of structures may be a good idea, the userspace API for clocks is not. They are usually set up once at the board init level and stay like that "forever". The outputs we need to control are only a subset of all of them and they rather fall in the PTP pins level of details, rather than clock ones.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, September 9, 2021 1:58 AM > To: Andrew Lunn <andrew@lunn.ch> > Cc: Machnikowski, Maciej <maciej.machnikowski@intel.com>; > netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; > richardcochran@gmail.com; abyagowi@fb.com; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; davem@davemloft.net; linux- > kselftest@vger.kernel.org; Michal Kubecek <mkubecek@suse.cz>; Saeed > Mahameed <saeed@kernel.org>; Michael Chan > <michael.chan@broadcom.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Thu, 9 Sep 2021 01:14:36 +0200 Andrew Lunn wrote: > > > As you said, pin -> ref mapping is board specific, so the API should > > > not assume knowledge of routing between Port and ECC. > > > > That information will probably end up in device tree. And X different > > implementations of ACPI, unless somebody puts there foot down and > > stops the snow flakes. > > > > > Imagine a system with two cascaded switch ASICs and a bunch of PHYs. > > > How do you express that by pure extensions to the proposed API? > > > > Device tree is good at that. ACPI might eventually catch up. > > I could well be wrong but some of those connectors could well be just > SMAs on the back plate, no? User could cable those up to their heart > content. > > https://engineering.fb.com/2021/08/11/open-source/time-appliance/ Yep! We should base on the abstract indexes, rather than the device tree. The user needs to make sense of the indexes, just like with PTP pins. > > How complex a setup do we actually expect? Can there be multiple > > disjoint SyncE trees within an Ethernet switch cluster? Or would it be > > reasonable to say all you need to configure is the clock source, and > > all other ports of the switches are slaves if SyncE is enabled for the > > port? I've never see any SOHO switch hardware which allows you to have > > disjoint PTP trees, so it does not sound too unreasonable to only > > allow a single SyncE tree per switch cluster. > > Not sure. I get the (perhaps unfounded) feeling that just forwarding > the clock from one port to the rest is simpler. Maciej cares primarily > about exposing the clock to other non-Ethernet things, the device would > be an endpoint here, using the clock for LTE or whatnot. Also multiple PTP domain switches starts appearing on the market. I know Cisco makes them: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus3548/sw/system_mgmt/602/b_N3548_System_Management_Config_602A11/b_N3548_Sysetm_Management_Config_602A11_chapter_011.html Disjoint SyncE trees will be harder to implement due to the physical nature of it. > > Also, if you are cascading switches, you generally don't put PHYs in > > the middle, you just connect the SERDES lanes together. > > My concern was a case where PHY connected to one switch exposes the > refclock on an aux pin which is then muxed to more than one switch ASIC. > IOW the "source port" is not actually under the same switch. > > Again, IDK if that's realistic. It can be done, but the userspace app should make sense out of this config. Coding all scenarios in kernel would make 1000000 different possible configurations in the driver - which probably no one wants to keep/maintain. We can return info about hardwired pins (like GNSS, PHY recovered clks, PTP clock and so on) but should also give some generic ones.
Dave, Are there any free slots on Plumbers to discuss and close on SyncE interfaces (or can we add an extra one). I can reuse the slides from the Netdev to give background and a live discussion may help closing opens around it, and I'd be happy to co-present with anyone who wants to also join this effort. Regards Maciek
From: "Machnikowski, Maciej" <maciej.machnikowski@intel.com> Date: Thu, 9 Sep 2021 09:24:07 +0000 > Dave, > > Are there any free slots on Plumbers to discuss and close on SyncE interfaces > (or can we add an extra one). I can reuse the slides from the Netdev to give > background and a live discussion may help closing opens around it, > and I'd be happy to co-present with anyone who wants to also join this effort. Sorry, I think it's much too late for this.
On Thu, Sep 09, 2021 at 08:18:10AM +0000, Machnikowski, Maciej wrote: > Controlling the clock that actually drives any components (PHY/MAC) in > runtime can be a good way to brick the part. I didn't say that. > I feel that, while the reuse > of structures may be a good idea, the userspace API for clocks is not. > They are usually set up once at the board init level and stay like that "forever". > > The outputs we need to control are only a subset of all of them and they > rather fall in the PTP pins level of details, rather than clock ones. clk-gate.c clk-mux.c Making that available for user space to twiddle is a better way that tacking on to the PTP stuff. You can model your device as having a multiplexer in front of it. Thanks, Richard
On Wed, Sep 08, 2021 at 09:21:15AM -0700, Jakub Kicinski wrote: > Mumble, mumble. Ido, Florian - any devices within your purview which > would support SyncE? Sorry for the delay, was AFK. I remember SyncE being mentioned a few times in the past, so it is likely that we will be requested to add SyncE support in mlxsw in the future. I will try to solicit feedback from colleagues more familiar with the subject and provide it here next week.
On Fri, Sep 03, 2021 at 05:14:35PM +0200, Maciej Machnikowski wrote: > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index eebd3894fe89..78a8a5af8cd8 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -1273,4 +1273,35 @@ enum { > > #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1) > > +/* SyncE section */ > + > +enum if_eec_state { > + IF_EEC_STATE_INVALID = 0, > + IF_EEC_STATE_FREERUN, > + IF_EEC_STATE_LOCKACQ, > + IF_EEC_STATE_LOCKREC, > + IF_EEC_STATE_LOCKED, > + IF_EEC_STATE_HOLDOVER, > + IF_EEC_STATE_OPEN_LOOP, > + __IF_EEC_STATE_MAX, Can you document these states? I'm not clear on what LOCKACQ, LOCKREC and OPEN_LOOP mean. I also don't see ice using them and it's not really a good practice to add new uAPI without any current users. From v1 I understand that these states were copied from commit 797d3186544f ("ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.") from Renesas. Figure 11 in the following PDF describes the states, but it seems specific to the particular device and probably shouldn't be exposed to user space as-is: https://www.renesas.com/us/en/document/dst/8a34041-datasheet I have a few questions about this being a per-netdev attribute: 1. My understanding is that in the multi-port adapter you are working with you have a single EEC that is used to set the Tx frequency of all the ports. Is that correct? 2. Assuming the above is correct, is it possible that one port is in LOCKED state and another (for some reason) is in HOLDOVER state? If yes, then I agree about this being a per-netdev attribute. The interface can also be extended with another attribute specifying the HOLDOVER reason. For example, netdev being down. Regardless, I agree with previous comments made about this belonging in ethtool rather than rtnetlink. > +}; > + > +#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1) > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is > + * currently the source for the EEC > + */ I'm not sure about this one. If we are going to expose this as a per-netdev attribute (see more below), any reason it can't be added as another state (e.g., IF_EEC_STATE_LOCKED_SRC)? IIUC, in the common case of a simple NE the source of the EEC is always one of the ports, but in the case of a PRC the source is most likely external (e.g., GPS). If so, I think we need a way to represent the EEC as a separate object with the ability to set its source and report it via the same interface. I'm unclear on how exactly an external source looks like, but for the netdev case maybe something like this: devlink clock show [ dev clock CLOCK ] devlink clock set DEV clock CLOCK [ { src_type SRC_TYPE } ] SRC_TYPE : = { port DEV/PORT_INDEX } The only source type above is 'port' with the ability to set the relevant port, but more can be added. Obviously, 'devlink clock show' will give you the current source in addition to other information such as frequency difference with respect to the input frequency. Finally, can you share more info about the relation to the PHC? My understanding is that one of the primary use cases for SyncE is to drive all the PHCs in the network using the same frequency. How do you imagine this configuration is going to look like? Can the above interface be extended for that? All of the above might be complete nonsense as I'm still trying to wrap my head around the subject. Thanks for working on this
On Wed, Sep 08, 2021 at 09:21:15AM -0700, Jakub Kicinski wrote: > Mumble, mumble. Ido, Florian - any devices within your purview which > would support SyncE? So it's public info that Connect-X has it: https://www.mellanox.com/files/doc-2020/pb-connectx-6-dx-en-card.pdf Given the nature of SyncE, I think you can extrapolate from that about other devices :) Anyway, Petr and I started discussing this with the colleagues defining the HW/SW interfaces and we will help with the review to make sure we end up with a uAPI that works across multiple implementations.
> -----Original Message----- > From: Ido Schimmel <idosch@idosch.org> > Sent: Tuesday, September 21, 2021 3:16 PM > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > message to get SyncE status > > On Fri, Sep 03, 2021 at 05:14:35PM +0200, Maciej Machnikowski wrote: > > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > > index eebd3894fe89..78a8a5af8cd8 100644 > > --- a/include/uapi/linux/if_link.h > > +++ b/include/uapi/linux/if_link.h > > @@ -1273,4 +1273,35 @@ enum { > > > > #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1) > > > > +/* SyncE section */ > > + > > +enum if_eec_state { > > + IF_EEC_STATE_INVALID = 0, > > + IF_EEC_STATE_FREERUN, > > + IF_EEC_STATE_LOCKACQ, > > + IF_EEC_STATE_LOCKREC, > > + IF_EEC_STATE_LOCKED, > > + IF_EEC_STATE_HOLDOVER, > > + IF_EEC_STATE_OPEN_LOOP, > > + __IF_EEC_STATE_MAX, > > Can you document these states? I'm not clear on what LOCKACQ, LOCKREC > and OPEN_LOOP mean. I also don't see ice using them and it's not really > a good practice to add new uAPI without any current users. > I'll fix that enum to use generic states defined in G.781 which are limited to: - Freerun - LockedACQ (locked, acquiring holdover memory) - Locked (locked with holdover acquired) - Holdover > From v1 I understand that these states were copied from commit > 797d3186544f ("ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.") > from Renesas. > > Figure 11 in the following PDF describes the states, but it seems > specific to the particular device and probably shouldn't be exposed to > user space as-is: > https://www.renesas.com/us/en/document/dst/8a34041-datasheet > > I have a few questions about this being a per-netdev attribute: > > 1. My understanding is that in the multi-port adapter you are working > with you have a single EEC that is used to set the Tx frequency of all > the ports. Is that correct? That's correct. > 2. Assuming the above is correct, is it possible that one port is in > LOCKED state and another (for some reason) is in HOLDOVER state? If yes, > then I agree about this being a per-netdev attribute. The interface can > also be extended with another attribute specifying the HOLDOVER reason. > For example, netdev being down. All ports driven by a single EEC will report the same state. > Regardless, I agree with previous comments made about this belonging in > ethtool rather than rtnetlink. Will take a look at it - as it will require support in linuxptp as well. > > +}; > > + > > +#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1) > > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the > port is > > + * currently the source for the EEC > > + */ > > I'm not sure about this one. If we are going to expose this as a > per-netdev attribute (see more below), any reason it can't be added as > another state (e.g., IF_EEC_STATE_LOCKED_SRC)? OK! That's a great idea! Yet we'll need LOCKED_SRC and LOCKED_ACQ_SRC, but still sounds good. > IIUC, in the common case of a simple NE the source of the EEC is always > one of the ports, but in the case of a PRC the source is most likely > external (e.g., GPS). True > If so, I think we need a way to represent the EEC as a separate object > with the ability to set its source and report it via the same interface. > I'm unclear on how exactly an external source looks like, but for the > netdev case maybe something like this: > > devlink clock show [ dev clock CLOCK ] > devlink clock set DEV clock CLOCK [ { src_type SRC_TYPE } ] > SRC_TYPE : = { port DEV/PORT_INDEX } Unfortunately, EEC lives in 2 worlds - it belongs to a netdev (in very simple deployments the EEC may be a part of the PHY and only allow synchronizing the TX frequency to a single lane/port), but also can go outside of netdev and be a boar-wise frequency source. > The only source type above is 'port' with the ability to set the > relevant port, but more can be added. Obviously, 'devlink clock show' > will give you the current source in addition to other information such > as frequency difference with respect to the input frequency. We considered devlink interface for configuring the clock/DPLL, but a new concept was born at the list to add a DPLL subsystem that will cover more use cases, like a TimeCard. > Finally, can you share more info about the relation to the PHC? My > understanding is that one of the primary use cases for SyncE is to drive > all the PHCs in the network using the same frequency. How do you imagine > this configuration is going to look like? Can the above interface be > extended for that? That would be a configurable parameter/option of the PTP program. Just like it can check the existence of link on a given port, it'll also be able to check if we use EEC and it's locked. If it is, and is a source of PHC frequency - the ptp tool can decide to not apply frequency corrections to the PHC, just like the ptp4l does when nullf servo is used, but can do that dynamically. > All of the above might be complete nonsense as I'm still trying to wrap > my head around the subject. It's certainly not! Thanks for your input!
On Tue, Sep 21, 2021 at 01:37:37PM +0000, Machnikowski, Maciej wrote: > > -----Original Message----- > > From: Ido Schimmel <idosch@idosch.org> > > Sent: Tuesday, September 21, 2021 3:16 PM > > To: Machnikowski, Maciej <maciej.machnikowski@intel.com> > > Subject: Re: [PATCH net-next 1/2] rtnetlink: Add new RTM_GETEECSTATE > > message to get SyncE status > > > > On Fri, Sep 03, 2021 at 05:14:35PM +0200, Maciej Machnikowski wrote: > > > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > > > index eebd3894fe89..78a8a5af8cd8 100644 > > > --- a/include/uapi/linux/if_link.h > > > +++ b/include/uapi/linux/if_link.h > > > @@ -1273,4 +1273,35 @@ enum { > > > > > > #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1) > > > > > > +/* SyncE section */ > > > + > > > +enum if_eec_state { > > > + IF_EEC_STATE_INVALID = 0, > > > + IF_EEC_STATE_FREERUN, > > > + IF_EEC_STATE_LOCKACQ, > > > + IF_EEC_STATE_LOCKREC, > > > + IF_EEC_STATE_LOCKED, > > > + IF_EEC_STATE_HOLDOVER, > > > + IF_EEC_STATE_OPEN_LOOP, > > > + __IF_EEC_STATE_MAX, > > > > Can you document these states? I'm not clear on what LOCKACQ, LOCKREC > > and OPEN_LOOP mean. I also don't see ice using them and it's not really > > a good practice to add new uAPI without any current users. > > > > I'll fix that enum to use generic states defined in G.781 which are limited to: > - Freerun > - LockedACQ (locked, acquiring holdover memory) > - Locked (locked with holdover acquired) > - Holdover Thanks, it is good to conform to a standard. Can ice distinguish between LockedACQ and Locked? From G.781 I understand that the former is a transient state. Is the distinction between the two important for user space / the selection operation? If not, maybe we only need Locked? > > > From v1 I understand that these states were copied from commit > > 797d3186544f ("ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.") > > from Renesas. > > > > Figure 11 in the following PDF describes the states, but it seems > > specific to the particular device and probably shouldn't be exposed to > > user space as-is: > > https://www.renesas.com/us/en/document/dst/8a34041-datasheet > > > > I have a few questions about this being a per-netdev attribute: > > > > 1. My understanding is that in the multi-port adapter you are working > > with you have a single EEC that is used to set the Tx frequency of all > > the ports. Is that correct? > > That's correct. > > > 2. Assuming the above is correct, is it possible that one port is in > > LOCKED state and another (for some reason) is in HOLDOVER state? If yes, > > then I agree about this being a per-netdev attribute. The interface can > > also be extended with another attribute specifying the HOLDOVER reason. > > For example, netdev being down. > > All ports driven by a single EEC will report the same state. So maybe we just need to report via ethtool the association between the EEC and the netdev and expose the state as an attribute of the EEC (along with the selected source and other info)? This is similar to how PHC/netdev association is queried via ethtool. For a given netdev, TSINFO_GET will report the PTP hw clock index via ETHTOOL_A_TSINFO_PHC_INDEX. See Documentation/networking/ethtool-netlink.rst > > > Regardless, I agree with previous comments made about this belonging in > > ethtool rather than rtnetlink. > > Will take a look at it - as it will require support in linuxptp as well. > > > > +}; > > > + > > > +#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1) > > > +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the > > port is > > > + * currently the source for the EEC > > > + */ > > > > I'm not sure about this one. If we are going to expose this as a > > per-netdev attribute (see more below), any reason it can't be added as > > another state (e.g., IF_EEC_STATE_LOCKED_SRC)? > > OK! That's a great idea! Yet we'll need LOCKED_SRC and LOCKED_ACQ_SRC, > but still sounds good. > > > IIUC, in the common case of a simple NE the source of the EEC is always > > one of the ports, but in the case of a PRC the source is most likely > > external (e.g., GPS). > > True > > > If so, I think we need a way to represent the EEC as a separate object > > with the ability to set its source and report it via the same interface. > > I'm unclear on how exactly an external source looks like, but for the > > netdev case maybe something like this: > > > > devlink clock show [ dev clock CLOCK ] > > devlink clock set DEV clock CLOCK [ { src_type SRC_TYPE } ] > > SRC_TYPE : = { port DEV/PORT_INDEX } > > Unfortunately, EEC lives in 2 worlds - it belongs to a netdev (in very simple > deployments the EEC may be a part of the PHY and only allow synchronizing > the TX frequency to a single lane/port), but also can go outside of netdev > and be a boar-wise frequency source. > > > The only source type above is 'port' with the ability to set the > > relevant port, but more can be added. Obviously, 'devlink clock show' > > will give you the current source in addition to other information such > > as frequency difference with respect to the input frequency. > > We considered devlink interface for configuring the clock/DPLL, but a > new concept was born at the list to add a DPLL subsystem that will > cover more use cases, like a TimeCard. The reason I suggested devlink is that it is suited for device-wide configuration and it is already used by both MAC drivers and the TimeCard driver. If we have a good reason to create a new generic netlink family for this stuff, then OK. > > > Finally, can you share more info about the relation to the PHC? My > > understanding is that one of the primary use cases for SyncE is to drive > > all the PHCs in the network using the same frequency. How do you imagine > > this configuration is going to look like? Can the above interface be > > extended for that? > > That would be a configurable parameter/option of the PTP program. > Just like it can check the existence of link on a given port, it'll also be > able to check if we use EEC and it's locked. If it is, and is a source of > PHC frequency - the ptp tool can decide to not apply frequency corrections > to the PHC, just like the ptp4l does when nullf servo is used, but can do that > dynamically. The part I don't understand is "is a source of PHC frequency". How do we configure / query that?
On Tue, 21 Sep 2021 17:58:05 +0300 Ido Schimmel wrote: > > > The only source type above is 'port' with the ability to set the > > > relevant port, but more can be added. Obviously, 'devlink clock show' > > > will give you the current source in addition to other information such > > > as frequency difference with respect to the input frequency. > > > > We considered devlink interface for configuring the clock/DPLL, but a > > new concept was born at the list to add a DPLL subsystem that will > > cover more use cases, like a TimeCard. > > The reason I suggested devlink is that it is suited for device-wide > configuration and it is already used by both MAC drivers and the > TimeCard driver. If we have a good reason to create a new generic > netlink family for this stuff, then OK. For NICs mapping between devlink instances and HW is not clear. Most register devlink per PCI dev which usually maps to a Eth port. So if we have one DPLL on a 2 port NIC mapping will get icky, no? Is the motivation to save the boilerplate code associated with new genetlink family or something more?
On Tue, Sep 21, 2021 at 02:14:45PM -0700, Jakub Kicinski wrote: > On Tue, 21 Sep 2021 17:58:05 +0300 Ido Schimmel wrote: > > > > The only source type above is 'port' with the ability to set the > > > > relevant port, but more can be added. Obviously, 'devlink clock show' > > > > will give you the current source in addition to other information such > > > > as frequency difference with respect to the input frequency. > > > > > > We considered devlink interface for configuring the clock/DPLL, but a > > > new concept was born at the list to add a DPLL subsystem that will > > > cover more use cases, like a TimeCard. > > > > The reason I suggested devlink is that it is suited for device-wide > > configuration and it is already used by both MAC drivers and the > > TimeCard driver. If we have a good reason to create a new generic > > netlink family for this stuff, then OK. > > For NICs mapping between devlink instances and HW is not clear. > Most register devlink per PCI dev which usually maps to a Eth port. > So if we have one DPLL on a 2 port NIC mapping will get icky, no? Yes, having to represent the same EEC in multiple devlink instances is not nice. > > Is the motivation to save the boilerplate code associated with new > genetlink family or something more? I don't mind either way. I simply wanted to understand the motivation for not using any existing framework. The above argument is convincing enough, IMO.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7c41593c1d6a..de78b8ed903c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1344,6 +1344,8 @@ struct netdev_net_notifier { * The caller must be under RCU read context. * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); * Get the forwarding path to reach the real device from the HW destination address + * int (*ndo_get_synce_state)(struct net_device *dev, struct if_eec_state_msg *state) + * Get state of physical layer frequency syntonization (SyncE) */ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1563,6 +1565,10 @@ struct net_device_ops { struct net_device * (*ndo_get_peer_dev)(struct net_device *dev); int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); + int (*ndo_get_eec_state)(struct net_device *dev, + enum if_eec_state *state, + u32 *eec_flags, + struct netlink_ext_ack *extack); }; /** diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index eebd3894fe89..78a8a5af8cd8 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1273,4 +1273,35 @@ enum { #define IFLA_MCTP_MAX (__IFLA_MCTP_MAX - 1) +/* SyncE section */ + +enum if_eec_state { + IF_EEC_STATE_INVALID = 0, + IF_EEC_STATE_FREERUN, + IF_EEC_STATE_LOCKACQ, + IF_EEC_STATE_LOCKREC, + IF_EEC_STATE_LOCKED, + IF_EEC_STATE_HOLDOVER, + IF_EEC_STATE_OPEN_LOOP, + __IF_EEC_STATE_MAX, +}; + +#define IF_EEC_STATE_MAX (__IF_EEC_STATE_MAX - 1) +#define EEC_SRC_PORT (1 << 0) /* recovered clock from the port is + * currently the source for the EEC + */ + +struct if_eec_state_msg { + __u32 ifindex; +}; + +enum { + IFLA_EEC_UNSPEC, + IFLA_EEC_STATE, + IFLA_EEC_FLAGS, + __IFLA_EEC_MAX, +}; + +#define IFLA_EEC_MAX (__IFLA_EEC_MAX - 1) + #endif /* _UAPI_LINUX_IF_LINK_H */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 5888492a5257..642ac18a09d9 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -185,6 +185,9 @@ enum { RTM_GETNEXTHOPBUCKET, #define RTM_GETNEXTHOPBUCKET RTM_GETNEXTHOPBUCKET + RTM_GETEECSTATE = 120, +#define RTM_GETEECSTATE RTM_GETEECSTATE + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 972c8cb303a5..11d4e96d3371 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -5468,6 +5468,75 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } +static int rtnl_fill_eec_state(struct sk_buff *skb, struct net_device *dev, + u32 portid, u32 seq, struct netlink_callback *cb, + int flags, struct netlink_ext_ack *extack) +{ + const struct net_device_ops *ops = dev->netdev_ops; + struct if_eec_state_msg *state_msg; + enum if_eec_state state; + struct nlmsghdr *nlh; + u32 eec_flags; + int err; + + ASSERT_RTNL(); + + if (!ops->ndo_get_eec_state) + return -EOPNOTSUPP; + + err = ops->ndo_get_eec_state(dev, &state, &eec_flags, extack); + if (err) + return err; + + nlh = nlmsg_put(skb, portid, seq, RTM_GETEECSTATE, sizeof(*state_msg), + flags); + if (!nlh) + return -EMSGSIZE; + + state_msg = nlmsg_data(nlh); + state_msg->ifindex = dev->ifindex; + + if (nla_put(skb, IFLA_EEC_STATE, sizeof(state), &state) || + nla_put_u32(skb, IFLA_EEC_FLAGS, eec_flags)) + return -EMSGSIZE; + + nlmsg_end(skb, nlh); + return 0; +} + +static int rtnl_eec_state_get(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct net *net = sock_net(skb->sk); + struct if_eec_state_msg *state; + struct net_device *dev; + struct sk_buff *nskb; + int err; + + state = nlmsg_data(nlh); + if (state->ifindex > 0) + dev = __dev_get_by_index(net, state->ifindex); + else + return -EINVAL; + + if (!dev) + return -ENODEV; + + nskb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!nskb) + return -ENOBUFS; + + err = rtnl_fill_eec_state(nskb, dev, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, NULL, nlh->nlmsg_flags, + extack); + if (err < 0) + kfree_skb(nskb); + else + err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid); + + return err; +} + /* Process one rtnetlink message. */ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, @@ -5693,4 +5762,6 @@ void __init rtnetlink_init(void) rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump, 0); + + rtnl_register(PF_UNSPEC, RTM_GETEECSTATE, rtnl_eec_state_get, NULL, 0); } diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c index d59276f48d4f..0e2d84d6045f 100644 --- a/security/selinux/nlmsgtab.c +++ b/security/selinux/nlmsgtab.c @@ -91,6 +91,7 @@ static const struct nlmsg_perm nlmsg_route_perms[] = { RTM_NEWNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_DELNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_GETNEXTHOPBUCKET, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_GETEECSTATE, NETLINK_ROUTE_SOCKET__NLMSG_READ }, }; static const struct nlmsg_perm nlmsg_tcpdiag_perms[] = @@ -174,7 +175,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm) * structures at the top of this file with the new mappings * before updating the BUILD_BUG_ON() macro! */ - BUILD_BUG_ON(RTM_MAX != (RTM_NEWNEXTHOPBUCKET + 3)); + BUILD_BUG_ON(RTM_MAX != (RTM_GETEECSTATE + 3)); err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms, sizeof(nlmsg_route_perms)); break;
This patch series introduces basic interface for reading the Ethernet Equipment Clock (EEC) state on a SyncE capable device. This state gives information about the state of EEC. This interface is required to implement Synchronization Status Messaging on upper layers. Initial implementation returns SyncE EEC state and flags attributes. The only flag currently implemented is the EEC_SRC_PORT. When it's set the EEC is synchronized to the recovered clock recovered from the current port. SyncE EEC state read needs to be implemented as a ndo_get_eec_state function. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> --- include/linux/netdevice.h | 6 +++ include/uapi/linux/if_link.h | 31 +++++++++++++++ include/uapi/linux/rtnetlink.h | 3 ++ net/core/rtnetlink.c | 71 ++++++++++++++++++++++++++++++++++ security/selinux/nlmsgtab.c | 3 +- 5 files changed, 113 insertions(+), 1 deletion(-)