mbox series

[net-next,v3,00/12] net: dsa: microchip: PTP support for KSZ956x

Message ID 20201118203013.5077-1-ceggers@arri.de
Headers show
Series net: dsa: microchip: PTP support for KSZ956x | expand

Message

Christian Eggers Nov. 18, 2020, 8:30 p.m. UTC
This series adds support for PTP to the KSZ956x and KSZ9477 devices.

There is only little documentation for PTP available on the data sheet
[1] (more or less only the register reference). Questions to the
Microchip support were seldom answered comprehensively or in reasonable
time. So this is more or less the result of reverse engineering.

[1]
http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ9563R-Data-Sheet-DS00002419D.pdf

Changes from v2  --> v3
------------------------
Applied all changes requested by Vladimir Oltean. v3 depends on my other
netdev patches from 2020-11-18:
- net: ptp: introduce common defines for PTP message types
- net: dsa: avoid potential use-after-free error

[1/11]-->[1/12]  - dts: remove " OR BSD-2-Clause" from SPDX-License-Identifier
                 - dts: add "additionalProperties"
                 - dts: remove quotes
[2/11]-->[2/12]  - Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
[3/11]           - [Patch removed] (split ksz_common.h)
[4/11]-->[3/12]  - Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
                 - fixed summary
[5/11]-->[4/12]  - Use "interrupts-extended" syntax
[6/11]-->[5+6/12] - Split up patch
                 - style fixes
                 - use GENMASK()
                 - IRQF_ONESHOT|IRQF_SHARED
[7/11]-->[7/12]  - Remove "default n" from Kconfig
                 - use mutex in adjfine()
                 - style fixes
[8/11]-->[8/12]  - Be more verbose in commit message
                 - Rename helper
                 - provide correction instead of t2
                 - simplify location of UDP header
[9/11]-->[9+10/12] - Split up patch
                 - Update commmit messages
                 - don't use OR operator on irqreturn_t
                 - spin_lock_irqsave() --> spin_lock_bh()
                 - style fixes
                 - remove rx_filter cases for DELAY_REQ
                 - use new PTP_MSGTYPE_* defines
                 - inline ksz9477_ptp_should_tstamp()
                 - ksz9477_tstamp_to_clock() --> ksz9477_tstamp_reconstruct()
                 - use shared data in include/linux/net/dsa/ksz_common.h
                 - wait for tx time stamp (within sleepable context)
                 - use boolean for tx time stamp enable
                 - move t2 from correction to tail tag (again)
                 - ...

Changes from RFC --> v2
------------------------
I think that all open questions regarding the RFC version could be solved.
dts: referenced to dsa.yaml
dts: changed node name to "switch" in example
dts: changed "ports" subnode to "ethernet-ports"
ksz_common: support "ethernet-ports" subnode
tag_ksz: fix usage of correction field (32 bit ns + 16 bit sub-ns)
tag_ksz: use cached PTP header from device's .port_txtstamp function
tag_ksz: refactored ksz9477_tstamp_to_clock()
tag_ksz: pdelay_req: only subtract 2 bit seconds from the correction field
tag_ksz: pdelay_resp: don't move (negative) correction to the egress tail tag
ptp_classify: add ptp_onestep_p2p_move_t2_to_correction helper
ksz9477_ptp: removed E2E support (as suggested by Vladimir)
ksz9477_ptp: removed master/slave sysfs attributes (nacked by Richard)
ksz9477_ptp: refactored ksz9477_ptp_port_txtstamp
ksz9477_ptp: removed "pulse" attribute
kconfig: depend on PTP_1588_CLOCK (instead of "imply")

Comments

Christian Eggers Nov. 19, 2020, 5:28 a.m. UTC | #1
Hi Vladimir,

On Thursday, 19 November 2020, 00:40:18 CET, Vladimir Oltean wrote:
> On Wed, Nov 18, 2020 at 09:30:01PM +0100, Christian Eggers wrote:
> > This series adds support for PTP to the KSZ956x and KSZ9477 devices.
> > 
> > There is only little documentation for PTP available on the data sheet
> > [1] (more or less only the register reference). Questions to the
> > Microchip support were seldom answered comprehensively or in reasonable
> > time. So this is more or less the result of reverse engineering.
> 
> [...]
> One thing that should definitely not be part of this series though is
> patch 11/12. Christian, given the conversation we had on your previous
> patch:
> https://lore.kernel.org/netdev/20201113025311.jpkplhmacjz6lkc5@skbuf/
sorry, I didn't read that carefully enough. Some of the other requested changes
were quite challenging for me. Additionally, finding the UDP checksum bug
needed some time for identifying because I didn't recognize that when it got
introduced.

> as well as the documentation patch that was submitted in the meantime:
> https://lore.kernel.org/netdev/20201117213826.18235-1-a.fatoum@pengutronix.de/ 
I am not subscribed to the list. 

> obviously you chose to completely disregard that. May we know why? How
> are you even making use of the PTP_CLK_REQ_PPS feature?
Of course I will drop that patch from the next series.

regards
Christian
Rob Herring (Arm) Nov. 19, 2020, 1:42 p.m. UTC | #2
On Wed, 18 Nov 2020 21:30:02 +0100, Christian Eggers wrote:
> Convert the bindings document for Microchip KSZ Series Ethernet switches
> from txt to yaml.
> 
> Signed-off-by: Christian Eggers <ceggers@arri.de>
> ---
>  .../devicetree/bindings/net/dsa/ksz.txt       | 125 --------------
>  .../bindings/net/dsa/microchip,ksz.yaml       | 152 ++++++++++++++++++
>  MAINTAINERS                                   |   2 +-
>  3 files changed, 153 insertions(+), 126 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt
>  create mode 100644 Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
> 


My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/dsa/microchip,ksz.example.dt.yaml: switch@0: 'ethernet-ports', 'reg', 'spi-cpha', 'spi-cpol', 'spi-max-frequency' do not match any of the regexes: 'pinctrl-[0-9]+'
	From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/dsa/microchip,ksz.example.dt.yaml: switch@1: 'ethernet-ports', 'reg', 'spi-cpha', 'spi-cpol', 'spi-max-frequency' do not match any of the regexes: 'pinctrl-[0-9]+'
	From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml


See https://patchwork.ozlabs.org/patch/1402525

The base for the patch is generally the last rc1. Any dependencies
should be noted.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.
Tristram.Ha@microchip.com Nov. 19, 2020, 6:51 p.m. UTC | #3
> On Thursday, 19 November 2020, 00:40:18 CET, Vladimir Oltean wrote:

> > On Wed, Nov 18, 2020 at 09:30:01PM +0100, Christian Eggers wrote:

> > > This series adds support for PTP to the KSZ956x and KSZ9477 devices.

> > >

> > > There is only little documentation for PTP available on the data sheet

> > > [1] (more or less only the register reference). Questions to the

> > > Microchip support were seldom answered comprehensively or in

> reasonable

> > > time. So this is more or less the result of reverse engineering.

> >

> > [...]

> > One thing that should definitely not be part of this series though is

> > patch 11/12. Christian, given the conversation we had on your previous

> > patch:

> > https://lore.kernel.org/netdev/20201113025311.jpkplhmacjz6lkc5@skbuf/

> sorry, I didn't read that carefully enough. Some of the other requested

> changes

> were quite challenging for me. Additionally, finding the UDP checksum bug

> needed some time for identifying because I didn't recognize that when it got

> introduced.

> 

> > as well as the documentation patch that was submitted in the meantime:

> > https://lore.kernel.org/netdev/20201117213826.18235-1-

> a.fatoum@pengutronix.de/

> I am not subscribed to the list.

> 

> > obviously you chose to completely disregard that. May we know why? How

> > are you even making use of the PTP_CLK_REQ_PPS feature?

> Of course I will drop that patch from the next series.


These are general comments about this PTP patch.

The initial proposal in tag_ksz.c is for the switch driver to provide callback functions
to handle receiving and transmitting.  Then each switch driver can be added to
process the tail tag in its own driver and leave tag_ksz.c unchanged.

It was rejected because of wanting to keep tag_ksz.c code and switch driver code
separate and concern about performance.

Now tag_ksz.c is filled with PTP code that is not relevant for other switches and will
need to be changed again when another switch driver with PTP function is added.

Can we implement that callback mechanism?

One issue with transmission with PTP enabled is that the tail tag needs to contain 4
additional bytes.  When the PTP function is off the bytes are not added.  This should
be monitored all the time.

The extra 4 bytes are only used for 1-step Pdelay_Resp.  It should contain the receive
timestamp of previous Pdelay_Req with latency adjusted.  The correction field in
Pdelay_Resp should be zero.  It may be a hardware bug to have wrong UDP checksum
when the message is sent.

I think the right implementation is for the driver to remember this receive timestamp
of Pdelay_Req and puts it in the tail tag when it sees a 1-step Pdelay_Resp is sent.

There is one more requirement that is a little difficult to do.  The calculated peer delay
needs to be programmed in hardware register, but the regular PTP stack has no way to
send that command.  I think the driver has to do its own calculation by snooping on the
Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.

The receive and transmit latencies are different for different connected speed.  So the
driver needs to change them when the link changes.  For that reason the PTP stack
should not use its own latency values as generally the application does not care about
the linked speed.
Christian Eggers Nov. 19, 2020, 8:16 p.m. UTC | #4
Hi Tristram,

thank you for joining this thread.

On Thursday, 19 November 2020, 19:51:15 CET, Tristram.Ha@microchip.com wrote:
> > On Thursday, 19 November 2020, 00:40:18 CET, Vladimir Oltean wrote:
> > > On Wed, Nov 18, 2020 at 09:30:01PM +0100, Christian Eggers wrote:
> > > [...]
> > [...]
> These are general comments about this PTP patch.
> 
> The initial proposal in tag_ksz.c is for the switch driver to provide
> callback functions to handle receiving and transmitting.  Then each switch
> driver can be added to process the tail tag in its own driver and leave
> tag_ksz.c unchanged. 
> It was rejected because of wanting to keep tag_ksz.c code and switch driver
> code separate and concern about performance.
> 
> Now tag_ksz.c is filled with PTP code that is not relevant for other
> switches and will need to be changed again when another switch driver with
> PTP function is added. 
> Can we implement that callback mechanism?
I didn't read the full history of the tagging driver. Vladimir already asked
whether I could put more stuff into the device driver. Lets wait for his
advice how to do this best.

> One issue with transmission with PTP enabled is that the tail tag needs to
> contain 4 additional bytes.  When the PTP function is off the bytes are
> not added.  This should be monitored all the time.
Currently, enabling the PTP function is only dependent on
CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP. The same condition is used for inserting
the additional 4 bytes.

> The extra 4 bytes are only used for 1-step Pdelay_Resp.  It should contain
> the receive timestamp of previous Pdelay_Req with latency adjusted.  The
> correction field in Pdelay_Resp should be zero.  It may be a hardware bug
> to have wrong UDP checksum when the message is sent.
Thanks for clarifying this.

> I think the right implementation is for the driver to remember this receive
> timestamp of Pdelay_Req and puts it in the tail tag when it sees a 1-step
> Pdelay_Resp is sent. 
I would like keep the current method ("time stamp to correction" on RX, 
"correction to tail tag" on TX). Otherwise the driver would have to keep a 
list
of rx time stamps which could grow if no corresponding PDelay_Resp is sent.
It was also discussed about creating a new interface for bringing the time
stamp to user space and then back into the kernel. But this has been rejected.

> There is one more requirement that is a little difficult to do.  The
> calculated peer delay needs to be programmed in hardware register, but the
> regular PTP stack has no way to send that command.
I already recognized that register. Can you please provide some more 
information what the switch does with this value? At least when I connect only
two boards, I get almost perfect synchronization (PPS output) without writing
anything to this register. Looks like this only affects forwarded messages,
right?

> I think the driver has to do its own calculation by snooping on the
> Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.
As I already wrote, I am definitely not an expert for PTP. But if I remember
correctly, the delay values used by ptp4l are the result of filtering several 
delay measurements. I don't think that this algorithm should be duplicated in
the kernel. 

On the other hand, there is currently no interface for this. In my internal
tree, I have created sysfs entries for this, so that (a modified version of)
ptp4l could write the measured values. I also recognized, that ptp4l has some
kind of remote interface (I haven't really looked at it). Maybe it is possible
to do necessary management of the switch outside ptp4l in a separate process.

One other important question was about the internal "filter". Richard rejected
the idea of "manually" switching between the "master" and "slave" mode. Is
there any (undocumented) register bit for disabling filtering of Sync/
Delay_Req
messages entirely?

> The receive and transmit latencies are different for different connected
> speed.  So the driver needs to change them when the link changes.  For
> that reason the PTP stack should not use its own latency values as
> generally the application does not care about the linked speed.
Up to now, I didn't configure any latency values in ptp4l. I assume that the
power on default values are fine for 1000 MBit/s. Can you provide the latency
values for other links speeds? Would it be a major limitation if PTP 
functionality depend on 1000 MBit/s?

regards
Christian
Christian Eggers Nov. 19, 2020, 8:22 p.m. UTC | #5
On Thursday, 19 November 2020, 14:48:01 CET, Rob Herring wrote:
> On Wed, Nov 18, 2020 at 09:30:02PM +0100, Christian Eggers wrote:
> > Convert the bindings document for Microchip KSZ Series Ethernet switches
> > from txt to yaml.
> > 
> > Signed-off-by: Christian Eggers <ceggers@arri.de>
> > ---
> > 
> >  .../devicetree/bindings/net/dsa/ksz.txt       | 125 --------------
> >  .../bindings/net/dsa/microchip,ksz.yaml       | 152 ++++++++++++++++++
> >  MAINTAINERS                                   |   2 +-
> >  3 files changed, 153 insertions(+), 126 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/net/dsa/ksz.txt
> >  create mode 100644
> >  Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml> 
> > diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt
> > b/Documentation/devicetree/bindings/net/dsa/ksz.txt deleted file mode
> > 100644
> > index 95e91e84151c..000000000000
> > --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
> > +++ /dev/null
>> [...]
> > diff --git a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
> > b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml new file
> > mode 100644
> > index 000000000000..010adb09a68f
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
> > @@ -0,0 +1,152 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/net/dsa/microchip,ksz.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Microchip KSZ Series Ethernet switches
> > +
> > +allOf:
> > +  - $ref: dsa.yaml#
> 
> Move this after 'maintainers'.
changed for v4

> 
> > +
> > +maintainers:
> > +  - Marek Vasut <marex@denx.de>
> > +  - Woojung Huh <Woojung.Huh@microchip.com>
> > +
> > +properties:
> > +  # See Documentation/devicetree/bindings/net/dsa/dsa.yaml for a list of
> > additional +  # required and optional properties.
> > +  compatible:
> > +    enum:
> > +      - microchip,ksz8765
> > +      - microchip,ksz8794
> > +      - microchip,ksz8795
> > +      - microchip,ksz9477
> > +      - microchip,ksz9897
> > +      - microchip,ksz9896
> > +      - microchip,ksz9567
> > +      - microchip,ksz8565
> > +      - microchip,ksz9893
> > +      - microchip,ksz9563
> > +      - microchip,ksz8563
> > +
> > +  reset-gpios:
> > +    description:
> > +      Should be a gpio specifier for a reset line.
> > +    maxItems: 1
> > +
> > +  microchip,synclko-125:
> > +    $ref: /schemas/types.yaml#/definitions/flag
> > +    description:
> > +      Set if the output SYNCLKO frequency should be set to 125MHz instead
> > of 25MHz. +
> > +required:
> > +  - compatible
> > +  - reg
> > +
> > +additionalProperties: false
> 
> You need to use unevaluatedProperties instead.
dt_binding_check is happy now

> 
> > +
> > +examples:
> > +  - |
> > +    #include <dt-bindings/gpio/gpio.h>
> > +
> > +    // Ethernet switch connected via SPI to the host, CPU port wired to
> > eth0: +    eth0 {
> > +        fixed-link {
> > +            speed = <1000>;
> > +            full-duplex;
> > +        };
> > +    };
> > +
> > +    spi0 {
> > +        #address-cells = <1>;
> > +        #size-cells = <0>;
> > +
> > +        pinctrl-0 = <&pinctrl_spi_ksz>;
> > +        cs-gpios = <&pioC 25 0>;
> > +        id = <1>;
> > +
> > +        ksz9477: switch@0 {
> > +            compatible = "microchip,ksz9477";
> > +            reg = <0>;
> > +            reset-gpios = <&gpio5 0 GPIO_ACTIVE_LOW>;
> > +
> > +            spi-max-frequency = <44000000>;
> > +            spi-cpha;
> > +            spi-cpol;
> 
> Are these 2 optional or required? Being optional is rare as most
> devices support 1 mode, but not unheard of.
Vladimir Oltean Nov. 21, 2020, 1:26 a.m. UTC | #6
On Thu, Nov 19, 2020 at 06:51:15PM +0000, Tristram.Ha@microchip.com wrote:
> The initial proposal in tag_ksz.c is for the switch driver to provide callback functions
> to handle receiving and transmitting.  Then each switch driver can be added to
> process the tail tag in its own driver and leave tag_ksz.c unchanged.
>
> It was rejected because of wanting to keep tag_ksz.c code and switch driver code
> separate and concern about performance.
>
> Now tag_ksz.c is filled with PTP code that is not relevant for other switches and will
> need to be changed again when another switch driver with PTP function is added.
>
> Can we implement that callback mechanism?

I, too, lack the context here. But it sounds like feedback that Andrew
would give.

If you don't like the #ifdef's, I am not in love with them either. But
maybe Christian is just optimizing too aggressively, and doesn't actually
need to put those #ifdef's there and provide stub implementations, but
could actually just leave the ksz9477_rcv_timestamp and ksz9477_xmit_timestamp
always compiled-in, and "dead at runtime" in the case there is no PTP.

If there is something else you don't like, what is it? If you know that
other KSZ switches don't implement timestamping in the same way, well,
we don't know that. I thought that it's generally up to the second
implementer to recognize which parts of the code are common and should
be reused, not for the first one to guess. I would not add function
pointers for a single implementation if they don't have a clear
justification.

> One issue with transmission with PTP enabled is that the tail tag needs to contain 4
> additional bytes.  When the PTP function is off the bytes are not added.  This should
> be monitored all the time.
>
> The extra 4 bytes are only used for 1-step Pdelay_Resp.  It should contain the receive
> timestamp of previous Pdelay_Req with latency adjusted.  The correction field in
> Pdelay_Resp should be zero.  It may be a hardware bug to have wrong UDP checksum
> when the message is sent.

It "may" be a hardware bug? Are you unsure or polite?
As for the phrase "the correction field in Pdelay_Resp should be zero".
Consider the case where there is an E2E TC switch attached to that port.
It will update the correctionField of the Pdelay_Req message. Then the
application stack running on this ksz9477 switch is forced by the
standard to copy the correctionField as-is from the Pdelay_Req into the
Pdelay_Resp message. So that correctionField is never guaranteed to be
zero, even if Christian doesn't fiddle with it within the driver. Are
you saying that for proper UDP checksum calculation, the driver should
be forcing the correctionField to zero and moving that value into the
tail tag?

> I think the right implementation is for the driver to remember this receive timestamp
> of Pdelay_Req and puts it in the tail tag when it sees a 1-step Pdelay_Resp is sent.

I have mixed feelings about this. IIUC, you're saying "let's implement a
fixed-size FIFO of RX timestamps of Pdelay_Req messages, and let's match
them on TX to Pdelay_Resp messages, by {sequenceId, domainNumber}."

But how deep should we make that FIFO? I.e. how many Pdelay_Req messages
should we expect before the user space will inject back a Pdelay_Resp
for transmission?

Again, consider the case of an E2E TC attached to a ksz9477 port. Even
if we run peer delay, it's not guaranteed that we only have one peer.
That E2E TC might connect us to a plethora of other peers. And the more
peers we are connected to, the higher the chance that the size of this
Pdelay_Req RX timestamp FIFO will not be adequately chosen.

> There is one more requirement that is a little difficult to do.  The calculated peer delay
> needs to be programmed in hardware register, but the regular PTP stack has no way to
> send that command.  I think the driver has to do its own calculation by snooping on the
> Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.

What register, and what does the switch do with this peer delay information?

> The receive and transmit latencies are different for different connected speed.  So the
> driver needs to change them when the link changes.  For that reason the PTP stack
> should not use its own latency values as generally the application does not care about
> the linked speed.

The thing is, ptp4l already has ingressLatency and egressLatency
settings, and I would not be surprised if those config options would get
extended to cover values at multiple link speeds.

In the general case, the ksz9477 MAC could be attached to any external
PHY, having its own propagation delay characteristics, or any number of
other things that cause clock domain crossings. I'm not sure how feasible
it is for the kernel to abstract this away completely, and adjust
timestamps automatically based on any and all combinations of MAC and
PHY. Maybe this is just wishful thinking.

Oh, and by the way, Christian, I'm not even sure if you aren't in fact
just beating around the bush with these tstamp_rx_latency_ns and
tstamp_tx_latency_ns values? I mean, the switch adds the latency value
to the timestamps. And you, from the driver, read the value of the
register, so you can subtract the value from the timestamp, to
compensate for its correction. So, all in all, there is no net latency
compensation seen by the outside world?! If that is the case, can't you
just set the latency registers to zero, do your compensation from the
application stack and call it a day?
Richard Cochran Nov. 22, 2020, 1:36 a.m. UTC | #7
On Sat, Nov 21, 2020 at 03:26:11AM +0200, Vladimir Oltean wrote:
> On Thu, Nov 19, 2020 at 06:51:15PM +0000, Tristram.Ha@microchip.com wrote:

> > The receive and transmit latencies are different for different connected speed.  So the

> > driver needs to change them when the link changes.  For that reason the PTP stack

> > should not use its own latency values as generally the application does not care about

> > the linked speed.

> 

> The thing is, ptp4l already has ingressLatency and egressLatency

> settings, and I would not be surprised if those config options would get

> extended to cover values at multiple link speeds.

> 

> In the general case, the ksz9477 MAC could be attached to any external

> PHY, having its own propagation delay characteristics, or any number of

> other things that cause clock domain crossings. I'm not sure how feasible

> it is for the kernel to abstract this away completely, and adjust

> timestamps automatically based on any and all combinations of MAC and

> PHY. Maybe this is just wishful thinking.


The idea that the driver will correctly adjust time stamps according
to link speed sounds nice in theory, but in practice it fails.  There
is a at least one other driver that attempted this, but, surprise,
surprise, the hard coded correction values turned out to be wrong.

I think the best way would be to let user space monitor the link speed
and apply the matching correction value.  That way, we avoid bogus,
hard coded values in kernel space.  (This isn't implemented in
linuxptp, but it certainly could be.)

Thanks,
Richard
Christian Eggers Nov. 25, 2020, 9:08 p.m. UTC | #8
I need some help from Microchip, please read below.

On Thursday, 19 November 2020, 19:51:15 CET, Tristram.Ha@microchip.com wrote:
> There is one more requirement that is a little difficult to do.  The calculated peer delay

> needs to be programmed in hardware register, but the regular PTP stack has no way to

> send that command.  I think the driver has to do its own calculation by snooping on the

> Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.


In an (offline) discussion with Vladimir we discovered, that the KSZ switch
behaves different as ptp4l expects: 

The KSZ switch forwards PTP (e.g. SYNC) messages in hardware (with updating
the correction field). For this, the peer delays need be configured for each
port.

ptp4l in turn expects to do the forwarding in software (for the P2P_TC clock
configuration). For this, no hardware configuration of the peer delay is
necessary. But due to limitations of currently available hardware, this TC
forwarding is currently only supported for 2 step clocks, as a one-step clock
would probably fully replace the originTimestamp field (similar as a BC, but
not as a TC).

Vladimir suggested to configure an ACL in the KSZ switch to block forwarding
of PTP messages between the user ports and to run ptp4l as BC. My idea is to
simply block forwarding of UDP messages with destination ports 319+320 and
L2 messages with the PTP Ether-Type.

I installed the following ACL (for UDP) in the Port ACL Access registers 0-F:
|_0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
| 00 39 01 40 01 3F 42 22 00 00 00 60 00 00 00 01
ACL index: 0

Match: 
- MD=11 (L4)
- ENB=10 (UDP ports)
- S/D=0 (dst)
- EQ=1 (equal)
- MAX_PORT=320
- MIN_PORT=319
- PC=01 (min or max)
- PRO=17 (UDP, don't care?)
- FME=0 (disabled)

Action:
- PM=0 (disabled)
- P=0 (don't care)
- RPE=0 (disabled)
- RP=0 (don't care)
- MM=11 (replace)
- PORT_FWD_MAP: all ports to 0

Processing entry:
- Ruleset=0x0001
- FRN=0

Unfortunately, with this configuration PTP messages are still forwarded from
port 1 to port 2. Although I was successful in blocking other communication
(e.g. by MAC address), the matching rules above seem not to work. Is there an
error in the ACL, or is forwarding of PTP traffic independent of configured
ACLs?

regards
Christian
Christian Eggers Nov. 26, 2020, 4:53 p.m. UTC | #9
Hi Microchip,

as ACL based blocking of PTP traffic seems not to work, I tried to install MAC
based static lookup rules on the switch I successfully managed to block other
non-PTP traffic, but for PTP the lookup table entry (see below) seems not to
work. Incoming SYNC messages on port are still forwarded to port 2.

The table entry is based on the multicast MAC used for PTP. With PTP domains!=0
there could be 128 possible MAC addresses that needs to blocked (but the switch
has only 16 entries in the static table). Is there any way to block the whole
PTP multicast address range (01:00:5E:00:01:81-01:00:5E:00:01:ff)? The data sheet
mentions that the static address table can be used for multicast addresses,
so there should be a way.

Alternatively, is there a hidden "disable TC" setting which disables the
transparent clock entirely?

regards
Christian

        Look-up Tables
        ALU_STAT_CTL 00000001  TABLE_INDEX       0       START_FINISH   idle             TABLE_SELECT Static Address
                               ACTION     read

        Static Address Table
        ALU_VAL_A    80000000  VALID      valid            SRC_FILTER     disabled       DST_FILTER   disabled
                               PRIORITY             0      MSTP                     0
        ALU_VAL_B    80000000  OVERRIDE   enabled          USE_FID        disabled
                               PRT3_FWD   disabled         PRT2_FWD       disabled       PRT1_FWD     disabled
        ALU_VAL_C    00000100  FID                  0      MAC_0_1              01:00
        ALU_VAL_D    5E000181  MAC_2_5    5E:00:01:81


On Wednesday, 25 November 2020, 22:08:39 CET, Christian Eggers wrote:
> I need some help from Microchip, please read below.
> 
> On Thursday, 19 November 2020, 19:51:15 CET, Tristram.Ha@microchip.com wrote:
> > There is one more requirement that is a little difficult to do.  The calculated peer delay
> > needs to be programmed in hardware register, but the regular PTP stack has no way to
> > send that command.  I think the driver has to do its own calculation by snooping on the
> > Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.
> 
> In an (offline) discussion with Vladimir we discovered, that the KSZ switch
> behaves different as ptp4l expects: 
> 
> The KSZ switch forwards PTP (e.g. SYNC) messages in hardware (with updating
> the correction field). For this, the peer delays need be configured for each
> port.
> 
> ptp4l in turn expects to do the forwarding in software (for the P2P_TC clock
> configuration). For this, no hardware configuration of the peer delay is
> necessary. But due to limitations of currently available hardware, this TC
> forwarding is currently only supported for 2 step clocks, as a one-step clock
> would probably fully replace the originTimestamp field (similar as a BC, but
> not as a TC).
> 
> Vladimir suggested to configure an ACL in the KSZ switch to block forwarding
> of PTP messages between the user ports and to run ptp4l as BC. My idea is to
> simply block forwarding of UDP messages with destination ports 319+320 and
> L2 messages with the PTP Ether-Type.
> 
> I installed the following ACL (for UDP) in the Port ACL Access registers 0-F:
> |_0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
> | 00 39 01 40 01 3F 42 22 00 00 00 60 00 00 00 01
> ACL index: 0
> 
> Match: 
> - MD=11 (L4)
> - ENB=10 (UDP ports)
> - S/D=0 (dst)
> - EQ=1 (equal)
> - MAX_PORT=320
> - MIN_PORT=319
> - PC=01 (min or max)
> - PRO=17 (UDP, don't care?)
> - FME=0 (disabled)
> 
> Action:
> - PM=0 (disabled)
> - P=0 (don't care)
> - RPE=0 (disabled)
> - RP=0 (don't care)
> - MM=11 (replace)
> - PORT_FWD_MAP: all ports to 0
> 
> Processing entry:
> - Ruleset=0x0001
> - FRN=0
> 
> Unfortunately, with this configuration PTP messages are still forwarded from
> port 1 to port 2. Although I was successful in blocking other communication
> (e.g. by MAC address), the matching rules above seem not to work. Is there an
> error in the ACL, or is forwarding of PTP traffic independent of configured
> ACLs?
> 
> regards
> Christian
>
Tristram.Ha@microchip.com Nov. 30, 2020, 9:01 p.m. UTC | #10
> Hi Microchip,

> 

> as ACL based blocking of PTP traffic seems not to work, I tried to install MAC

> based static lookup rules on the switch I successfully managed to block other

> non-PTP traffic, but for PTP the lookup table entry (see below) seems not to

> work. Incoming SYNC messages on port are still forwarded to port 2.

> 

> The table entry is based on the multicast MAC used for PTP. With PTP

> domains!=0

> there could be 128 possible MAC addresses that needs to blocked (but the

> switch

> has only 16 entries in the static table). Is there any way to block the whole

> PTP multicast address range (01:00:5E:00:01:81-01:00:5E:00:01:ff)? The data

> sheet

> mentions that the static address table can be used for multicast addresses,

> so there should be a way.

> 

> Alternatively, is there a hidden "disable TC" setting which disables the

> transparent clock entirely?


The 1588 PTP engine in the KSZ switches was designed to be controlled closely by
a PTP stack, so it is a little difficult to use when there is a layer of kernel support
between the application and the driver.

The default mode to use should be 1-step E2E where the switch acts as an E2E
Transparent Clock.

The 16-bit register 0x514 specifies the basic operation mode of the switch.

Bit 0 is for 1-step clock mode.
Bit 1 is for master mode, which should be off when the clock is acting as a master.
Bit 2 is for P2P mode.
Bit 7 stops the automatic forwarding and every PTP message goes to the host port.
This is the mode to use when the switch acts as a Boundary Clock or 2-step Clock.

When master mode is on Delay_Resp will not be forwarded to the host port.
When master mode is off Delay_Req will not be forwarded to the host port.

When P2P mode is off Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up will not
be forwarded to the host port.
When P2P mode is on those messages can be sent and received even though the port
Is closed for normal communication.

Bit 5 recognizes L2 PTP messages and the switch acts accordingly.
Bit 4 is for UDPv4 while bit 3 is for UDPv6.

It is rather pointless to actively filter certain PTP messages through other means.
It is better to leave the kernel PTP receive filter as coarse as possible.

When using P2P in 1-step clock mode the port id in the PTP header is automatically
changed by hardware to be the same as the real port, so it is useless to arbitrarily
use a different port id.  The original intent is to send 1 Pdelay_Req and receive
several Pdelay_Resp in each port.

The calculated peer delay from the port needs to be programmed to the port
register so that the Sync message can be compensated correctly while it travels
through the switches.  This poses a problem as the driver normally does not do the
calculation.

The 2-step clock mode avoids some of the mentioned issues.  However there are some
hardware bugs associated with this operation mode and it is not recommended to use.

For some profiles that require 2-step operation like gPTP there are ways to workaround.

For Sync it is quite simple to send Follow_Up after it even though the Sync contains the
transmit timestamp.  The Follow_Up just repeats that information.

For 2-step Pdelay_Resp it is harder as the hardware puts the turnaround time in
the correction field.  The driver workaround is to report the transmit timestamp
differently such that it is the same as Pdelay_Req receive timestamp so that the net
calculation of the peer delay is just the same as receiving 1-step Pdelay_Resp.

I will try my own implementation to see how these steps can be done.
Richard Cochran Nov. 30, 2020, 10:28 p.m. UTC | #11
On Mon, Nov 30, 2020 at 09:01:25PM +0000, Tristram.Ha@microchip.com wrote:
> The 1588 PTP engine in the KSZ switches was designed to be controlled closely by

> a PTP stack, so it is a little difficult to use when there is a layer of kernel support

> between the application and the driver.


Are you saying that linuxptp is not a PTP stack?

Maybe it would be wiser to design your HW so that it can work under Linux?

Nah, nobody cares about Linux support these days.