[RFC,net-next,02/12] Documentation: networking: dsa: rewrite chapter about tagging protocol

Message ID 20210221213355.1241450-3-olteanv@gmail.com
State New
Series Documentation updates for switchdev and DSA

Commit Message

Vladimir Oltean Feb. 21, 2021, 9:33 p.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

The chapter about tagging protocols is out of date because it doesn't
mention all taggers that have been added since last documentation
update. But judging based on that, it will always tend to lag behind,
and there's no good reason why we would enumerate the supported
hardware. Instead we could do something more useful and explain what
there is to know about tagging protocols instead.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 Documentation/networking/dsa/dsa.rst | 126 +++++++++++++++++++++++++--
 1 file changed, 118 insertions(+), 8 deletions(-)

Comments

Florian Fainelli Feb. 22, 2021, 5:12 a.m. UTC | #1
On 2/21/2021 13:33, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The chapter about tagging protocols is out of date because it doesn't
> mention all taggers that have been added since last documentation
> update. But judging based on that, it will always tend to lag behind,
> and there's no good reason why we would enumerate the supported
> hardware. Instead we could do something more useful and explain what
> there is to know about tagging protocols instead.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

[snip]

> +Some tagging protocols, such as those in category 1 (shifting the MAC DA as
> +seen by the DSA master), require the DSA master to operate in promiscuous mode,
> +to receive all frames regardless of the value of the MAC DA. This can be done
> +by setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.

Nit: may require. DSA_TAG_PROTO_BRCM_PREPEND is an example of category 1 
tagger however the usual (and only?) DSA master (bgmac) does not require 
promiscuous mode. With that:

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

-- 
Florian
Andrew Lunn Feb. 24, 2021, 11:54 p.m. UTC | #2
> +It is desirable that all tagging protocols are testable with the ``dsa_loop``
> +mockup driver, which can be attached to any network interface. The goal is that
> +any network interface should be able of transmitting the same packet in the

should be _capable_ of ??

Reviewed-by: Andrew Lunn <andrew@lunn.ch>


    Andrew
Tobias Waldekranz Feb. 25, 2021, 8:29 p.m. UTC | #3
On Sun, Feb 21, 2021 at 23:33, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> The chapter about tagging protocols is out of date because it doesn't
> mention all taggers that have been added since last documentation
> update. But judging based on that, it will always tend to lag behind,
> and there's no good reason why we would enumerate the supported
> hardware. Instead we could do something more useful and explain what
> there is to know about tagging protocols instead.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  Documentation/networking/dsa/dsa.rst | 126 +++++++++++++++++++++++++--
>  1 file changed, 118 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
> index e20fbad2241a..fc98b5774fb6 100644
> --- a/Documentation/networking/dsa/dsa.rst
> +++ b/Documentation/networking/dsa/dsa.rst
> @@ -65,14 +65,8 @@ Note that DSA does not currently create network interfaces for the "cpu" and
>  Switch tagging protocols
>  ------------------------
>  
> -DSA currently supports 5 different tagging protocols, and a tag-less mode as
> -well. The different protocols are implemented in:
> -
> -- ``net/dsa/tag_trailer.c``: Marvell's 4 trailer tag mode (legacy)
> -- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag
> -- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag
> -- ``net/dsa/tag_brcm.c``: Broadcom's 4 bytes tag
> -- ``net/dsa/tag_qca.c``: Qualcomm's 2 bytes tag
> +DSA supports many vendor-specific tagging protocols, one software-defined
> +tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).
>  
>  The exact format of the tag protocol is vendor specific, but in general, they
>  all contain something which:
> @@ -80,6 +74,122 @@ all contain something which:
>  - identifies which port the Ethernet frame came from/should be sent to
>  - provides a reason why this frame was forwarded to the management interface
>  
> +All tagging protocols are in ``net/dsa/tag_*.c`` files and implement the
> +methods of the ``struct dsa_device_ops`` structure, which are detailed below.
> +
> +Tagging protocols generally fall in one of three categories:
> +
> +- The switch-specific frame header is located before the Ethernet header,
> +  shifting to the right (from the perspective of the DSA master's frame
> +  parser) the MAC DA, MAC SA, EtherType and the entire L2 payload.
> +- The switch-specific frame header is located before the EtherType, keeping the
> +  MAC DA and MAC SA in place from the DSA master's perspective, but shifting
> +  the 'real' EtherType and L2 payload to the right.
> +- The switch-specific frame header is located at the tail of the packet,
> +  keeping all frame headers in place and not altering the view of the packet
> +  that the DSA master's frame parser has.

A nit, but should this be a numbered list since "category 1 and 2" is
referenced later?

> +
> +A tagging protocol may tag all packets with switch tags of the same length, or
> +the tag length might vary (for example packets with PTP timestamps might
> +require an extended switch tag, or there might be one tag length on TX and a
> +different one on RX). Either way, the tagging protocol driver must populate the
> +``struct dsa_device_ops::overhead`` with the length in octets of the longest
> +switch frame header. The DSA framework will automatically adjust the MTU of the
> +master interface to accomodate for this extra size in order for DSA user ports
> +to support the standard MTU (L2 payload length) of 1500 octets. The ``overhead``
> +is also used to request from the network stack, on a best-effort basis, the
> +allocation of packets with a ``needed_headroom`` or ``needed_tailroom``
> +sufficient such that the act of pushing the switch tag on transmission of a
> +packet does not cause it to reallocate due to lack of memory.
> +
> +Even though applications are not expected to parse DSA-specific frame headers,
> +the format on the wire of the tagging protocol represents an Application Binary
> +Interface exposed by the kernel towards user space, for decoders such as
> +``libpcap``. The tagging protocol driver must populate the ``proto`` member of
> +``struct dsa_device_ops`` with a value that uniquely describes the
> +characteristics of the interaction required between the switch hardware and the
> +data path driver: the offset of each bit field within the frame header and any
> +stateful processing required to deal with the frames (as may be required for
> +PTP timestamping).
> +
> +By definition, all switches within the same DSA switch tree use the same
> +tagging protocol. In case of a packet transiting a fabric with more than one

This is not strictly true for mv88e6xxx. The connection between the tree
and the CPU may use Ethertyped DSA tags, while inter-switch links use
regular DSA tags.

However, I think it is better to keep this definition short, as it is
"true enough" :)

> +switch, the switch-specific frame header is inserted by the first switch in the
> +fabric that the packet was received on. This header typically contains
> +information regarding its type (whether it is a control frame that must be
> +trapped to the CPU, or a data frame to be forwarded). Control frames should be
> +decapsulated only by the software data path, whereas data frames might also be
> +autonomously forwarded towards other user ports of other switches from the same
> +fabric, and in this case, the outermost switch ports must decapsulate the packet.
> +
> +It is possible to construct cascaded setups of DSA switches even if their
> +tagging protocols are not compatible with one another. In this case, there are
> +no DSA links in this fabric, and each switch constitutes a disjoint DSA switch
> +tree. The DSA links are viewed as simply a pair of a DSA master (the out-facing
> +port of the upstream DSA switch) and a CPU port (the in-facing port of the
> +downstream DSA switch).
> +
> +The tagging protocol of the attached DSA switch tree can be viewed through the
> +``dsa/tagging`` sysfs attribute of the DSA master::
> +
> +    cat /sys/class/net/eth0/dsa/tagging
> +
> +If the hardware and driver are capable, the tagging protocol of the DSA switch
> +tree can be changed at runtime. This is done by writing the new tagging
> +protocol name to the same sysfs device attribute as above (the DSA master and
> +all attached switch ports must be down while doing this).
> +
> +It is desirable that all tagging protocols are testable with the ``dsa_loop``
> +mockup driver, which can be attached to any network interface. The goal is that
> +any network interface should be able of transmitting the same packet in the
> +same way, and the tagger should decode the same received packet in the same way
> +regardless of the driver used for the switch control path, and the driver used
> +for the DSA master.
> +
> +The transmission of a packet goes through the tagger's ``xmit`` function.
> +The passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
> +``skb_mac_header(skb)``, i.e. at the destination MAC address, and the passed
> +``struct net_device *dev`` represents the virtual DSA user network interface
> +whose hardware counterpart the packet must be steered to (i.e. ``swp0``).
> +The job of this method is to prepare the skb in a way that the switch will
> +understand what egress port the packet is for (and not deliver it towards other
> +ports). Typically this is fulfilled by pushing a frame header. Checking for
> +insufficient size in the skb headroom or tailroom is unnecessary provided that
> +the ``overhead`` and ``tail_tag`` properties were filled out properly, because
> +DSA ensures there is enough space before calling this method.
> +
> +The reception of a packet goes through the tagger's ``rcv`` function. The
> +passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
> +``skb_mac_header(skb) + ETH_ALEN`` octets, i.e. to where the first octet after
> +the EtherType would have been, were this frame not tagged. The role of this
> +method is to consume the frame header, adjust ``skb->data`` to really point at
> +the first octet after the EtherType, and to change ``skb->dev`` to point to the
> +virtual DSA user network interface corresponding to the physical front-facing
> +switch port that the packet was received on.
> +
> +Some tagging protocols, such as those in category 1 (shifting the MAC DA as
> +seen by the DSA master), require the DSA master to operate in promiscuous mode,
> +to receive all frames regardless of the value of the MAC DA. This can be done
> +by setting the ``promisc_on_master`` property of the ``struct dsa_device_ops``.
> +
> +Since tagging protocols in category 1 and 2 break software (and most often also
> +hardware) packet dissection on the DSA master, features such as RPS (Receive
> +Packet Steering) on the DSA master would be broken. The DSA framework deals
> +with this by hooking into the flow dissector and shifting the offset at which
> +the IP header is to be found in the tagged frame as seen by the DSA master.
> +This behavior is automatic based on the ``overhead`` value of the tagging
> +protocol. If not all packets are of equal size, the tagger can implement the
> +``flow_dissect`` method of the ``struct dsa_device_ops`` and override this
> +default behavior by specifying the correct offset incurred by each individual
> +RX packet. Tail taggers do not cause issues to the flow dissector.
> +
> +Hardware manufacturers are strongly discouraged to do this, but some tagging
> +protocols might not provide source port information on RX for all packets, but
> +e.g. only for control traffic (link-local PDUs). In this case, by implementing
> +the ``filter`` method of ``struct dsa_device_ops``, the tagger might select
> +which packets are to be redirected on RX towards the virtual DSA user network
> +interfaces, and which are to be left in the DSA master's RX data path.
> +
>  Master network devices
>  ----------------------
>  
> -- 
> 2.25.1

Great stuff!

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Vladimir Oltean Feb. 26, 2021, 6:12 p.m. UTC | #4
On Thu, Feb 25, 2021 at 09:29:21PM +0100, Tobias Waldekranz wrote:
> This is not strictly true for mv88e6xxx. The connection between the tree
> and the CPU may use Ethertyped DSA tags, while inter-switch links use
> regular DSA tags.
> 
> However, I think it is better to keep this definition short, as it is
> "true enough" :)

What is the use case for this? Build a DSA tree out of old switches
which support only DSA, plus new switches which support both DSA and
EDSA, and have the host CPU see only EDSA, with the cascaded switches
playing the role of DSA->EDSA adapters for the leaf switches?
Is there any point in doing this? If it ever becomes necessary to
support this, can't we just say that you should configure your entire
DSA tree to use either DSA or EDSA, whichever happens to be supported
across all devices? We already have support for changing the tag
protocol, mv88e6xxx should implement it, then we could add some logic
somewhere to scan for the DSA tree at probe time and figure out a common
denominator.
Tobias Waldekranz Feb. 26, 2021, 11:19 p.m. UTC | #5
On Fri, Feb 26, 2021 at 20:12, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Thu, Feb 25, 2021 at 09:29:21PM +0100, Tobias Waldekranz wrote:
>> This is not strictly true for mv88e6xxx. The connection between the tree
>> and the CPU may use Ethertyped DSA tags, while inter-switch links use
>> regular DSA tags.
>> 
>> However, I think it is better to keep this definition short, as it is
>> "true enough" :)
>
> What is the use case for this? Build a DSA tree out of old switches
> which support only DSA, plus new switches which support both DSA and
> EDSA, and have the host CPU see only EDSA, with the cascaded switches
> playing the role of DSA->EDSA adapters for the leaf switches?
> Is there any point in doing this? If it ever becomes necessary to
> support this, can't we just say that you should configure your entire
> DSA tree to use either DSA or EDSA, whichever happens to be supported
> across all devices? We already have support for changing the tag
> protocol, mv88e6xxx should implement it, then we could add some logic
> somewhere to scan for the DSA tree at probe time and figure out a common
> denominator.

This is already supported today. Cascade ports are _always_ set to
DSA. There are 2 reasons for that that I can think of:

1. It is the lowest common denominator, supported by all devices, so it
   makes for an easy algorithm.

2. It adds the minimum amount of overhead (4 bytes less than EDSA). If
   you are saturating your cascade link with 64B packets, that has quite
   an impact on your maximum pps.

As for why you would choose EDSA over DSA for connecting to the CPU: I
would say that on Linux with the DSA driver there is no reason, we could
probably drop the support altogether.

Before /sys/class/net/*/dsa/tagging, tcpdump could produce better
output, but that is no longer an issue.

The other advantage with EDSA is that you can use it for control traffic
(TO_CPU), while receiving data traffic (FORWARD) either untagged or
Q-tagged. So you could use more of your NIC's offloads for example. But
this does not really work with the switchdev model as there is no
separation of control/data.

Though, now that I think about it, maybe we _can_ do that with the
filter method I just learned about from reading your excellent
documentation :)

Whether we want to is another question, but my guess is that things like
L3 forwarding performance could improve quite a bit, since there is less
memmoving around of L2 headers.

Patch

diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index e20fbad2241a..fc98b5774fb6 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -65,14 +65,8 @@  Note that DSA does not currently create network interfaces for the "cpu" and
 Switch tagging protocols
 ------------------------
 
-DSA currently supports 5 different tagging protocols, and a tag-less mode as
-well. The different protocols are implemented in:
-
-- ``net/dsa/tag_trailer.c``: Marvell's 4 trailer tag mode (legacy)
-- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag
-- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag
-- ``net/dsa/tag_brcm.c``: Broadcom's 4 bytes tag
-- ``net/dsa/tag_qca.c``: Qualcomm's 2 bytes tag
+DSA supports many vendor-specific tagging protocols, one software-defined
+tagging protocol, and a tag-less mode as well (``DSA_TAG_PROTO_NONE``).
 
 The exact format of the tag protocol is vendor specific, but in general, they
 all contain something which:
@@ -80,6 +74,122 @@  all contain something which:
 - identifies which port the Ethernet frame came from/should be sent to
 - provides a reason why this frame was forwarded to the management interface
 
+All tagging protocols are in ``net/dsa/tag_*.c`` files and implement the
+methods of the ``struct dsa_device_ops`` structure, which are detailed below.
+
+Tagging protocols generally fall into one of three categories:
+
+1. The switch-specific frame header is located before the Ethernet header,
+   shifting to the right (from the perspective of the DSA master's frame
+   parser) the MAC DA, MAC SA, EtherType and the entire L2 payload.
+2. The switch-specific frame header is located before the EtherType, keeping
+   the MAC DA and MAC SA in place from the DSA master's perspective, but
+   shifting the 'real' EtherType and L2 payload to the right.
+3. The switch-specific frame header is located at the tail of the packet,
+   keeping all frame headers in place and not altering the view of the packet
+   that the DSA master's frame parser has.
+
+A tagging protocol may tag all packets with switch tags of the same length, or
+the tag length might vary (for example packets with PTP timestamps might
+require an extended switch tag, or there might be one tag length on TX and a
+different one on RX). Either way, the tagging protocol driver must populate the
+``struct dsa_device_ops::overhead`` with the length in octets of the longest
+switch frame header. The DSA framework will automatically adjust the MTU of the
+master interface to accommodate this extra size in order for DSA user ports
+to support the standard MTU (L2 payload length) of 1500 octets. The ``overhead``
+is also used to request from the network stack, on a best-effort basis, the
+allocation of packets with a ``needed_headroom`` or ``needed_tailroom``
+large enough that pushing the switch tag on transmission does not force the
+packet to be reallocated for lack of space.
+
+Even though applications are not expected to parse DSA-specific frame headers,
+the format on the wire of the tagging protocol represents an Application Binary
+Interface exposed by the kernel towards user space, for decoders such as
+``libpcap``. The tagging protocol driver must populate the ``proto`` member of
+``struct dsa_device_ops`` with a value that uniquely describes the
+characteristics of the interaction required between the switch hardware and the
+data path driver: the offset of each bit field within the frame header and any
+stateful processing required to deal with the frames (as may be required for
+PTP timestamping).
+
+By definition, all switches within the same DSA switch tree use the same
+tagging protocol. When a packet transits a fabric with more than one
+switch, the switch-specific frame header is inserted by the first switch in the
+fabric that the packet was received on. This header typically contains
+information regarding its type (whether it is a control frame that must be
+trapped to the CPU, or a data frame to be forwarded). Control frames should be
+decapsulated only by the software data path, whereas data frames might also be
+autonomously forwarded towards other user ports of other switches from the same
+fabric, and in this case, the outermost switch ports must decapsulate the packet.
+
+It is possible to construct cascaded setups of DSA switches even if their
+tagging protocols are not compatible with one another. In this case, there are
+no DSA links in this fabric, and each switch constitutes a disjoint DSA switch
+tree. The links between them are then viewed simply as a DSA master (the
+out-facing port of the upstream DSA switch) paired with a CPU port (the
+in-facing port of the downstream DSA switch).
+
+The tagging protocol of the attached DSA switch tree can be viewed through the
+``dsa/tagging`` sysfs attribute of the DSA master::
+
+    cat /sys/class/net/eth0/dsa/tagging
+
+If the hardware and driver are capable, the tagging protocol of the DSA switch
+tree can be changed at runtime. This is done by writing the new tagging
+protocol name to the same sysfs device attribute as above (the DSA master and
+all attached switch ports must be down while doing this).
+
+It is desirable that all tagging protocols are testable with the ``dsa_loop``
+mockup driver, which can be attached to any network interface. The goal is that
+any network interface should be capable of transmitting the same packet in the
+same way, and the tagger should decode the same received packet in the same way
+regardless of the driver used for the switch control path, and the driver used
+for the DSA master.
+
+The transmission of a packet goes through the tagger's ``xmit`` function.
+The passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
+``skb_mac_header(skb)``, i.e. at the destination MAC address, and the passed
+``struct net_device *dev`` represents the virtual DSA user network interface
+whose hardware counterpart the packet must be steered to (e.g. ``swp0``).
+The job of this method is to prepare the skb in a way that the switch will
+understand what egress port the packet is for (and not deliver it towards other
+ports). Typically this is fulfilled by pushing a frame header. Checking for
+insufficient size in the skb headroom or tailroom is unnecessary provided that
+the ``overhead`` and ``tail_tag`` properties were filled out properly, because
+DSA ensures there is enough space before calling this method.
+
+The reception of a packet goes through the tagger's ``rcv`` function. The
+passed ``struct sk_buff *skb`` has ``skb->data`` pointing at
+``skb_mac_header(skb) + ETH_HLEN`` octets, i.e. to where the first octet after
+the EtherType would have been, were this frame not tagged. The role of this
+method is to consume the frame header, adjust ``skb->data`` to really point at
+the first octet after the EtherType, and to change ``skb->dev`` to point to the
+virtual DSA user network interface corresponding to the physical front-facing
+switch port that the packet was received on.
+
+Some tagging protocols, such as those in category 1 (shifting the MAC DA as
+seen by the DSA master), may require the DSA master to operate in promiscuous
+mode, to receive all frames regardless of the value of the MAC DA. This can be
+done by setting the ``promisc_on_master`` property of ``struct dsa_device_ops``.
+
+Since tagging protocols in categories 1 and 2 break software (and most often
+also hardware) packet dissection on the DSA master, features such as RPS
+(Receive Packet Steering) on the DSA master would be broken. The DSA framework
+deals with this by hooking into the flow dissector and shifting the offset at
+which the IP header is to be found in the tagged frame as seen by the DSA
+master. This behavior is automatic based on the ``overhead`` value of the
+tagging protocol. If the tag length varies between packets, the tagger can
+implement the ``flow_dissect`` method of the ``struct dsa_device_ops`` and
+override this default behavior by specifying the correct offset incurred by
+each individual RX packet. Tail taggers do not affect the flow dissector.
+
+Hardware manufacturers are strongly discouraged from doing this, but some
+tagging protocols might not provide source port information on RX for all
+packets, e.g. only for control traffic (link-local PDUs). In this case, by
+implementing the ``filter`` method of ``struct dsa_device_ops``, the tagger
+can select which packets are redirected on RX towards the virtual DSA user
+network interfaces, and which are left in the DSA master's RX data path.
+
 Master network devices
 ----------------------