[v8,net-next,00/11] tag_8021q for Ocelot switches

Message ID 20210129010009.3959398-1-olteanv@gmail.com

Vladimir Oltean Jan. 29, 2021, 12:59 a.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

Changes in v8:
- Make tagging driver module reference counting work per DSA switch tree
  instead of per CPU port, to be compatible with the tag protocol changing
  through sysfs.
- Refactor ocelot_apply_bridge_fwd_mask and call it immediately after
  the is_dsa_8021q_cpu variable changes, i.e. in
  felix_8021q_cpu_port_init and felix_8021q_cpu_port_deinit.
- Take reference on tagging driver module in dsa_find_tagger_by_name.
- Replaced DSA_NOTIFIER_TAG_PROTO_SET and DSA_NOTIFIER_TAG_PROTO_DEL
  with a single DSA_NOTIFIER_TAG_PROTO.
- Combined .set_tag_protocol and .del_tag_protocol into a single
  .change_tag_protocol, and we're no longer calling those 2 functions at
  probe and unbind time.
- Adapted Felix to .change_tag_protocol. Kept felix_set_tag_protocol and
  felix_del_tag_protocol, but now calling them privately from
  felix_setup and felix_teardown.
- Used -EPROTONOSUPPORT instead of -EOPNOTSUPP as return code.
- Dropped some review tags due to amount of changes.

Changes in v7:
- Keep a copy of the tagging protocol in the DSA switch tree (patch 7/11)
- Call {set,del}_tag_protocol for DSA links with the tag_ops of the DSA
  tree and not of their own dp, since the latter is an invalid pointer
  never set up by anybody.
- Wrap the calls done at probe and remove time into some helper
  functions called dsa_switch_inform_initial_tag_proto and
  dsa_switch_inform_tag_proto_gone. Call dsa_switch_inform_tag_proto_gone
  more vigorously during the probe error path.
- Hold the rtnl_mutex in dsa_tree_change_tag_proto and change the calling
  convention such that drivers now expect rtnl_mutex to be held.
- Drop the rtnl_lock surrounding dsa_8021q_setup in the felix driver,
  since some callers of .{set,del}_tag_protocol now hold the rtnl_mutex
  and we'd run into a deadlock if we took it. That's also why all
  callers needed to be converted to hold the lock, since otherwise
  dsa_8021q_setup would have no guarantees short of passing it a bool
  rtnl_is_held variable.

Changes in v6:
- Removed redundant tree_index from dsa_notifier_tag_proto_info.
- Call .{set,del}_tag_protocol for the DSA links too.
- Check for ops::set_tag_protocol only once instead of in a loop.
- Check for ops::set_tag_protocol in dsa_switch_tag_proto_set too.

Changes in v5:
- Split patch series in half, removing PTP bits.
- Split previous monolithic patch "net: dsa: felix: add new VLAN-based
  tagger" into 3 smaller patches.
- Updated the sysfs documentation
- Made the tagger_lock per DSA switch tree instead of per DSA switch
- Using dsa_tree_notify instead of dsa_broadcast.

Changes in v4:
- Support simultaneous compilation of tag_ocelot.c and
  tag_ocelot_8021q.c.
- Support runtime switchover between the two taggers, by using
  echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
- We are now actually performing cleanup instead of just probe-time
  setup, which is required for supporting tagger switchover.
- Now draining the CPU queues by continuously reading QS_XTR_READ, same
  as Ocelot, instead of one-time asserting QS_XTR_FLUSH, which actually
  needed a sleep to be effective.

Changes in v3:
Use a per-port bool is_dsa_8021q_cpu instead of a single dsa_8021q_cpu
variable, to be compatible with future work where there may be
potentially multiple tag_8021q CPU ports in a LAG.

Changes in v2:
Posted the entire rework necessary for PTP support using tag_8021q.c.
Added a larger audience to the series.



The Felix switch inside LS1028A has an issue. It has a 2.5G CPU port,
and the external ports, in the majority of use cases, run at 1G. This
means that, when the CPU injects traffic into the switch, it is very
easy to run into congestion. This is not to say that it is impossible to
enter congestion even with all ports running at the same speed, just
that the default configuration is already very prone to that by design.

Normally, the way to deal with that is using Ethernet flow control
(PAUSE frames).

However, this functionality is not working today with the ENETC - Felix
switch pair. The hardware issue is undergoing documentation right now as
an erratum within NXP, but several customers have been requesting a
reasonable workaround for it.

In truth, the LS1028A has 2 internal port pairs. The lack of flow control
is an issue only when NPI mode (Node Processor Interface, aka the mode
where the "CPU port module", which carries DSA-style tagged packets, is
connected to a regular Ethernet port) is used, and NPI mode is supported
by Felix on a single port.

In past BSPs, we have had setups where both internal port pairs were
enabled. We were advertising the following setup:

"data port"     "control port"
  (2.5G)            (1G)

   eno2             eno3
    ^                ^
    |                |
    | regular        | DSA-tagged
    | frames         | frames
    |                |
    v                v
   swp4             swp5

This works but is highly impractical, due to NXP shifting the task of
designing a functional system (choosing which port to use, depending on
type of traffic required) up to the end user. The swpN interfaces would
have to be bridged with swp4, in order for the eno2 "data port" to have
access to the outside network. And the swpN interfaces would still be
capable of IP networking. So running a DHCP client would give us two IP
interfaces from the same subnet, one assigned to eno2, and the other to
swpN (0, 1, 2, 3).

Also, the dual port design doesn't scale. When attaching another DSA
switch to a Felix port, the end result is that the "data port" cannot
carry any meaningful data to the external world, since it lacks the DSA
tags required to traverse the sja1105 switches below. All that traffic
needs to go through the "control port".

So in newer BSPs there was a desire to simplify that setup, and only
have one internal port pair:

   eno2            eno3
    ^
    |
    | DSA-tagged    x disabled
    | frames
    |
    v
   swp4            swp5

However, this setup only exacerbates the issue of not having flow
control on the NPI port, since that is the only port now. Also, there
are use cases that still require the "data port", such as IEEE 802.1CB
(TSN stream identification doesn't work over an NPI port), source
MAC address learning over NPI, etc.

Again, there is a desire to keep the simplicity of the single internal
port setup, while regaining the benefits of having a dedicated data port
as well. And this series attempts to deliver just that.

So the NPI functionality is disabled conditionally. Its purpose was:
- To ensure individually addressable ports on TX. This can be replaced
  by using some designated VLAN tags which are pushed by the DSA tagger
  code, then removed by the switch (so they are invisible to the outside
  world and to the user).
- To ensure source port identification on RX. Again, this can be
  replaced by using some designated VLAN tags to encapsulate all RX
  traffic (each VLAN uniquely identifies a source port). The DSA tagger
  determines which port it was based on the VLAN number, then removes
  that header.
- To deliver PTP timestamps. These cannot be obtained through VLAN
  headers, so we need to take a step back and see how else we can do
  that. The Microchip Ocelot-1 (VSC7514 MIPS) driver performs manual
  injection/extraction from the CPU port module using register-based
  MMIO, and not over Ethernet. We will need to do the same from DSA,
  which makes this tagger a sort of hybrid between DSA and pure
  switchdev.
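
As a rough illustration of the VLAN-based port addressing described in
the first two points above, the sketch below packs a direction, a switch
index and a port number into a 12-bit VLAN ID and decodes them again.
It is not taken from the patches. The bit positions (direction in bits
11:10, switch ID in bits 8:6, port in bits 3:0) and all function names
are assumptions made for the example, and should be checked against
net/dsa/tag_8021q.c before relying on them.

```shell
#!/bin/bash
# Illustrative sketch only. All field positions below are ASSUMPTIONS
# loosely modelled on tag_8021q, not values confirmed by this series.
DIR_SHIFT=10      # assumed: direction field lives in bits 11:10
DIR_MASK=3
DIR_RX=1          # assumed: 1 marks an RX VLAN
DIR_TX=2          # assumed: 2 marks a TX VLAN
SWITCH_SHIFT=6    # assumed: switch ID lives in bits 8:6
SWITCH_MASK=7
PORT_MASK=15      # assumed: port number lives in bits 3:0

# rx_vid <switch-id> <port>: VLAN that identifies the source port on RX.
rx_vid() {
    echo $(( (DIR_RX << DIR_SHIFT) | (($1 & SWITCH_MASK) << SWITCH_SHIFT) | ($2 & PORT_MASK) ))
}

# tx_vid <switch-id> <port>: VLAN that addresses one egress port on TX.
tx_vid() {
    echo $(( (DIR_TX << DIR_SHIFT) | (($1 & SWITCH_MASK) << SWITCH_SHIFT) | ($2 & PORT_MASK) ))
}

# Succeed (exit status 0) if the VLAN ID carries the RX or TX direction.
vid_is_rx() { [ $(( ($1 >> DIR_SHIFT) & DIR_MASK )) -eq "$DIR_RX" ]; }
vid_is_tx() { [ $(( ($1 >> DIR_SHIFT) & DIR_MASK )) -eq "$DIR_TX" ]; }

# vid_to_port <vid>: recover the port number, as the tagger does on RX.
vid_to_port() { echo $(( $1 & PORT_MASK )); }
```

Under these assumptions, "rx_vid 0 2" gives 1026 and "tx_vid 0 2" gives
2050, so the direction can be told apart from the VLAN ID alone, which
is the kind of check the helpers in patch 01/11 are meant to provide.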

Vladimir Oltean (11):
  net: dsa: tag_8021q: add helpers to deduce whether a VLAN ID is RX or
    TX VLAN
  net: mscc: ocelot: export VCAP structures to include/soc/mscc
  net: mscc: ocelot: store a namespaced VCAP filter ID
  net: mscc: ocelot: reapply bridge forwarding mask on bonding
    join/leave
  net: mscc: ocelot: don't use NPI tag prefix for the CPU port module
  net: dsa: document the existing switch tree notifiers and add a new
    one
  net: dsa: keep a copy of the tagging protocol in the DSA switch tree
  net: dsa: allow changing the tag protocol via the "tagging" device
    attribute
  net: dsa: felix: convert to the new .change_tag_protocol DSA API
  net: dsa: add a second tagger for Ocelot switches based on tag_8021q
  net: dsa: felix: perform switch setup for tag_8021q

 Documentation/ABI/testing/sysfs-class-net-dsa |  11 +-
 MAINTAINERS                                   |   1 +
 drivers/net/dsa/ocelot/Kconfig                |   2 +
 drivers/net/dsa/ocelot/felix.c                | 525 ++++++++++++++++--
 drivers/net/dsa/ocelot/felix.h                |   2 +
 drivers/net/dsa/ocelot/felix_vsc9959.c        |   1 +
 drivers/net/dsa/ocelot/seville_vsc9953.c      |   1 +
 drivers/net/ethernet/mscc/ocelot.c            | 120 ++--
 drivers/net/ethernet/mscc/ocelot_flower.c     |   7 +-
 drivers/net/ethernet/mscc/ocelot_net.c        |   1 +
 drivers/net/ethernet/mscc/ocelot_vcap.c       |  19 +-
 drivers/net/ethernet/mscc/ocelot_vcap.h       | 295 +---------
 drivers/net/ethernet/mscc/ocelot_vsc7514.c    |   2 -
 include/linux/dsa/8021q.h                     |  14 +
 include/net/dsa.h                             |  18 +-
 include/soc/mscc/ocelot.h                     |   6 +-
 include/soc/mscc/ocelot_vcap.h                | 297 ++++++++++
 net/dsa/Kconfig                               |  21 +-
 net/dsa/Makefile                              |   1 +
 net/dsa/dsa.c                                 |  26 +
 net/dsa/dsa2.c                                | 128 ++++-
 net/dsa/dsa_priv.h                            |  17 +
 net/dsa/master.c                              |  39 +-
 net/dsa/port.c                                |  44 +-
 net/dsa/slave.c                               |  35 +-
 net/dsa/switch.c                              |  55 ++
 net/dsa/tag_8021q.c                           |  15 +-
 net/dsa/tag_ocelot_8021q.c                    |  68 +++
 28 files changed, 1341 insertions(+), 430 deletions(-)
 create mode 100644 net/dsa/tag_ocelot_8021q.c

Comments

Vladimir Oltean Jan. 29, 2021, 11:43 p.m. UTC | #1
On Fri, Jan 29, 2021 at 03:00:06AM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Currently DSA exposes the following sysfs:
> $ cat /sys/class/net/eno2/dsa/tagging
> ocelot
> 
> which is a read-only device attribute, introduced in the kernel as
> commit 98cdb4807123 ("net: dsa: Expose tagging protocol to user-space"),
> and used by libpcap since its commit 993db3800d7d ("Add support for DSA
> link-layer types").
> 
> It would be nice if we could extend this device attribute by making it
> writable:
> $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
> 
> This is useful with DSA switches that can make use of more than one
> tagging protocol. It may be useful in dsa_loop in the future too, to
> perform offline testing of various taggers, or for changing between dsa
> and edsa on Marvell switches, if that is desirable.
> 
> In terms of implementation, drivers can support this feature by
> implementing .change_tag_protocol, which should always leave the switch
> in a consistent state: either with the new protocol if things went well,
> or with the old one if something failed. Teardown of the old protocol,
> if necessary, must be handled by the driver.
> 
> Some things remain as before:
> - The .get_tag_protocol is currently only called at probe time, to load
>   the initial tagging protocol driver. Nonetheless, new drivers should
>   report the tagging protocol in current use now.
> - The driver should manage by itself the initial setup of tagging
>   protocol, no later than the .setup() method, as well as destroying
>   resources used by the last tagger in use, no earlier than the
>   .teardown() method.
> 
> For multi-switch DSA trees, error handling is a bit more complicated,
> since e.g. the 5th out of 7 switches may fail to change the tag
> protocol. When that happens, a revert to the original tag protocol is
> attempted, but that may fail too, leaving the tree in an inconsistent
> state despite each individual switch implementing .change_tag_protocol
> transactionally. Since the intersection between drivers that implement
> .change_tag_protocol and drivers that support D in DSA is currently the
> empty set, the possibility for this error to happen is ignored for now.
> 
> Testing:
> 
> $ insmod mscc_felix.ko
> [   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
> [   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
> $ insmod tag_ocelot.ko
> $ rmmod mscc_felix.ko
> $ insmod mscc_felix.ko
> [   97.261724] libphy: VSC9959 internal MDIO bus: probed
> [   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
> [   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
> [   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
> [   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
> [   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
> [   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
> [   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
> [   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
> [   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
> [   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
> [   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
> [   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
> [   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
> [   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
> [   98.527948] device eno2 entered promiscuous mode
> [   98.544755] DSA: tree 0 setup
> 
> $ ping 10.0.0.1
> PING 10.0.0.1 (10.0.0.1): 56 data bytes
> 64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
> 64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
> ^C
> --- 10.0.0.1 ping statistics ---

Jakub, I was stupid and I pasted the ping command output into the commit
message, so git will trim anything past the dotted line as not part of
the commit message, which makes your netdev/verify_signedoff test fail.
If by some sort of miracle I don't need to resend a v9, do you think you
could just delete this and the next 2 lines?

> 2 packets transmitted, 2 packets received, 0% packet loss
> round-trip min/avg/max = 0.754/1.545/2.337 ms
> 
> $ cat /sys/class/net/eno2/dsa/tagging
> ocelot
> $ cat ./test_ocelot_8021q.sh
>         #!/bin/bash
> 
>         ip link set swp0 down
>         ip link set swp1 down
>         ip link set swp2 down
>         ip link set swp3 down
>         ip link set swp5 down
>         ip link set eno2 down
>         echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
>         ip link set eno2 up
>         ip link set swp0 up
>         ip link set swp1 up
>         ip link set swp2 up
>         ip link set swp3 up
>         ip link set swp5 up
> $ ./test_ocelot_8021q.sh
> ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
> $ rmmod tag_ocelot.ko
> rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
> $ insmod tag_ocelot_8021q.ko
> $ ./test_ocelot_8021q.sh
> $ cat /sys/class/net/eno2/dsa/tagging
> ocelot-8021q
> $ rmmod tag_ocelot.ko
> $ rmmod tag_ocelot_8021q.ko
> rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
> $ ping 10.0.0.1
> PING 10.0.0.1 (10.0.0.1): 56 data bytes
> 64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
> 64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
> 64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
> $ rmmod mscc_felix.ko
> [  645.544426] mscc_felix 0000:00:00.5: Link is Down
> [  645.838608] DSA: tree 0 torn down
> $ rmmod tag_ocelot_8021q.ko
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> Changes in v8:
> - Take reference on tagging driver module in dsa_find_tagger_by_name.
> - Replaced DSA_NOTIFIER_TAG_PROTO_SET and DSA_NOTIFIER_TAG_PROTO_DEL
>   with a single DSA_NOTIFIER_TAG_PROTO.
> - Combined .set_tag_protocol and .del_tag_protocol into a single
>   .change_tag_protocol, and we're no longer calling those 2 functions at
>   probe and unbind time.
> - Dropped review tags due to amount of changes.
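
The switchover sequence from test_ocelot_8021q.sh in the quoted commit
message can be generalised as a small dry-run helper. This is only a
sketch: the function name and its arguments are hypothetical, and it
simply mirrors the order used in that script (user ports down, master
down, write the tagger name to the "tagging" attribute, master up, user
ports up). It prints the commands rather than running them.

```shell
#!/bin/bash
# Hypothetical dry-run helper, not part of the series.
# Usage: tagger_switch_cmds <master> <tagger> <user-port>...
# Prints the command sequence instead of executing it.
tagger_switch_cmds() {
    local master=$1 tagger=$2 port
    shift 2

    # Bring the user ports down first, then the DSA master.
    for port in "$@"; do
        echo "ip link set $port down"
    done
    echo "ip link set $master down"

    # Request the new tagging protocol through the sysfs attribute.
    echo "echo $tagger > /sys/class/net/$master/dsa/tagging"

    # Bring everything back up, master first.
    echo "ip link set $master up"
    for port in "$@"; do
        echo "ip link set $port up"
    done
}
```

Running "tagger_switch_cmds eno2 ocelot-8021q swp0 swp1 swp2 swp3 swp5"
prints the same commands as the test script, which can then be reviewed
before being run as root.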

Jakub Kicinski Jan. 30, 2021, 12:25 a.m. UTC | #2
On Sat, 30 Jan 2021 01:43:04 +0200 Vladimir Oltean wrote:
> Jakub, I was stupid and I pasted the ping command output into the commit
> message, so git will trim anything past the dotted line as not part of
> the commit message, which makes your netdev/verify_signedoff test fail.
> If by some sort of miracle I don't need to resend a v9, do you think you
> could just delete this and the next 2 lines?

Yeah, I noticed, I'll fix it up.

patchwork-bot+netdevbpf@kernel.org Jan. 30, 2021, 5:40 a.m. UTC | #3
Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Fri, 29 Jan 2021 02:59:58 +0200 you wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> Changes in v8:
> - Make tagging driver module reference counting work per DSA switch tree
>   instead of per CPU port, to be compatible with the tag protocol changing
>   through sysfs.
> - Refactor ocelot_apply_bridge_fwd_mask and call it immediately after
>   the is_dsa_8021q_cpu variable changes, i.e. in
>   felix_8021q_cpu_port_init and felix_8021q_cpu_port_deinit.
> - Take reference on tagging driver module in dsa_find_tagger_by_name.
> - Replaced DSA_NOTIFIER_TAG_PROTO_SET and DSA_NOTIFIER_TAG_PROTO_DEL
>   with a single DSA_NOTIFIER_TAG_PROTO.
> - Combined .set_tag_protocol and .del_tag_protocol into a single
>   .change_tag_protocol, and we're no longer calling those 2 functions at
>   probe and unbind time.
> - Adapted Felix to .change_tag_protocol. Kept felix_set_tag_protocol and
>   felix_del_tag_protocol, but now calling them privately from
>   felix_setup and felix_teardown.
> - Used -EPROTONOSUPPORT instead of -EOPNOTSUPP as return code.
> - Dropped some review tags due to amount of changes.
>
> [...]

Here is the summary with links:
  - [v8,net-next,01/11] net: dsa: tag_8021q: add helpers to deduce whether a VLAN ID is RX or TX VLAN
    https://git.kernel.org/netdev/net-next/c/9c7caf280684
  - [v8,net-next,02/11] net: mscc: ocelot: export VCAP structures to include/soc/mscc
    https://git.kernel.org/netdev/net-next/c/0e9bb4e9d93f
  - [v8,net-next,03/11] net: mscc: ocelot: store a namespaced VCAP filter ID
    https://git.kernel.org/netdev/net-next/c/50c6cc5b9283
  - [v8,net-next,04/11] net: mscc: ocelot: reapply bridge forwarding mask on bonding join/leave
    https://git.kernel.org/netdev/net-next/c/9b521250bff4
  - [v8,net-next,05/11] net: mscc: ocelot: don't use NPI tag prefix for the CPU port module
    https://git.kernel.org/netdev/net-next/c/cacea62fcdda
  - [v8,net-next,06/11] net: dsa: document the existing switch tree notifiers and add a new one
    https://git.kernel.org/netdev/net-next/c/886f8e26f539
  - [v8,net-next,07/11] net: dsa: keep a copy of the tagging protocol in the DSA switch tree
    https://git.kernel.org/netdev/net-next/c/357f203bb3b5
  - [v8,net-next,08/11] net: dsa: allow changing the tag protocol via the "tagging" device attribute
    https://git.kernel.org/netdev/net-next/c/53da0ebaad10
  - [v8,net-next,09/11] net: dsa: felix: convert to the new .change_tag_protocol DSA API
    https://git.kernel.org/netdev/net-next/c/adb3dccf090b
  - [v8,net-next,10/11] net: dsa: add a second tagger for Ocelot switches based on tag_8021q
    https://git.kernel.org/netdev/net-next/c/7c83a7c539ab
  - [v8,net-next,11/11] net: dsa: felix: perform switch setup for tag_8021q
    https://git.kernel.org/netdev/net-next/c/e21268efbe26

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html