mbox series

[RFC,0/4] net: l2switch: Provide support for L2 switch on i.MX28 SoC

Message ID 20201125232459.378-1-lukma@denx.de
Headers show
Series net: l2switch: Provide support for L2 switch on i.MX28 SoC | expand

Message

Lukasz Majewski Nov. 25, 2020, 11:24 p.m. UTC
This is the first attempt to add support for L2 switch available on some NXP
devices - i.e. iMX287 or VF610. This patch set uses common FEC and DSA code.

This code provides _very_ basic switch functionality (packets are passed
between lan1 and lan2 ports and it is possible to send packets via eth0),
at its main purpose is to establish the way of reusing the FEC driver. When
this is done, one can add more advanced features to the switch (like vlan or
port separation).

I also do have a request for testing on e.g. VF610 if this driver works on
it too.
The L2 switch documentation is very scant on NXP's User Manual [0] and most
understanding of how it really works comes from old (2.6.35) NXP driver [1].
The aforementioned old driver [1] was monolitic and now this patch set tries
to mix FEC and DSA.

Open issues:
- I do have a hard time on understanding how to "disable" ENET-MAC{01} ports
in DSA (via port_disable callback in dsa_switch_ops).
When I disable L2 switch port1,2 or the ENET-MAC{01} in control register, I
cannot simply re-enable it with enabling this bit again. The old driver reset
(and setup again) the whole switch.

- The L2 switch is part of the SoC silicon, so we cannot follow the "normal" DSA
pattern with "attaching" it via mdio device. The switch reuses already well
defined ENET-MAC{01}. For that reason the MoreThanIP switch driver is
registered as platform device

- The question regarding power management - at least for my use case there
is no need for runtime power management. The L2 switch shall work always at
it connects other devices. 

- The FEC clock is also used for L2 switch management and configuration (as
the L2 switch is just in the same, large IP block). For now I just keep it
enabled so DSA code can use it. It looks a bit problematic to export 
fec_enet_clk_enable() to be reused on DSA code.

Links:
[0] - "i.MX28 Applications Processor Reference Manual, Rev. 2, 08/2013"
[1] - https://github.com/lmajewski/linux-imx28-l2switch/commit/e3c7a6eab73401e021aef0070e1935a0dba84fb5

Dependencies:
This patch set depends on one, which adds DTS for XEA board. However, it shall
be also possible to work on any board by adding L2 switch specific description.

https://marc.info/?l=devicetree&m=160632122703785&w=2
https://marc.info/?l=devicetree&m=160632122303783&w=2
https://marc.info/?l=devicetree&m=160632123203787&w=2

Those patches has been tested (applied) on 4.9.130-cip and v5.9 (vanila
mainline kernel)


Lukasz Majewski (4):
  net: fec: Move some defines to ./drivers/net/ethernet/freescale/fec.h
    header
  net: dsa: Provide DSA driver for NXP's More Than IP L2 switch
  net: imx: l2switch: Adjust fec_main.c to provide support for L2 switch
  ARM: dts: imx28: Add description for L2 switch on XEA board

 arch/arm/boot/dts/imx28-xea.dts           |  55 +++
 drivers/net/dsa/Kconfig                   |  11 +
 drivers/net/dsa/Makefile                  |   1 +
 drivers/net/dsa/mtip-l2switch.c           | 399 ++++++++++++++++++++++
 drivers/net/dsa/mtip-l2switch.h           | 239 +++++++++++++
 drivers/net/ethernet/freescale/fec.h      |  42 +++
 drivers/net/ethernet/freescale/fec_main.c | 148 ++++++--
 7 files changed, 874 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/dsa/mtip-l2switch.c
 create mode 100644 drivers/net/dsa/mtip-l2switch.h

Comments

Andrew Lunn Nov. 26, 2020, midnight UTC | #1
On Thu, Nov 26, 2020 at 12:24:55AM +0100, Lukasz Majewski wrote:
> This is the first attempt to add support for L2 switch available on some NXP
> devices - i.e. iMX287 or VF610. This patch set uses common FEC and DSA code.

Interesting. I need to take another look at the Vybrid manual. Last
time i looked, i was more thinking of a pure switchdev driver, not a
DSA driver. So i'm not sure this is the correct architecture. But has
been a while since i looked at the datasheet.

    Andrew
Florian Fainelli Nov. 26, 2020, 1:30 a.m. UTC | #2
On 11/25/2020 4:00 PM, Andrew Lunn wrote:
> On Thu, Nov 26, 2020 at 12:24:55AM +0100, Lukasz Majewski wrote:

>> This is the first attempt to add support for L2 switch available on some NXP

>> devices - i.e. iMX287 or VF610. This patch set uses common FEC and DSA code.

> 

> Interesting. I need to take another look at the Vybrid manual. Last

> time i looked, i was more thinking of a pure switchdev driver, not a

> DSA driver. So i'm not sure this is the correct architecture. But has

> been a while since i looked at the datasheet.


Agreed the block diagram shows one DMA for each "switch port" which
definitively fits more within the switchdev model than the DSA model
that re-purposes an existing Ethernet MAC controller as-is and bolts on
an integrated or external switch IC.
-- 
Florian
Andrew Lunn Nov. 26, 2020, 3:10 a.m. UTC | #3
On Wed, Nov 25, 2020 at 05:30:04PM -0800, Florian Fainelli wrote:
> 

> 

> On 11/25/2020 4:00 PM, Andrew Lunn wrote:

> > On Thu, Nov 26, 2020 at 12:24:55AM +0100, Lukasz Majewski wrote:

> >> This is the first attempt to add support for L2 switch available on some NXP

> >> devices - i.e. iMX287 or VF610. This patch set uses common FEC and DSA code.

> > 

> > Interesting. I need to take another look at the Vybrid manual. Last

> > time i looked, i was more thinking of a pure switchdev driver, not a

> > DSA driver. So i'm not sure this is the correct architecture. But has

> > been a while since i looked at the datasheet.

> 

> Agreed the block diagram shows one DMA for each "switch port" which

> definitively fits more within the switchdev model than the DSA model

> that re-purposes an existing Ethernet MAC controller as-is and bolts on

> an integrated or external switch IC.


Hi Florian

I'm not sure it is that simple. I'm looking at the Vybrid
datasheet. There are two major configurations.

1) The switch is pass through, and does nothing. Then two DMA channels
are used, one per external port. You basically just have two Ethernet
interfaces

2) The switch is active. You then have a 3 port switch, 2 ports for
the external interfaces, and one port connected to a single DMA
channel.

So when in an active mode, it does look more like a DSA switch.

What is not yet clear to me is how you direct frames out specific
interfaces. This is where i think we hit problems. I don't see a
generic mechanism, which is probably why Lukasz put tagger as None. It
does appear you can control the output of BPDUs, but it is not very
friendly. You write a register with the port you would like the next
BPDU to go out, queue the frame up on the DMA, and then poll a bit in
the register which flips when the frame is actually processed in the
switch. I don't see how you determine what port a BPDU came in on!
Maybe you have to use the learning interface?

Ah, the ESW_FFEN register can be used to send a frame out a specific
port. Write this register with the destination port, DMA a frame, and
it goes out the specific port. You then need to write the register
again for the next frame.

I get the feeling this is going to need a very close relationship
between the 'tagger' and the DMA engine. I don't see how this can be
done using the DSA architecture, the coupling is too loose.

It seems like the HW design assumes frames from the CPU will be
switched using the switch internal FDB, making Linux integration
"interesting"

It does not look like this is a classic DSA switch with a tagging
protocol. It might be possible to do VLAN per port, in order to direct
frames out a specific port?

       Andrew
Lukasz Majewski Nov. 26, 2020, 10:10 a.m. UTC | #4
Hi Andrew, Florian,

> On Wed, Nov 25, 2020 at 05:30:04PM -0800, Florian Fainelli wrote:

> > 

> > 

> > On 11/25/2020 4:00 PM, Andrew Lunn wrote:  

> > > On Thu, Nov 26, 2020 at 12:24:55AM +0100, Lukasz Majewski wrote:  

> > >> This is the first attempt to add support for L2 switch available

> > >> on some NXP devices - i.e. iMX287 or VF610. This patch set uses

> > >> common FEC and DSA code.  

> > > 

> > > Interesting. I need to take another look at the Vybrid manual.

> > > Last time i looked, i was more thinking of a pure switchdev

> > > driver, not a DSA driver. So i'm not sure this is the correct

> > > architecture. But has been a while since i looked at the

> > > datasheet.  

> > 

> > Agreed the block diagram shows one DMA for each "switch port" which

> > definitively fits more within the switchdev model than the DSA model

> > that re-purposes an existing Ethernet MAC controller as-is and

> > bolts on an integrated or external switch IC.  

> 

> Hi Florian

> 

> I'm not sure it is that simple. I'm looking at the Vybrid

> datasheet. There are two major configurations.

> 

> 1) The switch is pass through, and does nothing. Then two DMA channels

> are used, one per external port.


This is the "default" state (at least for imx28) - Chapter 29.3.2 on
User Manual [0].

Then you use DMA0 and DMA1 to read/write data to ENET-MAC{01}.

> You basically just have two Ethernet

> interfaces


If I may add important note - on the imx28 the ENET-MAC1 is configured
via ENET-MAC0 (it has FEC_QUIRK_SINGLE_MDIO set in fec_main.c). On this
device it is clearly stated that ENET-MAC1 block functionality is
reduced and its main purpose is to be used with L2 switch.

On the contrary, this flag is not set for vf610 in fec_main.c

> 

> 2) The switch is active. You then have a 3 port switch, 2 ports for

> the external interfaces, and one port connected to a single DMA

> channel.


+1 (Only DMA0 is used)

> 

> So when in an active mode, it does look more like a DSA switch.


It also looked like this for me.

> 

> What is not yet clear to me is how you direct frames out specific

> interfaces. This is where i think we hit problems. I don't see a

> generic mechanism, which is probably why Lukasz put tagger as None. 


I've put the "None" tag just to share the "testable" RFC code.

It is possible to "tag" frames - at least from the manual [0]:
Chapter: "29.4.9.2 Forced Forwarding".

With using register HW_ENET_SWI_FORCE_FWD_P0
29.9.34 ENET SWI Enable forced forwarding for a frame processed
from port 0 (HW_ENET_SWI_FORCE_FWD_P0)

One can "tag" the packet going from port0 (internal one from SoC) to be
forwarded to port1 (ENET-MAC0) or port2 (ENET-MAC1).

According to the legacy driver [1]:
"* It only replace the MAC lookup function,
 * all other filtering(eg.VLAN verification) act as normal"

> It

> does appear you can control the output of BPDUs, but it is not very

> friendly. You write a register with the port you would like the next

> BPDU to go out, queue the frame up on the DMA, and then poll a bit in

> the register which flips when the frame is actually processed in the

> switch. I don't see how you determine what port a BPDU came in on!

> Maybe you have to use the learning interface?


The learning interface works with the legacy NXP driver (2.6.35) which
copy can be found here[1].

> 

> Ah, the ESW_FFEN register can be used to send a frame out a specific

> port. Write this register with the destination port, DMA a frame, and

> it goes out the specific port. You then need to write the register

> again for the next frame.


It seems like the description of HW_ENET_SWI_FORCE_FWD_P0 described
above for imx28.

> 

> I get the feeling this is going to need a very close relationship

> between the 'tagger' and the DMA engine. I don't see how this can be

> done using the DSA architecture, the coupling is too loose.


My first impression was that it could be possible to set this
register in the DSA callback (which normally append the tag to the
frame).

This would work if we can assure that after calling this callback this
frame is transmitted (wait and poll?). Have I correctly understood
your above concern?

I do know that L2 switch has some kind of buffer from DMA0 (port0 - its
input port). However, I don't have access so such detailed manual.

> 

> It seems like the HW design assumes frames from the CPU will be

> switched using the switch internal FDB, making Linux integration

> "interesting"


The MoreThanIP L2 switch (on imx28) has 2K entries for setting FDB. It
also uses some hash function to speed up access/presence assessment.

> 

> It does not look like this is a classic DSA switch with a tagging

> protocol. 


This whole L2 switch implementation available on NXP's SoCs is a bit
odd. It is very highly coupled with FEC, ENET and DMA.

The original driver (2.6.35) was just a copy of FEC driver with some
switch adjustments.

Do you have any idea how to proceed? 

The vf610 and imx28 are still produced and widely used, so this is
still "actual" topic.

> It might be possible to do VLAN per port, in order to direct

> frames out a specific port?


The old driver had code to support VLANs.

From the manual[0] (29.1):
" Programmable Ingress and Egress VLAN tag addition, removal and
manipulation supporting single and double-tagged VLAN frames"

Also the "29.4.10 Frame Forwarding Tasks" from [0] sheds some more
light on it.

From the manual (29.4.10.2) it looks like the VLAN tag can be appended.
However, I don't know if it will work with VLAN built with L2 switch output ports.

> 

>        Andrew


Links:

[0] - "i.MX28 Applications Processor Reference Manual, Rev. 2, 08/2013"
[1] -
https://github.com/lmajewski/linux-imx28-l2switch/commit/e3c7a6eab73401e021aef0070e1935a0dba84fb5

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Vladimir Oltean Nov. 26, 2020, 12:30 p.m. UTC | #5
Hi Lukasz,

On Thu, Nov 26, 2020 at 12:24:55AM +0100, Lukasz Majewski wrote:
> This is the first attempt to add support for L2 switch available on some NXP

> devices - i.e. iMX287 or VF610. This patch set uses common FEC and DSA code.

>

> This code provides _very_ basic switch functionality (packets are passed

> between lan1 and lan2 ports and it is possible to send packets via eth0),

> at its main purpose is to establish the way of reusing the FEC driver. When

> this is done, one can add more advanced features to the switch (like vlan or

> port separation).

>

> I also do have a request for testing on e.g. VF610 if this driver works on

> it too.

> The L2 switch documentation is very scant on NXP's User Manual [0] and most

> understanding of how it really works comes from old (2.6.35) NXP driver [1].

> The aforementioned old driver [1] was monolitic and now this patch set tries

> to mix FEC and DSA.

>

> Open issues:

> - I do have a hard time on understanding how to "disable" ENET-MAC{01} ports

> in DSA (via port_disable callback in dsa_switch_ops).

> When I disable L2 switch port1,2 or the ENET-MAC{01} in control register, I

> cannot simply re-enable it with enabling this bit again. The old driver reset

> (and setup again) the whole switch.


You don't have to disable the ports if that does more harm than good, of course.

> - The L2 switch is part of the SoC silicon, so we cannot follow the "normal" DSA

> pattern with "attaching" it via mdio device. The switch reuses already well

> defined ENET-MAC{01}. For that reason the MoreThanIP switch driver is

> registered as platform device


That is not a problem. Also, I would not say that the "normal" DSA
pattern is to have an MDIO-attached switch. Maybe that was true 10 years
ago. But now, we have DSA switches registered as platform devices, I2C
devices, SPI devices, PCI devices. That is not an issue under any
circumstance.

> - The question regarding power management - at least for my use case there

> is no need for runtime power management. The L2 switch shall work always at

> it connects other devices.

>

> - The FEC clock is also used for L2 switch management and configuration (as

> the L2 switch is just in the same, large IP block). For now I just keep it

> enabled so DSA code can use it. It looks a bit problematic to export

> fec_enet_clk_enable() to be reused on DSA code.

>

> Links:

> [0] - "i.MX28 Applications Processor Reference Manual, Rev. 2, 08/2013"

> [1] - https://github.com/lmajewski/linux-imx28-l2switch/commit/e3c7a6eab73401e021aef0070e1935a0dba84fb5


Disclaimer: I don't know the details of imx28, it's just now that I
downloaded the reference manual to see what it's about.

I would push back and say that the switch offers bridge acceleration for
the FEC. The fact that the bridge acceleration is provided by a different
vendor and requires access to an extra set of register blocks is immaterial.
To qualify as a DSA switch, you need to have indirect networking I/O
through a different network interface. You do not have that. What I
would do is I would expand the fec driver into something that, on
capable SoCs, detects bridging of the ENET_MAC0 and ENETC_MAC1 ports and
configures the switch accordingly to offload that in a seamless manner
for the user. This would also solve your power management issues, since
the entire Ethernet block would be handled by a single driver.
DSA is a complication you do not need. Convince me otherwise.

Also, side note.
Please, please, please, stop calling it "l2 switch" and use something
more specific. If everybody writing a driver for the Linux kernel called
their L2 switch "L2 switch", we would go crazy. The kernel is not a deep
silo like the hardware team that integrated this MTIP switching IP into
imx28, and for whom this L2 switch is the only switch that exists, and
therefore for whom no further qualification was necessary. Andy, Peng or
Fabio might be able to give you a reference to an internal code name
that you can use, or something. Otherwise, I don't care if you need to
invent a name yourself - be as creative as you feel like. mtip-fec-switch,
charlie, matilda, brunhild, all fine by me.
Andrew Lunn Nov. 26, 2020, 2:45 p.m. UTC | #6
> > What is not yet clear to me is how you direct frames out specific

> > interfaces. This is where i think we hit problems. I don't see a

> > generic mechanism, which is probably why Lukasz put tagger as None. 

> 

> I've put the "None" tag just to share the "testable" RFC code.


Tagging is a core feature of DSA. Without being able to direct a
packet out a specific port, it is not really a DSA driver.  It is also
core requirement of integrating a switch into Linux. A DSA driver, or
a pure switchdev driver expects to be able to forward frames out
specific ports.

> It is possible to "tag" frames - at least from the manual [0]:

> Chapter: "29.4.9.2 Forced Forwarding".

> 

> With using register HW_ENET_SWI_FORCE_FWD_P0

> 29.9.34 ENET SWI Enable forced forwarding for a frame processed

> from port 0 (HW_ENET_SWI_FORCE_FWD_P0)

> 

> One can "tag" the packet going from port0 (internal one from SoC) to be

> forwarded to port1 (ENET-MAC0) or port2 (ENET-MAC1).

> 

> According to the legacy driver [1]:

> "* It only replace the MAC lookup function,

>  * all other filtering(eg.VLAN verification) act as normal"


This might solve your outgoing frame problems. But you need to dive
deep into how the FEC driver works, especially in a DSA like
setup. The normal path would be, the slave interface passes a frame to
the tagger driver, living in net/dsa/tag_*.c. Normally, it adds a
header/trailer which the switch looks at. It then hands to packet over
to the master Ethernet driver, which at some point will send the
frame. Because the frame is self contained, we don't care what that
ethernet driver actually does. It can add it to a queue and send it
later. It can look at the QoS tags and send it with low priority after
other frames, or could put it to the head of the queue and send it
before other frames etc.

Since you don't have self contained frames, this is a problem. After
writing to this register, you need to ensure what is transmitted next
is the specific frame you intend. It cannot be added to an existing
queue etc. You need to know when the frame has been sent, so you can
re-write this register for the next frame.

This is why i said i don't know if the DSA architecture will work. You
need a close coupling between the tagger setting the force bits, and
the DMA engine sending the frame.

The other option is you totally ignore most of this and statically
assign VLANs. Frames sent with VLAN 1 are forwarded out port 1. Frames
sent with VLAN 2 are sent out port 2. You need the port to
append/strip these VLAN tags for ingress/egress. tag_8021q.c gives you
some code to help with this. But can you still use the hardware to
switch frames between ports 1 and 2 without them going via the CPU?

       Andrew.
Lukasz Majewski Nov. 26, 2020, 11:35 p.m. UTC | #7
Hi Vladimir,

> Hi Lukasz,

> 

> On Thu, Nov 26, 2020 at 12:24:55AM +0100, Lukasz Majewski wrote:

> > This is the first attempt to add support for L2 switch available on

> > some NXP devices - i.e. iMX287 or VF610. This patch set uses common

> > FEC and DSA code.

> >

> > This code provides _very_ basic switch functionality (packets are

> > passed between lan1 and lan2 ports and it is possible to send

> > packets via eth0), at its main purpose is to establish the way of

> > reusing the FEC driver. When this is done, one can add more

> > advanced features to the switch (like vlan or port separation).

> >

> > I also do have a request for testing on e.g. VF610 if this driver

> > works on it too.

> > The L2 switch documentation is very scant on NXP's User Manual [0]

> > and most understanding of how it really works comes from old

> > (2.6.35) NXP driver [1]. The aforementioned old driver [1] was

> > monolitic and now this patch set tries to mix FEC and DSA.

> >

> > Open issues:

> > - I do have a hard time on understanding how to "disable"

> > ENET-MAC{01} ports in DSA (via port_disable callback in

> > dsa_switch_ops). When I disable L2 switch port1,2 or the

> > ENET-MAC{01} in control register, I cannot simply re-enable it with

> > enabling this bit again. The old driver reset (and setup again) the

> > whole switch.  

> 

> You don't have to disable the ports if that does more harm than good,

> of course.


Ok.

> 

> > - The L2 switch is part of the SoC silicon, so we cannot follow the

> > "normal" DSA pattern with "attaching" it via mdio device. The

> > switch reuses already well defined ENET-MAC{01}. For that reason

> > the MoreThanIP switch driver is registered as platform device  

> 

> That is not a problem. Also, I would not say that the "normal" DSA

> pattern is to have an MDIO-attached switch. Maybe that was true 10

> years ago. But now, we have DSA switches registered as platform

> devices, I2C devices, SPI devices, PCI devices. That is not an issue

> under any circumstance.


Ok. The MTIP switch now registers as platform device.

> 

> > - The question regarding power management - at least for my use

> > case there is no need for runtime power management. The L2 switch

> > shall work always at it connects other devices.

> >

> > - The FEC clock is also used for L2 switch management and

> > configuration (as the L2 switch is just in the same, large IP

> > block). For now I just keep it enabled so DSA code can use it. It

> > looks a bit problematic to export fec_enet_clk_enable() to be

> > reused on DSA code.

> >

> > Links:

> > [0] - "i.MX28 Applications Processor Reference Manual, Rev. 2,

> > 08/2013" [1] -

> > https://github.com/lmajewski/linux-imx28-l2switch/commit/e3c7a6eab73401e021aef0070e1935a0dba84fb5

> >  

> 

> Disclaimer: I don't know the details of imx28, it's just now that I

> downloaded the reference manual to see what it's about.

> 

> I would push back and say that the switch offers bridge acceleration

> for the FEC. 


Am I correct, that the "bridge acceleration" means in-hardware support
for L2 packet bridging? 

And without the switch IP block enabled one shall be able to have
software bridging in Linux in those two interfaces?

> The fact that the bridge acceleration is provided by a

> different vendor and requires access to an extra set of register

> blocks is immaterial. 


Am I correct that you mean not important above (i.e. immaterial == not
important)?

> To qualify as a DSA switch, you need to have

> indirect networking I/O through a different network interface. You do

> not have that.


I do have eth0 (DMA0) -> MoreThanIP switch port 0 input
			 |
			 |----> switch port1 -> ENET-MAC0
			 |
			 |----> switch port2 -> ENET-MAC1

> What I would do is I would expand the fec driver into

> something that, on capable SoCs, detects bridging of the ENET_MAC0

> and ENETC_MAC1 ports and configures the switch accordingly to offload

> that in a seamless manner for the user. 


Do you propose to catch some kind of notification when user calls:

ip link add name br0 type bridge; ip link set br0 up;
ip link set lan1 up; ip link set lan2 up;
ip link set lan1 master br0; ip link set lan2 master br0;
bridge link

And then configure the FEC driver to use this L2 switch driver?

> This would also solve your

> power management issues, since the entire Ethernet block would be

> handled by a single driver. DSA is a complication you do not need.

> Convince me otherwise.


From what I see the MoreThanIP IP block looks like a "typical" L2 switch
(like lan9xxx), with VLAN tagging support, static and dynamic tables,
forcing the packet to be passed to port [*], congestion management,
switch input buffer, priority of packets/queues, broadcast, multicast,
port snooping, and even IEEE1588 timestamps.

Seems like a lot of useful features.

The differences from "normal" DSA switches:

1. It uses mapped memory (for its register space) for
configuration/statistics gathering (instead of e.g. SPI, I2C)

2. The TAG is not appended to the frame outgoing from the "master" FEC
port - it can be setup when DMA transfers packet to MTIP switch internal
buffer.

Note:

[*] - The same situation is in the VF610 and IMX28:
The ESW_FFEN register - Bit 0 -> FEN 

"When set, the next frame received from port 0 (the local DMA port) is
forwarded to the ports defined in FD. The bit resets to zero
automatically when one frame from port 0 has been processed by the
switch (i.e. has been read from the port 0 input buffer; see Figure
32-1). Therefore, the bit must be set again as necessary. See also
Section 32.5.8.2, "Forced Forwarding" for a description."

(Of course the "Section 32.5.8.2" is not available)


According to above the "tag" (engress port) is set when DMA transfers
the packet to input MTIP buffer. This shall allow force forwarding as
we can setup this bit when we normally append tag in the network stack.

I will investigate this issue - and check the port separation. If it
works then DSA (or switchdev) shall be used?

(A side question - DSA uses switchdev, so when one shall use switchdev
standalone?)


> 

> Also, side note.

> Please, please, please, stop calling it "l2 switch" and use something

> more specific. If everybody writing a driver for the Linux kernel

> called their L2 switch "L2 switch", we would go crazy. The kernel is

> not a deep silo like the hardware team that integrated this MTIP

> switching IP into imx28, and for whom this L2 switch is the only

> switch that exists, and therefore for whom no further qualification

> was necessary.


The "l2 switch" name indeed was taken from the NXP's legacy driver :-)

> Andy, Peng or Fabio might be able to give you a

> reference to an internal code name that you can use, or something.


I would prefer to obtain some kind of manual/doc/spec for MTIP IP
block. 

> Otherwise, I don't care if you need to invent a name yourself - be as

> creative as you feel like. mtip-fec-switch, charlie, matilda,

> brunhild, all fine by me.


I think that "mtip-fec-switch" is the most reasonable name.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Nov. 27, 2020, 12:03 a.m. UTC | #8
Hi Andrew,

> > > What is not yet clear to me is how you direct frames out specific

> > > interfaces. This is where i think we hit problems. I don't see a

> > > generic mechanism, which is probably why Lukasz put tagger as

> > > None.   

> > 

> > I've put the "None" tag just to share the "testable" RFC code.  

> 

> Tagging is a core feature of DSA. Without being able to direct a

> packet out a specific port, it is not really a DSA driver.  It is also

> core requirement of integrating a switch into Linux. A DSA driver, or

> a pure switchdev driver expects to be able to forward frames out

> specific ports.


Please find my answer, which I gave to Vladimir in the other mail (you
were CC'ed).

As a backup plan - the vlan tagging may be worth to investigate.

> 

> > It is possible to "tag" frames - at least from the manual [0]:

> > Chapter: "29.4.9.2 Forced Forwarding".

> > 

> > With using register HW_ENET_SWI_FORCE_FWD_P0

> > 29.9.34 ENET SWI Enable forced forwarding for a frame processed

> > from port 0 (HW_ENET_SWI_FORCE_FWD_P0)

> > 

> > One can "tag" the packet going from port0 (internal one from SoC)

> > to be forwarded to port1 (ENET-MAC0) or port2 (ENET-MAC1).

> > 

> > According to the legacy driver [1]:

> > "* It only replace the MAC lookup function,

> >  * all other filtering(eg.VLAN verification) act as normal"  

> 

> This might solve your outgoing frame problems. But you need to dive

> deep into how the FEC driver works, especially in a DSA like

> setup. 


Agree.

> The normal path would be, the slave interface passes a frame to

> the tagger driver, living in net/dsa/tag_*.c. Normally, it adds a

> header/trailer which the switch looks at. It then hands to packet over

> to the master Ethernet driver, which at some point will send the

> frame. Because the frame is self contained, we don't care what that

> ethernet driver actually does. It can add it to a queue and send it

> later. It can look at the QoS tags and send it with low priority after

> other frames, or could put it to the head of the queue and send it

> before other frames etc.

> 


Thanks for the explanation.

> Since you don't have self contained frames, this is a problem. After

> writing to this register, you need to ensure what is transmitted next

> is the specific frame you intend. It cannot be added to an existing

> queue etc. You need to know when the frame has been sent, so you can

> re-write this register for the next frame.


This needs to be assessed as the documentation is very vague. I'm
wondering how MTIP/NXP recommends usage of ESW_FFEN register.

> 

> This is why i said i don't know if the DSA architecture will work. You

> need a close coupling between the tagger setting the force bits, and

> the DMA engine sending the frame.


Maybe it would be just enough to program the ESW_FFEN register when ENET
descriptor is programmed for DMA? Earlier we would append the
superfluous tag in the tag_*.c ?

> 

> The other option is you totally ignore most of this and statically

> assign VLANs. Frames sent with VLAN 1 are forwarded out port 1. Frames

> sent with VLAN 2 are sent out port 2. You need the port to

> append/strip these VLAN tags for ingress/egress. tag_8021q.c gives you

> some code to help with this. But can you still use the hardware to

> switch frames between ports 1 and 2 without them going via the CPU?


Yes, it is possible to switch frames between ENET-MAC{01} ports without
any interaction from CPU.

> 

>        Andrew.





Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Andrew Lunn Nov. 27, 2020, 12:55 a.m. UTC | #9
> (A side question - DSA uses switchdev, so when one shall use switchdev

> standalone?)


DSA gives you a framework for an Ethernet switch connected to a host
via Ethernet for the data plane. Generally, that Ethernet link to the
switch is a MAC to MAC connection. It can be PHY to PHY. But those are
just details. The important thing is you use an Ethernet driver on the
host.

If you look at pure switchdev devices, they generally DMA frames
directly into the switch. There is either one DMA queue per switch
port, or there is a way to multiplex frames over one DMA queue,
generally by additional fields in the buffer descriptor.

For this device, at the moment, it is hard to say which is the best
fit. A lot will depend on how the FEC driver works, if you can reuse
it, while still having the degree of control you need over the DMA
channel. If you can reuse the FEC driver, then a DSA driver might
work. If the coupling it too loose, and you have to take control of
the DMA, then a pure switchdev driver seems more appropriate.

    Andrew
Andrew Lunn Nov. 27, 2020, 1:08 a.m. UTC | #10
> > I would push back and say that the switch offers bridge acceleration

> > for the FEC. 

> 

> Am I correct, that the "bridge acceleration" means in-hardware support

> for L2 packet bridging? 


You should think of the hardware as an accelerator, not a switch. The
hardware is there to accelerate what linux can already do. You setup a
software bridge in linux, and then offload L2 switching to the
accelerator. You setup vlans in linux, and then offload the filtering
of them to the accelerator. If there is something linux can do, but
the hardware cannot accelerate, you leave linux to do it in software.

> Do you propose to catch some kind of notification when user calls:

> 

> ip link add name br0 type bridge; ip link set br0 up;

> ip link set lan1 up; ip link set lan2 up;

> ip link set lan1 master br0; ip link set lan2 master br0;

> bridge link

> 

> And then configure the FEC driver to use this L2 switch driver?


That is what switchdev does. There are various hooks in the network
stack which call into switchdev to ask it to offload operations to the
accelerator.

> The differences from "normal" DSA switches:

> 

> 1. It uses mapped memory (for its register space) for

> configuration/statistics gathering (instead of e.g. SPI, I2C)


That does not matter. And there are memory mapped DSA switches. The
DSA framework puts no restrictions on how the control plane works.

> (Of course the "Section 32.5.8.2" is not available)


It is in the Vybrid datasheet :-)

   Andrew
Lukasz Majewski Nov. 27, 2020, 9:16 a.m. UTC | #11
Hi Andrew,

> > (A side question - DSA uses switchdev, so when one shall use

> > switchdev standalone?)  

> 

> DSA gives you a framework for an Ethernet switch connected to a host

> via Ethernet for the data plane. Generally, that Ethernet link to the

> switch is a MAC to MAC connection. It can be PHY to PHY. But those are

> just details. The important thing is you use an Ethernet driver on the

> host.

> 

> If you look at pure switchdev devices, they generally DMA frames

> directly into the switch. There is either one DMA queue per switch

> port, or there is a way to multiplex frames over one DMA queue,

> generally by additional fields in the buffer descriptor.

> 

> For this device, at the moment, it is hard to say which is the best

> fit. A lot will depend on how the FEC driver works, if you can reuse

> it, while still having the degree of control you need over the DMA

> channel. If you can reuse the FEC driver, then a DSA driver might

> work. If the coupling it too loose, and you have to take control of

> the DMA, then a pure switchdev driver seems more appropriate.

> 

>     Andrew

> 


Thanks for the detailed explanation.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Nov. 27, 2020, 9:25 a.m. UTC | #12
Hi Andrew,

> > > I would push back and say that the switch offers bridge

> > > acceleration for the FEC.   

> > 

> > Am I correct, that the "bridge acceleration" means in-hardware

> > support for L2 packet bridging?   

> 

> You should think of the hardware as an accelerator, not a switch. The

> hardware is there to accelerate what linux can already do. You setup a

> software bridge in linux, and then offload L2 switching to the

> accelerator. You setup vlans in linux, and then offload the filtering

> of them to the accelerator. If there is something linux can do, but

> the hardware cannot accelerate, you leave linux to do it in software.


Ok.

> 

> > Do you propose to catch some kind of notification when user calls:

> > 

> > ip link add name br0 type bridge; ip link set br0 up;

> > ip link set lan1 up; ip link set lan2 up;

> > ip link set lan1 master br0; ip link set lan2 master br0;

> > bridge link

> > 

> > And then configure the FEC driver to use this L2 switch driver?  

> 

> That is what switchdev does. There are various hooks in the network

> stack which call into switchdev to ask it to offload operations to the

> accelerator.


Ok.

> 

> > The differences from "normal" DSA switches:

> > 

> > 1. It uses mapped memory (for its register space) for

> > configuration/statistics gathering (instead of e.g. SPI, I2C)  

> 

> That does not matter. And there are memory mapped DSA switches. The

> DSA framework puts no restrictions on how the control plane works.

> 

> > (Of course the "Section 32.5.8.2" is not available)  

> 

> It is in the Vybrid datasheet :-)


Hmm...

I cannot find such chapter in the official documentation from NXP:
"VFxxx Controller Reference Manual, Rev. 0, 10/2016"

Maybe you have more verbose version? Could you share how the document
is named?

> 

>    Andrew



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Andrew Lunn Nov. 27, 2020, 3:10 p.m. UTC | #13
On Fri, Nov 27, 2020 at 10:25:28AM +0100, Lukasz Majewski wrote:
> Hi Andrew,

> 

> > > > I would push back and say that the switch offers bridge

> > > > acceleration for the FEC.   

> > > 

> > > Am I correct, that the "bridge acceleration" means in-hardware

> > > support for L2 packet bridging?   

> > 

> > You should think of the hardware as an accelerator, not a switch. The

> > hardware is there to accelerate what linux can already do. You setup a

> > software bridge in linux, and then offload L2 switching to the

> > accelerator. You setup vlans in linux, and then offload the filtering

> > of them to the accelerator. If there is something linux can do, but

> > the hardware cannot accelerate, you leave linux to do it in software.

> 

> Ok.

> 

> > 

> > > Do you propose to catch some kind of notification when user calls:

> > > 

> > > ip link add name br0 type bridge; ip link set br0 up;

> > > ip link set lan1 up; ip link set lan2 up;

> > > ip link set lan1 master br0; ip link set lan2 master br0;

> > > bridge link

> > > 

> > > And then configure the FEC driver to use this L2 switch driver?  

> > 

> > That is what switchdev does. There are various hooks in the network

> > stack which call into switchdev to ask it to offload operations to the

> > accelerator.

> 

> Ok.

> 

> > 

> > > The differences from "normal" DSA switches:

> > > 

> > > 1. It uses mapped memory (for its register space) for

> > > configuration/statistics gathering (instead of e.g. SPI, I2C)  

> Hmm...

> 

> I cannot find such chapter in the official documentation from NXP:

> "VFxxx Controller Reference Manual, Rev. 0, 10/2016"


I have

Vybrid Reference Manual
F-Series

Document Number: VYBRIDRM
Rev 7, 06/2014

    Andrew
Vladimir Oltean Nov. 27, 2020, 7:29 p.m. UTC | #14
On Fri, Nov 27, 2020 at 12:35:49AM +0100, Lukasz Majewski wrote:
> > > - The question regarding power management - at least for my use

> > > case there is no need for runtime power management. The L2 switch

> > > shall work always at it connects other devices.

> > >

> > > - The FEC clock is also used for L2 switch management and

> > > configuration (as the L2 switch is just in the same, large IP

> > > block). For now I just keep it enabled so DSA code can use it. It

> > > looks a bit problematic to export fec_enet_clk_enable() to be

> > > reused on DSA code.

> > >

> > > Links:

> > > [0] - "i.MX28 Applications Processor Reference Manual, Rev. 2,

> > > 08/2013" [1] -

> > > https://github.com/lmajewski/linux-imx28-l2switch/commit/e3c7a6eab73401e021aef0070e1935a0dba84fb5

> > >

> >

> > Disclaimer: I don't know the details of imx28, it's just now that I

> > downloaded the reference manual to see what it's about.

> >

> > I would push back and say that the switch offers bridge acceleration

> > for the FEC.

>

> Am I correct, that the "bridge acceleration" means in-hardware support

> for L2 packet bridging?

>

> And without the switch IP block enabled one shall be able to have

> software bridging in Linux in those two interfaces?


So if the switch is bypassed through pin strapping of sx_ena, then the
DMA0 would be connected to ENETC-MAC 0, DMA1 to ENET-MAC 1, and both
these ports could be driven by the regular fec driver, am I right? This
is how people use the imx28 with mainline linux right now, no?

When the sx_ena signal enables the switch, a hardware accelerator
appears between the DMA engine and the same MACs. But that DMA engine is
still compatible with the expectations of the fec driver. And the MACs
still belong to the FEC. So, at the end of the day, there are still 2
FEC interfaces.

Where I was going is that from a user's perspective, it would be natural
to have the exact same view of the system in both cases, aka still two
network interfaces for MAC 0 and MAC 1, they still function in the same
way (i.e. you can still ping through them) but with the additional
ability to do hardware-accelerating bridging, if the MAC 0 and MAC 1
network interfaces are put under the same bridge.

Currently, you do not offer that, at all.
You split the MTIP switch configuration between DSA and the FEC driver.
But you are still not exposing networking-capable net devices for MAC 0
and MAC 1. You still do I/O through the DMA engine.

So why use DSA at all? What benefit does it bring you? Why not do the
entire switch configuration from within FEC, or a separate driver very
closely related to it?

> > The fact that the bridge acceleration is provided by a

> > different vendor and requires access to an extra set of register

> > blocks is immaterial.

>

> Am I correct that you mean not important above (i.e. immaterial == not

> important)?


Yes, sorry if that was not clear.

> > To qualify as a DSA switch, you need to have

> > indirect networking I/O through a different network interface. You do

> > not have that.

>

> I do have eth0 (DMA0) -> MoreThanIP switch port 0 input

> 			 |

> 			 |----> switch port1 -> ENET-MAC0

> 			 |

> 			 |----> switch port2 -> ENET-MAC1


The whole point of DSA is to intercept traffic from a different and
completely unaware network interface, parse a tag, and masquerade the
packet as though it came on a different, virtual, network interface
corresponding to the switch. DSA offers this service so that:
- any switch works with any host Ethernet controller
- all vendors use the same mechanism and do not reinvent the same wheel
The last part is especially relevant. Just grep net/core/ for "dsa" and
be amazed at the number of hacks there. DSA can justify that only by scale.
Aka "we have hardware that works this way, and doesn't work any other way.
And lots of it".

That being said, in your case, the network interface is not unaware of
the switch at all. Come on, you need to put the (allegedly) switch's MAC
in promiscuous mode, from the "DSA master" driver! Hardware resource
ownership is very DSA-unlike in the imx28/vybrid model, it seems. This
is a very big deal.

And there is no tag to parse. You have a switch with a DMA engine which
is a priori known to be register-compatible with the DMA engine used for
plain FEC interfaces. It's not like you have a switch that can be
connected via MII to anything and everything. Force forwarding is done
via writing a register, again very much unlike DSA, and retrieving the
source port on RX is up in the air.

> > What I would do is I would expand the fec driver into

> > something that, on capable SoCs, detects bridging of the ENET_MAC0

> > and ENETC_MAC1 ports and configures the switch accordingly to offload

> > that in a seamless manner for the user.

>

> Do you propose to catch some kind of notification when user calls:

>

> ip link add name br0 type bridge; ip link set br0 up;

> ip link set lan1 up; ip link set lan2 up;

> ip link set lan1 master br0; ip link set lan2 master br0;

> bridge link

>

> And then configure the FEC driver to use this L2 switch driver?


Yes, that can be summarized as:
One option to get this upstream would be to transform fec into a
switchdev driver, or to create a mtip switchdev driver that shares a lot
of code with the fec driver. The correct tool for code sharing is
EXPORT_SYMBOL_GPL, not DSA.

> > This would also solve your

> > power management issues, since the entire Ethernet block would be

> > handled by a single driver. DSA is a complication you do not need.

> > Convince me otherwise.

>

> From what I see the MoreThanIP IP block looks like a "typical" L2 switch

> (like lan9xxx), with VLAN tagging support, static and dynamic tables,

> forcing the packet to be passed to port [*], congestion management,

> switch input buffer, priority of packets/queues, broadcast, multicast,

> port snooping, and even IEEE1588 timestamps.

>

> Seems like a lot of useful features.


I did not say that it is not a typical switch, or that it doesn't have
useful features. I said that DSA does not help you. Adding a
DSA_TAG_PROTO_NONE driver in 2020 is a no-go all around.

> The differences from "normal" DSA switches:

>

> 1. It uses mapped memory (for its register space) for

> configuration/statistics gathering (instead of e.g. SPI, I2C)


Nope, that's not a difference.

> 2. The TAG is not appended to the frame outgoing from the "master" FEC

> port - it can be setup when DMA transfers packet to MTIP switch internal

> buffer.


That is a difference indeed. Since DSA switches are isolated from their
host interface, and there has been a traditional separation at the
MAC-to-MAC layer (or MAC-to-PHY-to-PHY-to-MAC in weird cases), the
routing information is passed in-band via a header in the packet. The
host interface has no idea that this header exists, it is just traffic
as usual. The whole DSA infrastructure is built around intercepting and
decoding that.

> Note:

>

> [*] - The same situation is in the VF610 and IMX28:

> The ESW_FFEN register - Bit 0 -> FEN

>

> "When set, the next frame received from port 0 (the local DMA port) is

> forwarded to the ports defined in FD. The bit resets to zero

> automatically when one frame from port 0 has been processed by the

> switch (i.e. has been read from the port 0 input buffer; see Figure

> 32-1). Therefore, the bit must be set again as necessary. See also

> Section 32.5.8.2, "Forced Forwarding" for a description."

>

> (Of course the "Section 32.5.8.2" is not available)

>

>

> According to above the "tag" (engress port) is set when DMA transfers

> the packet to input MTIP buffer. This shall allow force forwarding as

> we can setup this bit when we normally append tag in the network stack.

>

> I will investigate this issue - and check the port separation. If it

> works then DSA (or switchdev) shall be used?


Source port identification and individual egress port addressing are
things that should behave absolutely the same regardless of whether you
have a DSA or a pure switchdev driver. I don't think that this is clear
enough to you. DSA with DSA_TAG_PROTO_NONE is not the shortcut you're
looking for. Please take a look at Documentation/networking/switchdev.rst,
it explains pretty clearly that a switchdev port must still expose the
same interface as any other net device.

> (A side question - DSA uses switchdev, so when one shall use switchdev

> standalone?)


Short answer: if the system-side packet I/O interface is Ethernet and
the hardware ownership is clearly delineated at the boundary of that
Ethernet port, then it should be DSA, otherwise it shouldn't.

But nonetheless, this raises a fundamental question, and I'll indulge in
attempting to answer it more comprehensively.

You seem to be tapping into an idea which has been circulating for a
while, and which can be also found in Documentation/networking/dsa/dsa.rst:

-----------------------------[cut here]-----------------------------

TODO
====

Making SWITCHDEV and DSA converge towards an unified codebase
-------------------------------------------------------------

SWITCHDEV properly takes care of abstracting the networking stack with offload
capable hardware, but does not enforce a strict switch device driver model. On
the other DSA enforces a fairly strict device driver model, and deals with most
of the switch specific. At some point we should envision a merger between these
two subsystems and get the best of both worlds.

-----------------------------[cut here]-----------------------------

IMO this is never going to happen, nor should it.
To unify DSA and switchdev would mean to force plain switchdev drivers
to have a stricter separation between the control path and the data
path, a la DSA. So, just like DSA has separate drivers for the switch,
the tagging protocol and for the DSA master, plain switchdev would need
a separate driver too for the DMA/rings/queues, or whatever, of the
switch (the piece of code that implements the NAPI instance). That
driver would need to:
(a) expose a net device for the DMA interface
(b) fake a DSA tag that gets added by the DMA net device, just to be
    later removed by the switchdev net device.
This is just for compatibility with what DSA _needs_ to do to drive
hardware that was _designed_ to work that way, and where the _hardware_
adds that tag. But non-DSA drivers don't need any of that, nor does it
really help them in any way! So why should we model things this way?
I don't think the lines between plain switchdev and DSA are blurry at all.
And especially not in this case. You should not have a networking
interface registered to the system for DMA-0 at all, just for MAC-0 and
MAC-1.
Lukasz Majewski Nov. 28, 2020, 12:33 a.m. UTC | #15
Hi Vladimir,

> On Fri, Nov 27, 2020 at 12:35:49AM +0100, Lukasz Majewski wrote:

> > > > - The question regarding power management - at least for my use

> > > > case there is no need for runtime power management. The L2

> > > > switch shall work always at it connects other devices.

> > > >

> > > > - The FEC clock is also used for L2 switch management and

> > > > configuration (as the L2 switch is just in the same, large IP

> > > > block). For now I just keep it enabled so DSA code can use it.

> > > > It looks a bit problematic to export fec_enet_clk_enable() to be

> > > > reused on DSA code.

> > > >

> > > > Links:

> > > > [0] - "i.MX28 Applications Processor Reference Manual, Rev. 2,

> > > > 08/2013" [1] -

> > > > https://github.com/lmajewski/linux-imx28-l2switch/commit/e3c7a6eab73401e021aef0070e1935a0dba84fb5

> > > >  

> > >

> > > Disclaimer: I don't know the details of imx28, it's just now that

> > > I downloaded the reference manual to see what it's about.

> > >

> > > I would push back and say that the switch offers bridge

> > > acceleration for the FEC.  

> >

> > Am I correct, that the "bridge acceleration" means in-hardware

> > support for L2 packet bridging?

> >

> > And without the switch IP block enabled one shall be able to have

> > software bridging in Linux in those two interfaces?  

> 

> So if the switch is bypassed through pin strapping of sx_ena, then the

> DMA0 would be connected to ENETC-MAC 0, DMA1 to ENET-MAC 1, and both

> these ports could be driven by the regular fec driver, am I right?

> This is how people use the imx28 with mainline linux right now, no?


Yes. This is the current situation.

> 

> When the sx_ena signal enables the switch, a hardware accelerator

> appears between the DMA engine and the same MACs.


Yes.

> But that DMA engine

> is still compatible with the expectations of the fec driver.


Yes. This is why I can reuse the fec_main.c driver.

> And the

> MACs still belong to the FEC.


Yes. I do use mdio communication between MAC and PHYs (by re-using FEC
driver code).

> So, at the end of the day, there are

> still 2 FEC interfaces.


I'm a bit confused with the above sentence. What do you mean "2 FEC
interfaces"?

There is the FEC driver (fec_main.c), which handles DMA0
management/setup and communication with PHYs via MDIO (the FEC driver
setups MDIO, link type, speed, provides info if link is up or down).

> 

> Where I was going is that from a user's perspective, it would be

> natural to have the exact same view of the system in both cases, aka

> still two network interfaces for MAC 0 and MAC 1, they still function

> in the same way (i.e. you can still ping through them) but with the

> additional ability to do hardware-accelerating bridging, if the MAC 0

> and MAC 1 network interfaces are put under the same bridge.


At least on imx287 ENET-MAC1 needs MAC0 to be configured:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/freescale/fec_main.c#L2065

Those two MACs are not fully independent - functionality of MAC1 is
reduced.

> 

> Currently, you do not offer that, at all.

> You split the MTIP switch configuration between DSA and the FEC

> driver. But you are still not exposing networking-capable net devices

> for MAC 0 and MAC 1.


As fair as I can tell (and tested it) - it is possible to bring up
"lan{12}" interfaces with `ip l s lan{12}`

I would also like to set the static table with altering the FDB switch
memory/configuration. Access to it seems natural via lan{12} interfaces
(but of course could be possible from eth0).

Yes, I can send data via eth0 as I don't add tag to it. As pointed out
by Andrew - I could use VLAN tags (or play with FEC_FFEN register).

> You still do I/O through the DMA engine.


Yes, DMA0 is connected to MTIP port0 (input port from the imx28 SoC).
The MTIP port1 -> MAC0 and port2 -> MAC1

> 

> So why use DSA at all? What benefit does it bring you? Why not do the

> entire switch configuration from within FEC, or a separate driver very

> closely related to it?


Mine rationale to use DSA and FEC:
- Make as little changes to FEC as possible

- Provide separate driver to allow programming FDB, MDB, VLAN setup.
  This seems straightforward as MTIP has separate memory region (from
  FEC) for switch configuration, statistics, learning, static table
  programming. What is even more bizarre FEC and MTIP have the same 8
  registers (with different base address and +4 offset :-) ) as
  interface to handle DMA0 transfers.

- According to MTIP description from NXP documentation, there is a
  separate register for frame forwarding, so it _shall_ also fit into
  DSA.


For me it would be enough to have:

- lan{12} - so I could enable/disable it on demand (control when switch
  ports are passing or not packets).

- Use standard net tools (like bridge) to setup FDB/MDB, vlan 

- Read statistics from MTIP ports (all of them)

- I can use lan1 (bridged or not) to send data outside. It would be
  also correct to use eth0.

I'm for the most pragmatic (and simple) solution, which fulfill above
requirements.

> 

> > > The fact that the bridge acceleration is provided by a

> > > different vendor and requires access to an extra set of register

> > > blocks is immaterial.  

> >

> > Am I correct that you mean not important above (i.e. immaterial ==

> > not important)?  

> 

> Yes, sorry if that was not clear.


Ok. I was just not sure.

> 

> > > To qualify as a DSA switch, you need to have

> > > indirect networking I/O through a different network interface.

> > > You do not have that.  

> >

> > I do have eth0 (DMA0) -> MoreThanIP switch port 0 input

> > 			 |

> > 			 |----> switch port1 -> ENET-MAC0

> > 			 |

> > 			 |----> switch port2 -> ENET-MAC1  

> 

> The whole point of DSA is to intercept traffic from a different and

> completely unaware network interface, parse a tag, and masquerade the

> packet as though it came on a different, virtual, network interface

> corresponding to the switch. DSA offers this service so that:

> - any switch works with any host Ethernet controller

> - all vendors use the same mechanism and do not reinvent the same

> wheel The last part is especially relevant. Just grep net/core/ for

> "dsa" and be amazed at the number of hacks there. DSA can justify

> that only by scale. Aka "we have hardware that works this way, and

> doesn't work any other way. And lots of it".


That is why I wanted to use FEC as "fixed-link" interface, so large
portion of it would be disabled (e.g. the ENET-MAC management for eth0).

> 

> That being said, in your case, the network interface is not unaware of

> the switch at all. 


Unfortunately, you are right here. FEC, MTIP, ENET-MAC are very closely
coupled.

> Come on, you need to put the (allegedly) switch's

> MAC in promiscuous mode, from the "DSA master" driver! Hardware

> resource ownership is very DSA-unlike in the imx28/vybrid model, it

> seems. This is a very big deal.


Ok.

> 

> And there is no tag to parse. You have a switch with a DMA engine

> which is a priori known to be register-compatible with the DMA engine

> used for plain FEC interfaces. It's not like you have a switch that

> can be connected via MII to anything and everything. Force forwarding

> is done via writing a register, again very much unlike DSA, and

> retrieving the source port on RX is up in the air.


Yes. You are correct here.

> 

> > > What I would do is I would expand the fec driver into

> > > something that, on capable SoCs, detects bridging of the ENET_MAC0

> > > and ENETC_MAC1 ports and configures the switch accordingly to

> > > offload that in a seamless manner for the user.  

> >

> > Do you propose to catch some kind of notification when user calls:

> >

> > ip link add name br0 type bridge; ip link set br0 up;

> > ip link set lan1 up; ip link set lan2 up;

> > ip link set lan1 master br0; ip link set lan2 master br0;

> > bridge link

> >

> > And then configure the FEC driver to use this L2 switch driver?  

> 

> Yes, that can be summarized as:

> One option to get this upstream would be to transform fec into a

> switchdev driver, or to create a mtip switchdev driver that shares a

> lot of code with the fec driver. The correct tool for code sharing is

> EXPORT_SYMBOL_GPL, not DSA.


One more note - at least for imx28 you can have L2 switch or FEC, not
both.

Considering the above - you would recommend using the switchdev
framework for MTIP?

> 

> > > This would also solve your

> > > power management issues, since the entire Ethernet block would be

> > > handled by a single driver. DSA is a complication you do not need.

> > > Convince me otherwise.  

> >

> > From what I see the MoreThanIP IP block looks like a "typical" L2

> > switch (like lan9xxx), with VLAN tagging support, static and

> > dynamic tables, forcing the packet to be passed to port [*],

> > congestion management, switch input buffer, priority of

> > packets/queues, broadcast, multicast, port snooping, and even

> > IEEE1588 timestamps.

> >

> > Seems like a lot of useful features.  

> 

> I did not say that it is not a typical switch, or that it doesn't have

> useful features. I said that DSA does not help you. Adding a

> DSA_TAG_PROTO_NONE driver in 2020 is a no-go all around.


Ok. I see.

> 

> > The differences from "normal" DSA switches:

> >

> > 1. It uses mapped memory (for its register space) for

> > configuration/statistics gathering (instead of e.g. SPI, I2C)  

> 

> Nope, that's not a difference.


Ok.

> 

> > 2. The TAG is not appended to the frame outgoing from the "master"

> > FEC port - it can be setup when DMA transfers packet to MTIP switch

> > internal buffer.  

> 

> That is a difference indeed. Since DSA switches are isolated from

> their host interface, and there has been a traditional separation at

> the MAC-to-MAC layer (or MAC-to-PHY-to-PHY-to-MAC in weird cases), the

> routing information is passed in-band via a header in the packet. The

> host interface has no idea that this header exists, it is just traffic

> as usual. The whole DSA infrastructure is built around intercepting

> and decoding that.

> 


Ok, so MTIP doesn't provide the core functionality for DSA out of the
box.

> > Note:

> >

> > [*] - The same situation is in the VF610 and IMX28:

> > The ESW_FFEN register - Bit 0 -> FEN

> >

> > "When set, the next frame received from port 0 (the local DMA port)

> > is forwarded to the ports defined in FD. The bit resets to zero

> > automatically when one frame from port 0 has been processed by the

> > switch (i.e. has been read from the port 0 input buffer; see Figure

> > 32-1). Therefore, the bit must be set again as necessary. See also

> > Section 32.5.8.2, "Forced Forwarding" for a description."

> >

> > (Of course the "Section 32.5.8.2" is not available)

> >

> >

> > According to above the "tag" (engress port) is set when DMA

> > transfers the packet to input MTIP buffer. This shall allow force

> > forwarding as we can setup this bit when we normally append tag in

> > the network stack.

> >

> > I will investigate this issue - and check the port separation. If it

> > works then DSA (or switchdev) shall be used?  

> 

> Source port identification and individual egress port addressing are

> things that should behave absolutely the same regardless of whether

> you have a DSA or a pure switchdev driver. I don't think that this is

> clear enough to you. DSA with DSA_TAG_PROTO_NONE is not the shortcut

> you're looking for. Please take a look at

> Documentation/networking/switchdev.rst, it explains pretty clearly

> that a switchdev port must still expose the same interface as any

> other net device.


Ok.

> 

> > (A side question - DSA uses switchdev, so when one shall use

> > switchdev standalone?)  

> 

> Short answer: if the system-side packet I/O interface is Ethernet and

> the hardware ownership is clearly delineated at the boundary of that

> Ethernet port, then it should be DSA, otherwise it shouldn't.

> 


Ok.

> But nonetheless, this raises a fundamental question, and I'll indulge

> in attempting to answer it more comprehensively.

> 

> You seem to be tapping into an idea which has been circulating for a

> while,


There was only NXP's legacy driver (2.6.35), which added support for
MTIP switch. Up till then nobody tried to upstream it....

I do have this driver forward ported to v4.19.y:
https://github.com/lmajewski/linux-imx28-l2switch/commits/master

but its long term maintenance is at best awkward.

> and which can be also found in

> Documentation/networking/dsa/dsa.rst:

> 

> -----------------------------[cut here]-----------------------------

> 

> TODO

> ====

> 

> Making SWITCHDEV and DSA converge towards an unified codebase

> -------------------------------------------------------------

> 

> SWITCHDEV properly takes care of abstracting the networking stack

> with offload capable hardware, but does not enforce a strict switch

> device driver model. On the other DSA enforces a fairly strict device

> driver model, and deals with most of the switch specific. At some

> point we should envision a merger between these two subsystems and

> get the best of both worlds.

> 

> -----------------------------[cut here]-----------------------------

> 

> IMO this is never going to happen, nor should it.

> To unify DSA and switchdev would mean to force plain switchdev drivers

> to have a stricter separation between the control path and the data

> path, a la DSA. So, just like DSA has separate drivers for the switch,

> the tagging protocol and for the DSA master, plain switchdev would

> need a separate driver too for the DMA/rings/queues, or whatever, of

> the switch (the piece of code that implements the NAPI instance). That

> driver would need to:

> (a) expose a net device for the DMA interface

> (b) fake a DSA tag that gets added by the DMA net device, just to be

>     later removed by the switchdev net device.


I also thought for a while to append "tag" byte ind tag_*.c file and
then parse it (+ write FEC_FFEN) and remove when the DMA transfer is
setup. But this seems to be not the "simple solution".

> This is just for compatibility with what DSA _needs_ to do to drive

> hardware that was _designed_ to work that way, and where the

> _hardware_ adds that tag. But non-DSA drivers don't need any of that,

> nor does it really help them in any way! So why should we model

> things this way? I don't think the lines between plain switchdev and

> DSA are blurry at all. And especially not in this case. You should

> not have a networking interface registered to the system for DMA-0 at

> all, just for MAC-0 and MAC-1.


Indeed, there should be lan1, lan2 (and no eth0). Unfortunately, now
FEC driver without switch enabled (on e.g. imx28) only exports eth0.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Florian Fainelli Nov. 28, 2020, 4:34 a.m. UTC | #16
On 11/27/2020 4:33 PM, Lukasz Majewski wrote:
>> So why use DSA at all? What benefit does it bring you? Why not do the

>> entire switch configuration from within FEC, or a separate driver very

>> closely related to it?

> 

> Mine rationale to use DSA and FEC:

> - Make as little changes to FEC as possible


Which is entirely possible if you stick to Vladimir suggestions of
exporting services for the MTIP switch driver.

> 

> - Provide separate driver to allow programming FDB, MDB, VLAN setup.

>   This seems straightforward as MTIP has separate memory region (from

>   FEC) for switch configuration, statistics, learning, static table

>   programming. What is even more bizarre FEC and MTIP have the same 8

>   registers (with different base address and +4 offset :-) ) as

>   interface to handle DMA0 transfers.


OK, not sure how that is relevant here? The register organization should
never ever dictate how to pick a particular subsystem.

> 

> - According to MTIP description from NXP documentation, there is a

>   separate register for frame forwarding, so it _shall_ also fit into

>   DSA.


And yet it does not, Vladimir went into great length into explaining
what makes the MTIP + dual FEC different here and why it does not
qualify for DSA. Basically any time you have DMA + integrated switch
tightly coupled you have what we have coined a "pure switchdev" wrapper.

> 

> 

> For me it would be enough to have:

> 

> - lan{12} - so I could enable/disable it on demand (control when switch

>   ports are passing or not packets).

> 

> - Use standard net tools (like bridge) to setup FDB/MDB, vlan 

> 

> - Read statistics from MTIP ports (all of them)

> 

> - I can use lan1 (bridged or not) to send data outside. It would be

>   also correct to use eth0.


You know you can do that without having DSA, right? Look at mlxsw, look
at rocker. You can call multiple times register_netdevice() with custom
network devices that behave differently whether HW bridging offload is
offered or not, whether the switch is declared in Device Tree or not.

> 

> I'm for the most pragmatic (and simple) solution, which fulfill above

> requirements.


The most pragmatic solution is to implement switchdev operations to
offer HW bridging offload, VLAN programming, FDB/MDB programming.

It seems to me that you are trying to look for a framework to avoid
doing a bit of middle layer work between switchdev and the FEC driver
and that is not setting you for success.
-- 
Florian
Lukasz Majewski Nov. 29, 2020, 9:59 p.m. UTC | #17
Hi Florian,

> On 11/27/2020 4:33 PM, Lukasz Majewski wrote:

> >> So why use DSA at all? What benefit does it bring you? Why not do

> >> the entire switch configuration from within FEC, or a separate

> >> driver very closely related to it?  

> > 

> > Mine rationale to use DSA and FEC:

> > - Make as little changes to FEC as possible  

> 

> Which is entirely possible if you stick to Vladimir suggestions of

> exporting services for the MTIP switch driver.


Ok.

> 

> > 

> > - Provide separate driver to allow programming FDB, MDB, VLAN setup.

> >   This seems straightforward as MTIP has separate memory region

> > (from FEC) for switch configuration, statistics, learning, static

> > table programming. What is even more bizarre FEC and MTIP have the

> > same 8 registers (with different base address and +4 offset :-) ) as

> >   interface to handle DMA0 transfers.  

> 

> OK, not sure how that is relevant here? The register organization

> should never ever dictate how to pick a particular subsystem.

> 

> > 

> > - According to MTIP description from NXP documentation, there is a

> >   separate register for frame forwarding, so it _shall_ also fit

> > into DSA.  

> 

> And yet it does not, Vladimir went into great length into explaining

> what makes the MTIP + dual FEC different here and why it does not

> qualify for DSA. 


I'm very grateful for this insight and explanation from Vladimir.

> Basically any time you have DMA + integrated switch

> tightly coupled you have what we have coined a "pure switchdev"

> wrapper.


Ok.

> 

> > 

> > 

> > For me it would be enough to have:

> > 

> > - lan{12} - so I could enable/disable it on demand (control when

> > switch ports are passing or not packets).

> > 

> > - Use standard net tools (like bridge) to setup FDB/MDB, vlan 

> > 

> > - Read statistics from MTIP ports (all of them)

> > 

> > - I can use lan1 (bridged or not) to send data outside. It would be

> >   also correct to use eth0.  

> 

> You know you can do that without having DSA, right? Look at mlxsw,

> look at rocker. You can call multiple times register_netdevice() with

> custom network devices that behave differently whether HW bridging

> offload is offered or not, whether the switch is declared in Device

> Tree or not.


I will look into those examples and try to follow them for MTIP.

> 

> > 

> > I'm for the most pragmatic (and simple) solution, which fulfill

> > above requirements.  

> 

> The most pragmatic solution is to implement switchdev operations to

> offer HW bridging offload, VLAN programming, FDB/MDB programming.


Ok.

> 

> It seems to me that you are trying to look for a framework to avoid

> doing a bit of middle layer work between switchdev and the FEC driver

> and that is not setting you for success.


I'm not afraid to rework FEC. I just thought that DSA would be the best
fit for the MTIP. However, after posting the RFC, the community gave me
arguments that I was wrong.

I'm happy for having so detailed feedback :-).


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski June 17, 2021, 11:08 a.m. UTC | #18
Hi Andrew,

> > > I would push back and say that the switch offers bridge

> > > acceleration for the FEC.   

> > 

> > Am I correct, that the "bridge acceleration" means in-hardware

> > support for L2 packet bridging?   

> 

> You should think of the hardware as an accelerator, not a switch. The

> hardware is there to accelerate what linux can already do. You setup a

> software bridge in linux, and then offload L2 switching to the

> accelerator. You setup vlans in linux, and then offload the filtering

> of them to the accelerator. If there is something linux can do, but

> the hardware cannot accelerate, you leave linux to do it in software.

> 

> > Do you propose to catch some kind of notification when user calls:

> > 

> > ip link add name br0 type bridge; ip link set br0 up;

> > ip link set lan1 up; ip link set lan2 up;

> > ip link set lan1 master br0; ip link set lan2 master br0;

> > bridge link


^^^^^^^^^^^^^ [*]

> > 

> > And then configure the FEC driver to use this L2 switch driver?  

> 

> That is what switchdev does. There are various hooks in the network

> stack which call into switchdev to ask it to offload operations to the

> accelerator.


I'm a bit confused about the interfaces that pop up when I do enable
bridging acceleration.

Without bridge I do have:
- eth0 (this is a 'primary' interface -> it also controls MII/PHY for
  eth1)
- eth1 (it uses the MII/PHY control from eth0)

Both interfaces works correctly.

And after starting the bridge (and switchdev) with commands from [*] I
do have:

- br0 (created bridge - need to assign IP to it to communicate outside,
  even when routing is set via eth0, and eth0 has the same IP address)
- eth0 (just is used to control PHY - ifconfig up/down)
- eth1 (just is used to control PHY - ifconfig up/down)

And now the question, how internally shall I tackle the transmission
(i.e. DMA setups)?

Now, I do use some hacks to re-use code for eth0 to perform the
transmission from/to imx28 L2 switch. The eth1 is stopped (it only
controls the PHY - responds to MII interrupts).

The above setup works, and the code adjustment for fec_main.c driver is
really small.

However, I do wonder how conceptually it "mix" with br0 interface? I
could leave br0 as is, but then why do I need to asign the IP address
to it to communicate?
As I need to do it - then conceptually (re-)using eth0 internal
structures (and the driver in fact) looks like some kind of abusement. 
However, adding the transmission handling to br0 net device would bloat
and potentially duplicate the code.

I would prefer to re-use code from eth0 interface - it would be also
easier to cleanu up after disabling the L2 switch.

Any feedback and help is more than welcome.

> 

> > The differences from "normal" DSA switches:

> > 

> > 1. It uses mapped memory (for its register space) for

> > configuration/statistics gathering (instead of e.g. SPI, I2C)  

> 

> That does not matter. And there are memory mapped DSA switches. The

> DSA framework puts no restrictions on how the control plane works.

> 

> > (Of course the "Section 32.5.8.2" is not available)  

> 

> It is in the Vybrid datasheet :-)

> 

>    Andrew




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Andrew Lunn June 17, 2021, 1:57 p.m. UTC | #19
On Thu, Jun 17, 2021 at 01:08:21PM +0200, Lukasz Majewski wrote:
> Hi Andrew,

> 

> > > > I would push back and say that the switch offers bridge

> > > > acceleration for the FEC.   

> > > 

> > > Am I correct, that the "bridge acceleration" means in-hardware

> > > support for L2 packet bridging?   

> > 

> > You should think of the hardware as an accelerator, not a switch. The

> > hardware is there to accelerate what linux can already do. You setup a

> > software bridge in linux, and then offload L2 switching to the

> > accelerator. You setup vlans in linux, and then offload the filtering

> > of them to the accelerator. If there is something linux can do, but

> > the hardware cannot accelerate, you leave linux to do it in software.

> > 

> > > Do you propose to catch some kind of notification when user calls:

> > > 

> > > ip link add name br0 type bridge; ip link set br0 up;

> > > ip link set lan1 up; ip link set lan2 up;

> > > ip link set lan1 master br0; ip link set lan2 master br0;

> > > bridge link

> 

> ^^^^^^^^^^^^^ [*]

> 

> > > 

> > > And then configure the FEC driver to use this L2 switch driver?  

> > 

> > That is what switchdev does. There are various hooks in the network

> > stack which call into switchdev to ask it to offload operations to the

> > accelerator.

> 

> I'm a bit confused about the interfaces that pop up when I do enable

> bridging acceleration.

> 

> Without bridge I do have:

> - eth0 (this is a 'primary' interface -> it also controls MII/PHY for

>   eth1)

> - eth1 (it uses the MII/PHY control from eth0)


You need to be careful with wording here. eth0 might provide an MDIO
bus which eth1 PHY is connected to, but eth1 is controlling the PHY,
not eth0.

> 

> Both interfaces works correctly.

> 

> And after starting the bridge (and switchdev) with commands from [*] I

> do have:


You don't start switchdev. switchdev is just an API between the
network stack and the hardware accelerator.

> - br0 (created bridge - need to assign IP to it to communicate outside,

>   even when routing is set via eth0, and eth0 has the same IP address)


The IP address should only be on the bridge.

> - eth0 (just is used to control PHY - ifconfig up/down)

> - eth1 (just is used to control PHY - ifconfig up/down)


It is more than that. You can still send on these interfaces. e.g. if
you are using ldap, or L2 PTP, the daemons will use these interfaces,
not the bridge, since they want to send frames out a specific
interface.

> And now the question, how internally shall I tackle the transmission

> (i.e. DMA setups)?

>

> Now, I do use some hacks to re-use code for eth0 to perform the

> transmission from/to imx28 L2 switch.


Avoid hacks. Always. Step back. Look at the hardware. How is the
hardware meant to be used when using the switch?

You have two choices. 

1) A DSA style driver. The SoC has an Ethernet interface connect to
one port of the switch. In this case, there should be two additional
ports of the switch connected to the outside world, i.e. the switch
has 3 ports.

2) A pure switchdev driver. You DMA frames directly to the switch
ingree/egress queues for each port connected to the outside world.  In
this case, you have a 2 port switch.

Once you get the architecture correct, then you can think about the
driver. Maybe you can reuse parts of the FEC? Maybe not.

	Andrew