mbox series

[0/4] RFC CPSW switchdev mode

Message ID 1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org
Headers show
Series RFC CPSW switchdev mode | expand

Message

Ilias Apalodimas May 24, 2018, 6:56 a.m. UTC
Hello, 

This is adding a new mode on the cpsw driver based around switchdev.
In order to enable this you need to enable CONFIG_NET_SWITCHDEV, 
CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV
and add to udev config: 

SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \
        ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
Since the phys_switch_id is based on cpsw version, users with different 
version will need to do 'ip -d link show dev sw0p0 | grep switchid' and 
replace with the correct value.

This patch creates 3 ports, sw0p0, sw0p1 and sw0p2.
sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices
while sw0p0 is the switch 'cpu facing port'.
sw0p0 will be unable to receive and transmit traffic and it's not 100% within
switchdev scope but, it's used to configure switch cpu port individually as 
this is needed for various switch features and configuration scenarios.

Bridge setup:
ip link add name br0 type bridge
ip link set dev br0 type bridge ageing_time 1000
ip link set dev br0 type bridge vlan_filtering 1

ip link set dev sw0p1 up
ip link set dev sw0p2 up
ip link set dev sw0p0 up
ip link set dev sw0p0 master br0
ip link set dev sw0p2 master br0
ip link set dev sw0p1 master br0

ip link set br0 address $(cat /sys/class/net/sw0p1/address)
ifconfig br0 up

VLAN config:
untagged:
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 pvid untagged master

tagged:
bridge vlan add dev sw0p1 vid 100 master
bridge vlan add dev sw0p2 vid 100 master

IP address on br0:
bridge vlan add dev br0 vid 100 pvid untagged self
bridge vlan add dev sw0p0 vid 100 pvid untagged master
udhcpc -i br0

FDBs:
bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master

MDBs:
single vlan:
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100

all vlans:
bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent
bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent

Multicast:
setting multicast on and off will affect registered multicast
setting allmulti on and off will affect unregistered multicast
This muct occur before adding VLANs on the interfaces. If you change the
flag after the VLAN configuration you need to re-issue the VLAN config 
commands.

Promiscuous mode:
Adding/removing sw0p0 on the bridge will enable/disable ALE_P0_UNI_FLOOD

NFS:
The only way for NFS to work is by chrooting to a minimal environment when 
switch configuration that will affect connectivity is needed.
Assuming you are booting NFS with eth1 interface(the script is hacky and 
it's just there to prove NFS is doable).

setup.sh:
#!/bin/sh
mkdir proc
mount -t proc none /proc

ifconfig br0  > /dev/null
if [ $? -ne 0 ]; then
        echo "Setting up bridge"
        ip link add name br0 type bridge
        ip link set dev br0 type bridge ageing_time 1000
        ip link set dev br0 type bridge vlan_filtering 1

        ip link set eth1 down 
        ip link set eth1 name sw0p1 
        ip link set dev sw0p1 up
        ip link set dev sw0p2 up
        ip link set dev sw0p0 up
        ip link set dev sw0p0 master br0 
        ip link set dev sw0p2 master br0
        ip link set dev sw0p1 master br0
        ifconfig sw0p1 0.0.0.0
        udhchc -i br0
fi
umount /proc

run_nfs.sh:
#!/bin/sh
mkdir /tmp/root/bin -p
mkdir /tmp/root/lib -p

cp -r /lib/ /tmp/root/
cp -r /bin/ /tmp/root/
cp /sbin/ip /tmp/root/bin
cp /sbin/bridge /tmp/root/bin
cp /sbin/ifconfig /tmp/root/bin
cp /sbin/udhcpc /tmp/root/bin
cp /path/to/setup.sh /tmp/root/bin
chroot /tmp/root/ busybox sh /bin/run_nfs.sh

run ./run_nfs.sh

This is on top of 4.17-rc2 tree.

P.S: I am not 100% sure that the promiscuity handling is correct.
Please let me know if i should change anything on that

Ilias Apalodimas (4):
  cpsw: move common headers definitions to cpsw_priv.h
  cpsw_ale: add support functions for switchdev
  cpsw_switchdev: add switchdev support files
  cpsw: add switchdev support

 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw.c           | 610 ++++++++++++++++++++++---------
 drivers/net/ethernet/ti/cpsw_ale.c       | 129 +++++++
 drivers/net/ethernet/ti/cpsw_ale.h       |   8 +
 drivers/net/ethernet/ti/cpsw_priv.h      | 148 ++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 8 files changed, 1039 insertions(+), 169 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

-- 
2.7.4

Comments

Jiri Pirko May 24, 2018, 8:05 a.m. UTC | #1
Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
>Hello, 

>

>This is adding a new mode on the cpsw driver based around switchdev.

>In order to enable this you need to enable CONFIG_NET_SWITCHDEV, 

>CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV

>and add to udev config: 

>

>SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \

>        ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"

>Since the phys_switch_id is based on cpsw version, users with different 

>version will need to do 'ip -d link show dev sw0p0 | grep switchid' and 

>replace with the correct value.

>

>This patch creates 3 ports, sw0p0, sw0p1 and sw0p2.

>sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices

>while sw0p0 is the switch 'cpu facing port'.


Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

What is this device? Could you give me some pointer to description?
Ilias Apalodimas May 24, 2018, 8:48 a.m. UTC | #2
On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:

> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 
The reason is that TI wants this configured differently from customer facing 
ports. Apparently there are existing customers already using the "feature". 
So OR'ing and adding the cpu port on every operation (add/del vlans add 
ucast/mcast entries etc) was less favoured. 
> 

> What is this device? Could you give me some pointer to description?

This is the switch used on TI's AM5728 and BBB boards. I am pretty sure there 
are other platforms i am not aware of.
http://www.ti.com/lit/ug/spruhz6j/spruhz6j.pdf is the techincal reference
manual. Section 24.11.5.4 "Initialization and Configuration of CPSW" is the
switch part.

Thanks,
Ilias
Andrew Lunn May 24, 2018, 12:54 p.m. UTC | #3
On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:

> > Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:

> > Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 

> The reason is that TI wants this configured differently from customer facing 

> ports. Apparently there are existing customers already using the "feature". 

> So OR'ing and adding the cpu port on every operation (add/del vlans add 

> ucast/mcast entries etc) was less favoured. 


Hi Ilias

Nice to see this device moving away from its custom model and towards
the switchdev model.

Did you consider making a clean break from the existing code and write
a new driver. Let the existing customers using the existing
driver. Have the new switchdev driver fully conform to switchdev.

I don't like having this 'cpu' interface. As you say, it breaks the
switchhdev model. If we need to extend the switchdev model to support
some use case, lets do that. Please can you fully describe the use
cases, so we can discuss how to implement them cleanly within the
switchdev model.

Thanks
	  Andrew
Ivan Vecera May 24, 2018, 1:44 p.m. UTC | #4
On 24.5.2018 14:54, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:

>> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:

>>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:

>>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

>> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 

>> The reason is that TI wants this configured differently from customer facing 

>> ports. Apparently there are existing customers already using the "feature". 

>> So OR'ing and adding the cpu port on every operation (add/del vlans add 

>> ucast/mcast entries etc) was less favoured. 

> 

> Hi Ilias

> 

> Nice to see this device moving away from its custom model and towards

> the switchdev model.

+1

> Did you consider making a clean break from the existing code and write

> a new driver. Let the existing customers using the existing

> driver. Have the new switchdev driver fully conform to switchdev.


I would also prefer fresh new driver. The existing one can be marked as
'bugfix-only' and later pertinently deprecated/removed.
> 

> I don't like having this 'cpu' interface. As you say, it breaks the

> switchhdev model. If we need to extend the switchdev model to support

> some use case, lets do that. Please can you fully describe the use

> cases, so we can discuss how to implement them cleanly within the

> switchdev model.

+1

Ivan
Ilias Apalodimas May 24, 2018, 2:08 p.m. UTC | #5
On Thu, May 24, 2018 at 03:44:54PM +0200, Ivan Vecera wrote:
> On 24.5.2018 14:54, Andrew Lunn wrote:

> > On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:

> >> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:

> >>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:

> >>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

> >> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 

> >> The reason is that TI wants this configured differently from customer facing 

> >> ports. Apparently there are existing customers already using the "feature". 

> >> So OR'ing and adding the cpu port on every operation (add/del vlans add 

> >> ucast/mcast entries etc) was less favoured. 

> > 

> > Hi Ilias

> > 

> > Nice to see this device moving away from its custom model and towards

> > the switchdev model.

> +1

Thanks. To be honest it opens up so many posibilities for common configuration
from userspace across vendors that doing something new without it doesn't make
any sense (at least to me).

> 

> > Did you consider making a clean break from the existing code and write

> > a new driver. Let the existing customers using the existing

> > driver. Have the new switchdev driver fully conform to switchdev.

> 

> I would also prefer fresh new driver. The existing one can be marked as

> 'bugfix-only' and later pertinently deprecated/removed.

Yes, but given the driver and the platforms it's used at, we ended up patching 
the existing driver. I am not opposed to the idea, but Grygorii is more suited 
to reply on that.

> > 

> > I don't like having this 'cpu' interface. As you say, it breaks the

> > switchhdev model. If we need to extend the switchdev model to support

> > some use case, lets do that. Please can you fully describe the use

> > cases, so we can discuss how to implement them cleanly within the

> > switchdev model.

> +1

There's configuration needs from customers adding or not adding a VLAN to the
CPU port. In my configuration examples for instance, if the cpu port is not
added to the bridge, you cannot get an ip address on it. 
Similar cases exist for customers on adding MDBs as far as i know. So they want
the "customer facing ports" to have the MDBs present but not the cpu port.
In some cases (where the CPE/device that has the switch) participates in the 
traffic they want the cpu port to have the samne MDBs installed.
This is just two simple cases that come in mind, again Grygorii is more suited
to answer and explain existing/more complex use cases better than me.

Adding a cpu port that cannot transmit or receive traffic is a bit "weird", on
the other hand you can access it's configuration using the same userspace tools
and the same commands you do for the "normal" ports. Extending switchdev might
be the proper solution here.

Ilias
Andrew Lunn May 24, 2018, 2:54 p.m. UTC | #6
> There's configuration needs from customers adding or not adding a VLAN to the

> CPU port. In my configuration examples for instance, if the cpu port is not

> added to the bridge, you cannot get an ip address on it. 


If you cannot get an IP address, it is plain broken. The whole idea is
that switch port interfaces are just linux interfaces. A linux
interface which cannot get an IP address is broken.

> Similar cases exist for customers on adding MDBs as far as i know. So they want

> the "customer facing ports" to have the MDBs present but not the cpu port.


That i can understand. And it should actually work now with
switchdev. It performs IGMP snooping, and if there is nothing joining
the group on the CPU, it won't add an MDB entry to forward traffic to
the CPU.

> Adding a cpu port that cannot transmit or receive traffic is a bit "weird"


And how is it supposed to send BPDUs? STP is going to be broken....

    Andrew
Ilias Apalodimas May 24, 2018, 3:07 p.m. UTC | #7
On Thu, May 24, 2018 at 04:54:41PM +0200, Andrew Lunn wrote:
> If you cannot get an IP address, it is plain broken. The whole idea is

> that switch port interfaces are just linux interfaces. A linux

> interface which cannot get an IP address is broken.

The switch interfaces can get ip addresses just like every linux interface. The
cpu port can't (sw0p0)
> 

> > Similar cases exist for customers on adding MDBs as far as i know. So they want

> > the "customer facing ports" to have the MDBs present but not the cpu port.

> 

> That i can understand. And it should actually work now with

> switchdev. It performs IGMP snooping, and if there is nothing joining

> the group on the CPU, it won't add an MDB entry to forward traffic to

> the CPU.

Yes, but this should be configurable (i.e the customer can deny adding the MDB
on the cpu port)
> 

> > Adding a cpu port that cannot transmit or receive traffic is a bit "weird"

> 

> And how is it supposed to send BPDUs? STP is going to be broken....

Not sure about this, i'll have to check

Regards
Ilias
Andrew Lunn May 24, 2018, 3:25 p.m. UTC | #8
> > That i can understand. And it should actually work now with

> > switchdev. It performs IGMP snooping, and if there is nothing joining

> > the group on the CPU, it won't add an MDB entry to forward traffic to

> > the CPU.


> Yes, but this should be configurable (i.e the customer can deny adding the MDB

> on the cpu port)


O.K, back to the basic idea. Switch ports are just normal Linux
interfaces.

How would you configure this with two e1000e put in a bridge? I want
multicast to be bridged between the two e1000e, but the host stack
should not see the packets.

	Andrew
Ilias Apalodimas May 24, 2018, 4:02 p.m. UTC | #9
On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:
> O.K, back to the basic idea. Switch ports are just normal Linux

> interfaces.

> 

> How would you configure this with two e1000e put in a bridge? I want

> multicast to be bridged between the two e1000e, but the host stack

> should not see the packets.

I am not sure i am following. I might be missing something. In your case you
got two ethernet pci/pcie interfaces bridged through software. You can filter
those if needed. In the case we are trying to cover, you got a hardware that
offers that capability. Since not all switches are pcie based shouldn't we be
able to allow this ?

Regards
Ilias
Andrew Lunn May 24, 2018, 4:33 p.m. UTC | #10
On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:

> > O.K, back to the basic idea. Switch ports are just normal Linux

> > interfaces.

> > 

> > How would you configure this with two e1000e put in a bridge? I want

> > multicast to be bridged between the two e1000e, but the host stack

> > should not see the packets.

> I am not sure i am following. I might be missing something. In your case you

> got two ethernet pci/pcie interfaces bridged through software. You can filter

> those if needed. In the case we are trying to cover, you got a hardware that

> offers that capability. Since not all switches are pcie based shouldn't we be

> able to allow this ?


switchdev is about offloading what Linux can do to hardware to
accelerate it. The switch is a block of accelerator hardware, like a
GPU is for accelerating graphics. Linux can render OpenGL, but it is
better to hand it over to the GPU accelerator.

Same applies here. The Linux bridge can bridge multicast. Using the
switchdev API, you can push that down to the accelerator, and let it
do it.

So you need to think about, how do you make the Linux bridge not pass
multicast traffic to the host stack. Then how do you extend the
switchdev API so you can push this down to the accelerator.

To really get switchdev, you often need to pivot your point of view a
bit. People often think, switchdev is about writing drivers for
switches. Its not, its about how you offload networking which Linux
can do down to a switch. And if the switch cannot accelerate it, you
leave Linux to do it.

When you get in the details, i think you will find the switchdev API
actually already has what you need for this use case. What you need to
figure out is how you make the Linux bridge not pass multicast to the
host. Well, actually, not pass multicast it has not asked for. Then
accelerate it.

    Andrew
Ilias Apalodimas May 25, 2018, 6:29 a.m. UTC | #11
On Thu, May 24, 2018 at 06:33:10PM +0200, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote:

> > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:

> > > O.K, back to the basic idea. Switch ports are just normal Linux

> > > interfaces.

> > > 

> > > How would you configure this with two e1000e put in a bridge? I want

> > > multicast to be bridged between the two e1000e, but the host stack

> > > should not see the packets.

> > I am not sure i am following. I might be missing something. In your case you

> > got two ethernet pci/pcie interfaces bridged through software. You can filter

> > those if needed. In the case we are trying to cover, you got a hardware that

> > offers that capability. Since not all switches are pcie based shouldn't we be

> > able to allow this ?

> 

> switchdev is about offloading what Linux can do to hardware to

> accelerate it. The switch is a block of accelerator hardware, like a

> GPU is for accelerating graphics. Linux can render OpenGL, but it is

> better to hand it over to the GPU accelerator.

> 

> Same applies here. The Linux bridge can bridge multicast. Using the

> switchdev API, you can push that down to the accelerator, and let it

> do it.

> 

> So you need to think about, how do you make the Linux bridge not pass

> multicast traffic to the host stack. Then how do you extend the

> switchdev API so you can push this down to the accelerator.

> 

> To really get switchdev, you often need to pivot your point of view a

> bit. People often think, switchdev is about writing drivers for

> switches. Its not, its about how you offload networking which Linux

> can do down to a switch. And if the switch cannot accelerate it, you

> leave Linux to do it.

> 

> When you get in the details, i think you will find the switchdev API

> actually already has what you need for this use case. What you need to

> figure out is how you make the Linux bridge not pass multicast to the

> host. Well, actually, not pass multicast it has not asked for. Then

> accelerate it.

> 

Understood, if we missed back anything on handling multicast for
the cpu port we'll go back and fix it (i am assuming snooping is the answer
here). Multicasting is only one part of the equation though. What about the 
need for vlans/FDBs on that port though? 

Ilias
Ilias Apalodimas May 25, 2018, 10:28 a.m. UTC | #12
On Fri, May 25, 2018 at 09:29:02AM +0300, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 06:33:10PM +0200, Andrew Lunn wrote:

> > On Thu, May 24, 2018 at 07:02:54PM +0300, Ilias Apalodimas wrote:

> > > On Thu, May 24, 2018 at 05:25:59PM +0200, Andrew Lunn wrote:

> > > > O.K, back to the basic idea. Switch ports are just normal Linux

> > > > interfaces.

> > > > 

> > > > How would you configure this with two e1000e put in a bridge? I want

> > > > multicast to be bridged between the two e1000e, but the host stack

> > > > should not see the packets.

> > > I am not sure i am following. I might be missing something. In your case you

> > > got two ethernet pci/pcie interfaces bridged through software. You can filter

> > > those if needed. In the case we are trying to cover, you got a hardware that

> > > offers that capability. Since not all switches are pcie based shouldn't we be

> > > able to allow this ?

> > 

> > switchdev is about offloading what Linux can do to hardware to

> > accelerate it. The switch is a block of accelerator hardware, like a

> > GPU is for accelerating graphics. Linux can render OpenGL, but it is

> > better to hand it over to the GPU accelerator.

> > 

> > Same applies here. The Linux bridge can bridge multicast. Using the

> > switchdev API, you can push that down to the accelerator, and let it

> > do it.

> > 

> > So you need to think about, how do you make the Linux bridge not pass

> > multicast traffic to the host stack. Then how do you extend the

> > switchdev API so you can push this down to the accelerator.

> > 

> > To really get switchdev, you often need to pivot your point of view a

> > bit. People often think, switchdev is about writing drivers for

> > switches. Its not, its about how you offload networking which Linux

> > can do down to a switch. And if the switch cannot accelerate it, you

> > leave Linux to do it.

> > 

> > When you get in the details, i think you will find the switchdev API

> > actually already has what you need for this use case. What you need to

> > figure out is how you make the Linux bridge not pass multicast to the

> > host. Well, actually, not pass multicast it has not asked for. Then

> > accelerate it.

> > 

> Understood, if we missed back anything on handling multicast for

> the cpu port we'll go back and fix it (i am assuming snooping is the answer

> here). Multicasting is only one part of the equation though. What about the 

> need for vlans/FDBs on that port though? 

> 

I just noticed this: https://www.spinics.net/lists/netdev/msg504760.html
I tried doing the "bridge vlan add vid 2 dev br0 self" in my initial attempts 
but didn't get a notification to program the CPU port(with the sef argument). 
This obviously solves our vlan configuration issue. So it's only static FBDs 
left.

Regards
Ilias
Andrew Lunn May 25, 2018, 11:59 a.m. UTC | #13
> So it's only static FBDs left.


Please describe the use case. Why would i want to put static FDB
addresses on a CPU port? What in the linux network stack needs this?

	  Andrew
Andrew Lunn May 25, 2018, 12:09 p.m. UTC | #14
> Understood, if we missed back anything on handling multicast for

> the cpu port we'll go back and fix it (i am assuming snooping is the answer

> here).


It is part of the answer. You should also look at .ndo_set_rx_mode.
When the switch ports are not in a bridge, this call i used to pass a
list of MAC addresses which the network stack would like to receiver.
You probably want to turn that list into MBD and FDBs.

    Andrew
Ilias Apalodimas May 31, 2018, 3:27 p.m. UTC | #15
Sorry for the late response i had some time to take another look and do some
extra testing

> switchdev is about offloading what Linux can do to hardware to

> accelerate it. The switch is a block of accelerator hardware, like a

> GPU is for accelerating graphics. Linux can render OpenGL, but it is

> better to hand it over to the GPU accelerator.

>

> Same applies here. The Linux bridge can bridge multicast. Using the

> switchdev API, you can push that down to the accelerator, and let it

> do it.

>

> So you need to think about, how do you make the Linux bridge not pass

> multicast traffic to the host stack. Then how do you extend the

> switchdev API so you can push this down to the accelerator.

>


> To really get switchdev, you often need to pivot your point of view a

> bit. People often think, switchdev is about writing drivers for

> switches. Its not, its about how you offload networking which Linux

> can do down to a switch. And if the switch cannot accelerate it, you

> leave Linux to do it.

>

> When you get in the details, i think you will find the switchdev API

> actually already has what you need for this use case. What you need to

> figure out is how you make the Linux bridge not pass multicast to the

> host. Well, actually, not pass multicast it has not asked for. Then

> accelerate it.

The current driver is already working like that. The difference between the
modes of operation is this:
By registering the 'cpu port' we choose if the linux host is going to see the
br_ip4_multicast_igmp3_report or br_multicast_ipv4_rcv (by configuring the vlan
it participates) and trigger switchdev to add the MDBs
If the cpu port is member of that VLAN then the dynamic entry shows on 'bridge
mdb show' command i.e dev br0 port sw0p1 grp 239.1.1.1 temp offload vid 100
If not the user is able to add it manually.

Anyway i got the main points of the RFC, if Petr's patch get accepted i might be
able to respin this without registering a CPU port. 

Regards
Ilias
Grygorii Strashko June 1, 2018, 9:29 p.m. UTC | #16
Hi Ilias,

On 05/24/2018 01:56 AM, Ilias Apalodimas wrote:
> Hello,

> 

> This is adding a new mode on the cpsw driver based around switchdev.

> In order to enable this you need to enable CONFIG_NET_SWITCHDEV,

> CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_TI_CPSW_SWITCHDEV

> and add to udev config:

> 

> SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \

>          ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"

> Since the phys_switch_id is based on cpsw version, users with different

> version will need to do 'ip -d link show dev sw0p0 | grep switchid' and

> replace with the correct value.

> 

> This patch creates 3 ports, sw0p0, sw0p1 and sw0p2.

> sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices

> while sw0p0 is the switch 'cpu facing port'.

> sw0p0 will be unable to receive and transmit traffic and it's not 100% within

> switchdev scope but, it's used to configure switch cpu port individually as

> this is needed for various switch features and configuration scenarios.


First, sorry for delayed reply it was very tough week (new SoC).

Second, Thanks a lot for your great work. I'm still testing it with different
 use cases and trying to consolidate my reply for all questions.

All, thanks for your comments.

-- 
regards,
-grygorii
Andrew Lunn June 2, 2018, 2:08 p.m. UTC | #17
On Fri, Jun 01, 2018 at 04:29:08PM -0500, Grygorii Strashko wrote:
> Hi Ilias,



> Second, Thanks a lot for your great work. I'm still testing it with different

>  use cases and trying to consolidate my reply for all questions.

> 

> All, thanks for your comments.


Hi Grygorii

Something i've said to Ilias already. I would recommend you don't try
to cover all your uses cases with the first version. Keep it simple
and clean, don't do anything controversial and get it merged. Then add
more features one by one. We can then discuss any odd ball features
while being able to look at the complete system, driver, switchdev and
the network stack.

    Andrew
Grygorii Strashko June 2, 2018, 11:28 p.m. UTC | #18
Hi All,

Sry, for delayed reply.

On 05/24/2018 09:08 AM, Ilias Apalodimas wrote:
> On Thu, May 24, 2018 at 03:44:54PM +0200, Ivan Vecera wrote:

>> On 24.5.2018 14:54, Andrew Lunn wrote:

>>> On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:

>>>> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:

>>>>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:

>>>>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.

>>>> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here.

>>>> The reason is that TI wants this configured differently from customer facing

>>>> ports. Apparently there are existing customers already using the "feature".

>>>> So OR'ing and adding the cpu port on every operation (add/del vlans add

>>>> ucast/mcast entries etc) was less favoured.

>>>

>>> Hi Ilias

>>>

>>> Nice to see this device moving away from its custom model and towards

>>> the switchdev model.

>> +1

> Thanks. To be honest it opens up so many posibilities for common configuration

> from userspace across vendors that doing something new without it doesn't make

> any sense (at least to me).

> 

>>

>>> Did you consider making a clean break from the existing code and write

>>> a new driver. Let the existing customers using the existing

>>> driver. Have the new switchdev driver fully conform to switchdev.

>>

>> I would also prefer fresh new driver. The existing one can be marked as

>> 'bugfix-only' and later pertinently deprecated/removed.

> Yes, but given the driver and the platforms it's used at, we ended up patching

> the existing driver. I am not opposed to the idea, but Grygorii is more suited

> to reply on that.


Correct, we considered two options - start from scratch or hack existing driver
to get working prototype as fast as possible.
Hacking of existing driver was just faster way to go and i agree that new driver 
might be better approach for the future.

> 

>>>

>>> I don't like having this 'cpu' interface. As you say, it breaks the

>>> switchhdev model. If we need to extend the switchdev model to support

>>> some use case, lets do that. Please can you fully describe the use

>>> cases, so we can discuss how to implement them cleanly within the

>>> switchdev model.

>> +1

> There's configuration needs from customers adding or not adding a VLAN to the

> CPU port. In my configuration examples for instance, if the cpu port is not

> added to the bridge, you cannot get an ip address on it.

> Similar cases exist for customers on adding MDBs as far as i know. So they want

> the "customer facing ports" to have the MDBs present but not the cpu port.

> In some cases (where the CPE/device that has the switch) participates in the

> traffic they want the cpu port to have the samne MDBs installed.

> This is just two simple cases that come in mind, again Grygorii is more suited

> to answer and explain existing/more complex use cases better than me.

> 

> Adding a cpu port that cannot transmit or receive traffic is a bit "weird", on

> the other hand you can access it's configuration using the same userspace tools

> and the same commands you do for the "normal" ports. Extending switchdev might

> be the proper solution here.


I'd try to provide more information about this switch and why we end up adding eth0 port.
Please, be patient to me as this is new area for me and I might not be right in some
conclusions or can missing smth. 

*Before this patch set*: Current CPSW driver in switch mode can be described as below
          +----------------------------------------+
          |Linux Host 0                            |
          |                                        |
          |   TI tool (ioctl)/netdev               |
          |  + +                                   |
          +-----------------+----------------------+
             | |            |
Control MMIO | |            | Data DMA
             | |            |
         +------------+-----+-----+-----------------+
         |   | |      |   CPDMA   |        CPSW     |
         |   v |      |           |                 |
         |     |      +-----------+                 |
         |     |      |    P0     |                 |
         |     |      |           |                 |
         |     |      +-----+-----+                 |
         |     |            |  FDB: static MAC Port=0
         |     |      +-----+-----+                 |
         |     +------>    ALE    +-----+           |
         |     +------+           |     |           |
         |     |      +-----------+     |           |
         | +---+----+              +----+---+       |
         | |        |              |        |       |
         | |  P1    |              |  P2    |       |
         +------------------------------------------+
           |        |              |        |
           |  PHY   |              |  PHY   |
           +---+----+              +----+---+
               |                        |
           +---+----+               +---+----+
           |        |               |        |
           |  Host 1|               | Host 2 |
           +--------+               +--------+

Note. This is embedded world and in many cases network configuration is
static, everything which is not allowed - drop (means no such things like
IGMP or even ARP).

The core Part of CPSW is ALE module which perform switching ops according 
to ALE table (fdb/mdb/vlan). From ALE point of view *all* port absolutely equal.
And Linux Host 0 in most of customer use cases is the no much different from Hosts 1 or 2,
so allowed net traffic to Linux Host 0 have to be very carefully configured as Apps running
on it must continue working even if network failed or overloaded (packet storms).

So, as per my understanding, P0 meaning here is absolutely not the same as CPU port in DSA.
It just another switch port with bad luck to be connected directly to CPU.

Current CPSW driver offloads basic switch configuration in HW and additional configuration
can be done using TI custom tool (kernel updated to accept additional IOCTL).
Current driver creates one netdev ETH0 and all net traffic enter/exit through this device
- it possible to distinguish which external port p1/2 received packet or send packet directly
to external port p1/2, but current CPSW doesn't do this.
Static Unicast FDB entries created in ALE table for Port 0 to specify which 
unicast traffic need to be accepted by Linux Host 0.  By default only registered multicast 
or broadcast packets forwarded to Linux Host 0.



*After this patch set*: goal keep things working the same as max as possible and
get rid of TI custom tool.

        +-------------------------------------------------------+
        | Linux Host 0                                          |
        |                            +------------------+       |
        |                            |            br0   |       |
        |                            |                  |       |
        |                            +---+--------------+       |
        |                                |             |data    |
        |     +-------+--------------+---------------+ |        |
        |     |       ^              ^   |           | |        |
        |     |      ++-----+      +-+---+-+    +----+-+-+      |
        |     |      |sw0p0 |      | sw0p1 |    |sw0p2   |      |
        |     |      +------+      +----^--+    +-----^--+      |
        |     |                         |             |         |
        |     |             +-----------+-------------+         |
        |     |             |                                   |
        +----+v+------------------------------------------------+
             | |            |
Control MMIO | |            | Data DMA
             | |            |
         +------------+-----+-----+-----------------------------+
         |   | |      |   CPDMA   |        CPSW                 |
         |   v |      |           |                             |
         |     |      +-----------+                             |
         |     |      |    P0     |                             |
         |     |      |           |                             |
         |     |      +-----+-----+                             |
         |     |            |        FDB: static MACP1 port=0   |
         |     |      +-----+-----+  FDB: static MACP2 port=0   |
         |     +------>    ALE    +-----+                       |
         |     +------+           |     |                       |
         |     |      +-----------+     |                       |
         | +---+----+              +----+---+                   |
         | |        |              |        |                   |
         | |  P1    |              |  P2    |                   |
         +------------------------------------------------------+
           |        |              |        |
           |  PHY   |              |  PHY   |
           +--------+              +--------+

In this implementation switch egress traffic to Linux Host 0 always
split between sw0p1 and sw0p2 depending on which external port P1/P2
packet was received, sw0p0 doesn't produce any traffic.
On switch ingress from Linux Host 0 packets are always sent directly to
external ports P1/P2, with assumption that Linux Bridge knows where packet
should go, and ALE bypassed in this case. sw0p0 doesn't allows to
send any traffic and serves for configuration purposes only.

Below I've described some tested use cases (not include full static configuration),
but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables
adds MQPRIO and CBS Qdisc and targets AVB network features. It required to
offload MQPRIO and CBS parameters on all ports including P0. In case of P0,
CPDMA TX channels shapers need to be configured, and in case 
of sw0p1/sw0p2 internal FIFOS. 
sw0p0 also expected to be used to configure CPDMA interface in general -
number of tx/rx channels, rates, ring sizes.
In addition there is set of global CPSW parameters (not related to P1/P2, like
MAC Authorization Mode, OUI Deny Mode, crc ) which I've 
thought can be added to sw0p0 (using ethtool -priv-flags).

Additional headache is PTP: we have on PHC, but both external interfaces P1/P2
can timestamp packets.

[1] https://lkml.org/lkml/2018/5/18/1134


Below use cases were tried with this approach tried with current LKML,
so possible changes to Net/Bridge framework not considered:

 
1) boot, ping no vlan

# ip link add name br0 type bridge
# echo 0 > /sys/class/net/br0/bridge/default_pvid
# ip link set dev eth2 master br0
# ip link set dev eth0 master br0
# ip link set dev eth1 master br0
# ifconfig br0 192.168.1.2

*Note*: I've had to disable default_pvid as otherwise linux Bridge adds
and offloads default vlan 1, but default configuration for CPSW driver is vid 0.
+  CPSW specific - it can't untag packets for P0.
Another option I've found:
# ip link set dev br0 type bridge vlan_filtering 1.
but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0


2) add vlans.

Host 1 - not vlan capable,
Host 2/ Linux Host 0 can handle vlans

# bridge vlan add dev sw0p1 vid 100 pvid untagged master
# bridge vlan add dev sw0p2 vid 100 master
# bridge vlan add dev sw0p0 vid 100 master

Any combination expected to work.

*Note*. Ilias reused IFF_ALLMULTI and IFF_MULTICAST on each interface to
 configure registered and unregistered multicast ports masks for each ALE VLAN
entries. So, 

# ifconfig eth0 -multicast
# ifconfig eth0 --allmulti
# bridge vlan add dev sw0p0 vid 100 master
expected to stop forwarding  any multicast traffic to Linux Host 0

3) MDB

Allow mcast 239.1.1.1 between P1/P2
# bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent
# bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent

Allow mcast 239.1.1.2 between P0/P2
# bridge mdb add dev br0 port sw0p0 grp 239.1.1.2 permanent
# bridge mdb add dev br0 port sw0p2 grp 239.1.1.2 permanent

*Note !!!!!*. I've found no proper way to add L2 mcast addresses as
01-80-C2-00-00-00 or 01-1B-19-00-00-00. probably I missing smth.

4) stp

I did some updates (added port stp state offload/stp mcats addr configuration
- will post diff on Monday) and tried stp (kernel STP) by connecting P1/P2
ports to the same switch:

# ip link add name br0 type bridge
# echo 0 > /sys/class/net/br0/bridge/default_pvid
# ip link set dev sw0p2 master br0
# ip link set dev sw0p0 master br0
# ip link set dev sw0p1 master br0
# brctl show
# ifconfig br0 192.168.1.2
# brctl stp br0 on
# brctl showstp br0

sw0p1 (3)
 port id                8003                    state                  blocking
sw0p2 (1)
 port id                8001                    state                forwarding


Don't know howto:
1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST
address = blocked MAC
2) add multicast MAC address with Supervisory Packet flag set. 
Such packets will bypass most of checks inside ALE and will be forwarded in all port's
states except "disabled".
3) add "unknown vlan configuration" : ALE provides possibility to configure
default behavior for tagged packets with "unknown vlan" by configuring 
- Unknown VLAN Force Untagged Egress ports Mask.
- Unknown VLAN Registered Multicast Flood Ports Mask
- Unknown VLAN Multicast Flood ports Mask
- Unknown VLAN Member ports List
4) The way to detect "brctl stp br0 on/off"


If you are here. Thanks for reading this and your patience.

-- 
regards,
-grygorii
Andrew Lunn June 3, 2018, 12:08 a.m. UTC | #19
On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote:

Hi Grygorii

I'm just picking out one thing here... there is lots more good stuff here.

> Additional headache is PTP: we have on PHC, but both external interfaces P1/P2

> can timestamp packets.

 
This should not be a problem. The Marvell switches have one PHC, but
each port can time stamp packets using this counter. Each port has its
own receive and transmit time stamp registers. So i don't think this
will cause you problems.

     Andrew
Andrew Lunn June 3, 2018, 12:26 a.m. UTC | #20
> *After this patch set*: goal keep things working the same as max as

> possible and get rid of TI custom tool.


We are happy to keep things the same, if they fit with the switchdev
model. Anything in your customer TI tool/model which does not fit the
switchdev model you won't be able to keep, except if we agree to
extend the model.

I can say now, sw0p0 is going to cause problems. I really do suggest
you drop it for the moment in order to get a minimal driver
accepted. sw0p0 does not fit the switchdev model.

> Below I've described some tested use cases (not include full static configuration),

> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables

> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to

> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,

> CPDMA TX channels shapers need to be configured, and in case 

> of sw0p1/sw0p2 internal FIFOS. 

> sw0p0 also expected to be used to configure CPDMA interface in general -

> number of tx/rx channels, rates, ring sizes.


Can this be derives from the configuration on sw0p1 and sw0p2? 
sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx
channels?

> In addition there is set of global CPSW parameters (not related to P1/P2, like

> MAC Authorization Mode, OUI Deny Mode, crc ) which I've 

> thought can be added to sw0p0 (using ethtool -priv-flags).


You should describe these features, and then we can figure out how
best to model them. devlink might be an option if they are switch
global.

     Andrew
Andrew Lunn June 3, 2018, 12:37 a.m. UTC | #21
> 1) boot, ping no vlan

> 

> # ip link add name br0 type bridge

> # echo 0 > /sys/class/net/br0/bridge/default_pvid

> # ip link set dev eth2 master br0

> # ip link set dev eth0 master br0

> # ip link set dev eth1 master br0

> # ifconfig br0 192.168.1.2

> 

> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds

> and offloads default vlan 1, but default configuration for CPSW driver is vid 0.

> +  CPSW specific - it can't untag packets for P0.

> Another option I've found:

> # ip link set dev br0 type bridge vlan_filtering 1.

> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0


There are three different configurations here you need to worry about,
with respect to vlans:

# CONFIG_VLAN_8021Q is not set

So you don't have any vlan support in the kernel.

CONFIG_VLAN_8021Q=y, vlan_filtering = 0

So you have vlans, but filtering is off

CONFIG_VLAN_8021Q=y, vlan_filtering = 1

So you have vlans, and filtering is on.

Even with vlan_filtering off, the bridge still does a little with
vlans.

And you need all three to work correctly. 

    Andrew
Andrew Lunn June 3, 2018, 12:49 a.m. UTC | #22
Hi Grygorii

> Don't know howto:

> 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST

> address = blocked MAC

> 2) add multicast MAC address with Supervisory Packet flag set. 

> Such packets will bypass most of checks inside ALE and will be forwarded in all port's

> states except "disabled".

> 3) add "unknown vlan configuration" : ALE provides possibility to configure

> default behavior for tagged packets with "unknown vlan" by configuring 

> - Unknown VLAN Force Untagged Egress ports Mask.

> - Unknown VLAN Registered Multicast Flood Ports Mask

> - Unknown VLAN Multicast Flood ports Mask

> - Unknown VLAN Member ports List

> 4) The way to detect "brctl stp br0 on/off"


You are probably looking at this from the wrong direction. Yes, the
switch can do these things. But the real question is, why would the
network stack want to do this? As i've said before, you are
accelerating the network stack by offloading things to the hardware.

Does the software bridge support FDB with a blocked flag? I don't
think it does. So you first need to extend the software bridge with
this concept. Then you can offload it to the hardware to accelerate
it.

Does the network stack need for forward specific multicast MAC
addresses between bridge ports independent of the state? If there is
no need for it, you don't need to accelerate it.

   Andrew
Grygorii Strashko June 5, 2018, 9:18 p.m. UTC | #23
On 06/02/2018 07:08 PM, Andrew Lunn wrote:
> On Sat, Jun 02, 2018 at 06:28:22PM -0500, Grygorii Strashko wrote:

> 

> Hi Grygorii

> 

> I'm just picking out one thing here... there is lots more good stuff here.

> 

>> Additional headache is PTP: we have on PHC, but both external interfaces P1/P2

>> can timestamp packets.

>   

> This should not be a problem. The Marvell switches have one PHC, but

> each port can time stamp packets using this counter. Each port has its

> own receive and transmit time stamp registers. So i don't think this

> will cause you problems.


I hope you are right - question is always in number of available options
and which one to select - and, most important, explain it to the end user :( 


For example:
phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),
so which intf should return phc_index?

Still not tested, so jut hope ...


-- 
regards,
-grygorii
Andrew Lunn June 5, 2018, 9:28 p.m. UTC | #24
> I hope you are right - question is always in number of available options

> and which one to select - and, most important, explain it to the end user :( 


The end customer being ptp4linux? At least for Marvell switches, it is
happy about everything except that the switch is a bit slow, so we
need to modify some of the time outs in the configuration file.

> For example:

> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),

> so which intf should return phc_index?


It is not a 1:1 relationship. See:

https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61

All interfaces return the same index.

In fact, for a switch, having a PHC per port would be odd. That would
mean you need to sync the PHCs in order to act as a boundary clock.

    Andrew
Grygorii Strashko June 5, 2018, 9:31 p.m. UTC | #25
On 06/02/2018 07:37 PM, Andrew Lunn wrote:
>> 1) boot, ping no vlan

>>

>> # ip link add name br0 type bridge

>> # echo 0 > /sys/class/net/br0/bridge/default_pvid

>> # ip link set dev eth2 master br0

>> # ip link set dev eth0 master br0

>> # ip link set dev eth1 master br0

>> # ifconfig br0 192.168.1.2

>>

>> *Note*: I've had to disable default_pvid as otherwise linux Bridge adds

>> and offloads default vlan 1, but default configuration for CPSW driver is vid 0.

>> +  CPSW specific - it can't untag packets for P0.

>> Another option I've found:

>> # ip link set dev br0 type bridge vlan_filtering 1.

>> but anyway, I've found it confusing that Linux bridge adds default vlan when vlan_filtering == 0

> 

> There are three different configurations here you need to worry about,

> with respect to vlans:

> 

> # CONFIG_VLAN_8021Q is not set

> 

> So you don't have any vlan support in the kernel.

> 

> CONFIG_VLAN_8021Q=y, vlan_filtering = 0

> 

> So you have vlans, but filtering is off

> 

> CONFIG_VLAN_8021Q=y, vlan_filtering = 1

> 

> So you have vlans, and filtering is on.

> 

> Even with vlan_filtering off, the bridge still does a little with

> vlans.

> 

> And you need all three to work correctly.

> 


Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge
when vlan_filtering == 0?

-- 
regards,
-grygorii
Andrew Lunn June 5, 2018, 9:37 p.m. UTC | #26
> Right, thanky for the info, but still (sry, to be annoying) why default vlan is added by bridge

> when vlan_filtering == 0?


I have no idea!

I just made sure the Marvell driver works with this.

You might want to do a git blame and find out who added it, and it
might say why.

      Andrew
Grygorii Strashko June 5, 2018, 9:42 p.m. UTC | #27
On 06/05/2018 04:28 PM, Andrew Lunn wrote:
>> I hope you are right - question is always in number of available options

>> and which one to select - and, most important, explain it to the end user :(

> 

> The end customer being ptp4linux? At least for Marvell switches, it is

> happy about everything except that the switch is a bit slow, so we

> need to modify some of the time outs in the configuration file.

> 

>> For example:

>> phc_index is returned as part of .get_ts_info() = cpsw_get_ts_info(),

>> so which intf should return phc_index?

> 

> It is not a 1:1 relationship. See:

> 

> https://elixir.bootlin.com/linux/latest/source/drivers/net/dsa/mv88e6xxx/hwtstamp.c#L61

> 

> All interfaces return the same index.

> 

> In fact, for a switch, having a PHC per port would be odd. That would

> mean you need to sync the PHCs in order to act as a boundary clock.


PHC only one, but hw timestamping blocks are per port.

-- 
regards,
-grygorii
Andrew Lunn June 5, 2018, 9:55 p.m. UTC | #28
> PHC only one, but hw timestamping blocks are per port.


Yes, same as the Marvell. Per port, there are two receive time stamps
and one transmit time stamp.

    Andrew
Grygorii Strashko June 5, 2018, 10:45 p.m. UTC | #29
Hi Andrew,

Thanks a lot for you comments.

On 06/02/2018 07:49 PM, Andrew Lunn wrote:
> Hi Grygorii

> 

>> Don't know howto:

>> 1) add FDB entry with "blocked" flag - ALE can discard all packets with SRC/DST

>> address = blocked MAC

>> 2) add multicast MAC address with Supervisory Packet flag set.

>> Such packets will bypass most of checks inside ALE and will be forwarded in all port's

>> states except "disabled".

>> 3) add "unknown vlan configuration" : ALE provides possibility to configure

>> default behavior for tagged packets with "unknown vlan" by configuring

>> - Unknown VLAN Force Untagged Egress ports Mask.

>> - Unknown VLAN Registered Multicast Flood Ports Mask

>> - Unknown VLAN Multicast Flood ports Mask

>> - Unknown VLAN Member ports List

>> 4) The way to detect "brctl stp br0 on/off"

> 

> You are probably looking at this from the wrong direction. Yes, the

> switch can do these things. But the real question is, why would the

> network stack want to do this? As i've said before, you are

> accelerating the network stack by offloading things to the hardware.


Right. Thanks. 

> 

> Does the software bridge support FDB with a blocked flag? I don't

> think it does. So you first need to extend the software bridge with

> this concept. Then you can offload it to the hardware to accelerate

> it.


Sry, for possible misunderstanding: in "Don't know howto" i've listed
things I was not able to discover from code or documentation with hope
that expert opinion will help to confirm if this this a real/possible gap
or I/we've just missed smth. And if this is a real gap - get "green" or "red"
flag for future work (which need to be planned somehow). 

So, my understanding for (1) "blocked FDB entry support" is reasonable
extension for bridge/switchdev ("green").

> 

> Does the network stack need for forward specific multicast MAC

> addresses between bridge ports independent of the state? If there is

> no need for it, you don't need to accelerate it.


Assume this is about item 2 - this question is related to STP packets.
CPSW/ALE will drop STP packets if receiving port is in blocking/learning states 
unless corresponding  mcast entry exist in ALE entry with (Supervisory Packet) flag set
(Actually ALE mcast entry has two fields (TRM): 
Supervisory Packet (SUPER): When set, this field indicates that the packet
 with a matching multicast destination address is a supervisory packet.
Multicast Forward State (MCAST_FWD_STATE): Indicates the port state(s) required for the received port
on a destination address lookup in order for the multicast packet to be forwarded to
the transmit port(s). A transmit port must be in the Forwarding state in
order to forward the packet.)

Question 4 was asked with assumption that if (2) not supported and "red" flag
- then option (4) can be used as w/a (again if "green" flag) and STP mcast address
can be added in ALE on event "stp on".


** "unknown vlan configuration"

This is about following use case. Non static network configuration when
CPSW based device knows what traffic it can accept (Host port 0), but
it has no knowledge (or limited) about network segments attached to Port 1 and Port 2.

For example: Host 0 can accept only vlan 100 traffic coming from Port 1.
ALE entry: vid =100, port_mask 0x3

But there can be vlan 50 created in attached network segments.
Unknown VLAN Force Untagged Egress ports Mask = 0x0
Unknown VLAN Registered Multicast Flood Ports Mask = 0x6 (P1|P2)
Unknown VLAN Multicast Flood ports Mask = 0x6 (P1|P2)
Unknown VLAN Member ports List  = 0x6 (P1|P2)

with above configuration packets with "unknown vlan" (no ALE entry) will
still be forwarded between port 1 and 2, but not Port 0. 

So, is it reasonable to add "unknown vlan configuration" to bridge/switchdev
if not exist yet? will any other hw known benefit from it?

-- 
regards,
-grygorii
Grygorii Strashko June 5, 2018, 10:59 p.m. UTC | #30
On 06/02/2018 09:08 AM, Andrew Lunn wrote:
> On Fri, Jun 01, 2018 at 04:29:08PM -0500, Grygorii Strashko wrote:

>> Hi Ilias,

> 

> 

>> Second, Thanks a lot for your great work. I'm still testing it with different

>>   use cases and trying to consolidate my reply for all questions.

>>

>> All, thanks for your comments.

> 

> Hi Grygorii

> 

> Something i've said to Ilias already. I would recommend you don't try

> to cover all your uses cases with the first version. Keep it simple

> and clean, don't do anything controversial and get it merged. Then add

> more features one by one. We can then discuss any odd ball features

> while being able to look at the complete system, driver, switchdev and

> the network stack.

> 


yes. It definitely no problem from my side, except basic customer use-cases 
simply not working without sw0p0, at least with current LKML :(

And I just have to look a little bit in the future as selected approach
expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch)
 where we going to have more features, like TSN, EST and packet Policing and Classification.

And I very, very appreciated for your (and all others) time and comments.

Thank you.

-- 
regards,
-grygorii
Grygorii Strashko June 5, 2018, 11:23 p.m. UTC | #31
On 06/02/2018 07:26 PM, Andrew Lunn wrote:
>> *After this patch set*: goal keep things working the same as max as

>> possible and get rid of TI custom tool.

> 

> We are happy to keep things the same, if they fit with the switchdev

> model. Anything in your customer TI tool/model which does not fit the

> switchdev model you won't be able to keep, except if we agree to

> extend the model.


Right. That's the main goal of RFC to identify those gaps.

> 

> I can say now, sw0p0 is going to cause problems. I really do suggest

> you drop it for the moment in order to get a minimal driver

> accepted. sw0p0 does not fit the switchdev model.


Honestly, this is not the first patchset and we started without sw0p0,
but then.... (with current LKML)
- default vlan offloading breaks traffic reception to P0
 (Ilias saying it's fixed in next - good)
- adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed)
- mcast - no way to manually add static record and include or exclude P0.


:( above are basic functionality required.

> 

>> Below I've described some tested use cases (not include full static configuration),

>> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables

>> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to

>> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,

>> CPDMA TX channels shapers need to be configured, and in case

>> of sw0p1/sw0p2 internal FIFOS.

>> sw0p0 also expected to be used to configure CPDMA interface in general -

>> number of tx/rx channels, rates, ring sizes.

> 

> Can this be derives from the configuration on sw0p1 and sw0p2?

> sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx

> channels?


This not exactly what is required, sry I probably will need just repeat what Ivan 
described in https://lkml.org/lkml/2018/5/18/1135. I'd try to provide this info tomorrow.

Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense
if we will not be able to fit Ivan's work in new CPSW driver model ;..(
and do AVB bridge.

> 

>> In addition there is set of global CPSW parameters (not related to P1/P2, like

>> MAC Authorization Mode, OUI Deny Mode, crc ) which I've

>> thought can be added to sw0p0 (using ethtool -priv-flags).

> 

> You should describe these features, and then we can figure out how

> best to model them. devlink might be an option if they are switch

> global.


Ok. that can be postponed.

-- 
regards,
-grygorii
Andrew Lunn June 5, 2018, 11:40 p.m. UTC | #32
> So, my understanding for (1) "blocked FDB entry support" is reasonable

> extension for bridge/switchdev ("green").


You might have to justify it, but yes.

> > Does the network stack need for forward specific multicast MAC

> > addresses between bridge ports independent of the state? If there is

> > no need for it, you don't need to accelerate it.

> 

> Assume this is about item 2 - this question is related to STP packets.

> CPSW/ALE will drop STP packets if receiving port is in blocking/learning states 

> unless corresponding  mcast entry exist in ALE entry with (Supervisory Packet) flag set

> (Actually ALE mcast entry has two fields (TRM): 

> Supervisory Packet (SUPER): When set, this field indicates that the packet

>  with a matching multicast destination address is a supervisory packet.

> Multicast Forward State (MCAST_FWD_STATE): Indicates the port state(s) required for the received port

> on a destination address lookup in order for the multicast packet to be forwarded to

> the transmit port(s). A transmit port must be in the Forwarding state in

> order to forward the packet.)


So i this case, i would expect your driver to just add these entries
to the ALE. No need for configuration for above. Just do it as soon as
a port is made a member of a bridge. And remove it when the port
leaves the bridge.

DSA devices are smarter. They all have hardware which looks for BPDU
and forwards them to the host independent of the state of the port.
They also tend to have hardware looking for IGMP packets, and will
forward them to the host, even if the multicast address is not being
forwarded to the host.

> ** "unknown vlan configuration"

> 

> This is about following use case. Non static network configuration when

> CPSW based device knows what traffic it can accept (Host port 0), but

> it has no knowledge (or limited) about network segments attached to Port 1 and Port 2.

> 

> For example: Host 0 can accept only vlan 100 traffic coming from Port 1.

> ALE entry: vid =100, port_mask 0x3

> 

> But there can be vlan 50 created in attached network segments.

> Unknown VLAN Force Untagged Egress ports Mask = 0x0

> Unknown VLAN Registered Multicast Flood Ports Mask = 0x6 (P1|P2)

> Unknown VLAN Multicast Flood ports Mask = 0x6 (P1|P2)

> Unknown VLAN Member ports List  = 0x6 (P1|P2)

> 

> with above configuration packets with "unknown vlan" (no ALE entry) will

> still be forwarded between port 1 and 2, but not Port 0. 

> 

> So, is it reasonable to add "unknown vlan configuration" to bridge/switchdev

> if not exist yet? will any other hw known benefit from it?


You need to think about the case of two e1000e. How do you configure
this setup to do what you want? It probably is already possible.

     Andrew
Andrew Lunn June 5, 2018, 11:49 p.m. UTC | #33
On Tue, Jun 05, 2018 at 06:23:45PM -0500, Grygorii Strashko wrote:
> 

> 

> On 06/02/2018 07:26 PM, Andrew Lunn wrote:

> >> *After this patch set*: goal keep things working the same as max as

> >> possible and get rid of TI custom tool.

> > 

> > We are happy to keep things the same, if they fit with the switchdev

> > model. Anything in your customer TI tool/model which does not fit the

> > switchdev model you won't be able to keep, except if we agree to

> > extend the model.

> 

> Right. That's the main goal of RFC to identify those gaps.

> 

> > 

> > I can say now, sw0p0 is going to cause problems. I really do suggest

> > you drop it for the moment in order to get a minimal driver

> > accepted. sw0p0 does not fit the switchdev model.

> 

> Honestly, this is not the first patchset and we started without sw0p0,

> but then.... (with current LKML)

> - default vlan offloading breaks traffic reception to P0

>  (Ilias saying it's fixed in next - good)

> - adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed)

> - mcast - no way to manually add static record and include or exclude P0.

> 

> 

> :( above are basic functionality required.


For a DSA driver, this is way more than basic. A basic DSA driver just
provides interfaces, and does everything in software. No offload at
all. Generally, FDB offload is next, then MDB, and then VLAN, each as
separate patch sets.

> Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense

> if we will not be able to fit Ivan's work in new CPSW driver model ;..(

> and do AVB bridge.


AVB bridge should fit the switchdev model. You can offload TC via
switchdev e.g. the b53 has mirred, mellonex has flower and a lot more.

      Andrew
Andrew Lunn June 5, 2018, 11:53 p.m. UTC | #34
> And I just have to look a little bit in the future as selected approach

> expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch)

> where we going to have more features, like TSN, EST and packet Policing and Classification.


You should probably took a closer look at the Mellonex driver. It has
a lot of TC offload, which is what policing and classification is.

TSN is being worked on in general, and i think the i210 is taking the
lead. So you probably want to keep an eye on that, and join the
discussion.

	Andrew
Ilias Apalodimas June 6, 2018, 6:42 a.m. UTC | #35
Hi Andrew,

>On Wed, Jun 06, 2018 at 01:53:56AM +0200, Andrew Lunn wrote:

> > And I just have to look a little bit in the future as selected approach

> > expected to be extended on future SoC (and other parts of existing SoCs - ICSS-G SW switch)

> > where we going to have more features, like TSN, EST and packet Policing and Classification.

> 

> You should probably took a closer look at the Mellonex driver. It has

> a lot of TC offload, which is what policing and classification is.

> 

I did take a close look to both Mellanox and rocker drivers before issuing this
RFC and we seem to be close on the approach. What Grygorii is reffering to, is
that for CBS to work properly on CPSW there *must* be a way to configure the
CPU port individually.

> TSN is being worked on in general, and i think the i210 is taking the

> lead. So you probably want to keep an eye on that, and join the

> discussion.

> 

TSN is not just "one thing". It's a few IEEE standards bundled up to provide the
needed functionality. i210 is only implementing CBS at the moment and there's
an RFC out there to support what they call "Time based scheduling".
I am already having discussions with Jesus on their current work.

The idea behind using switchdev is that due to it's design, it's a very
convenient way of utilizing netlink and iproute2/tc to configure any kind of
future shapers. Since net_device_ops now has a callback to configure hardware
schedulers being able to expose netdevs as hardware ports and configure them
individually is great. As you can understand you'll end up with TSN capable
switches and NICs being configured with the same commands from a userspace point
of view. I am not sure this is the proper way to go, but at least for me, 
sounds like a viable solution.

Thanks
Ilias
Ivan Khoronzhuk June 6, 2018, 8:23 a.m. UTC | #36
On Tue, Jun 05, 2018 at 06:23:45PM -0500, Grygorii Strashko wrote:
>

>

>On 06/02/2018 07:26 PM, Andrew Lunn wrote:

>>> *After this patch set*: goal keep things working the same as max as

>>> possible and get rid of TI custom tool.

>>

>> We are happy to keep things the same, if they fit with the switchdev

>> model. Anything in your customer TI tool/model which does not fit the

>> switchdev model you won't be able to keep, except if we agree to

>> extend the model.

>

>Right. That's the main goal of RFC to identify those gaps.

>

>>

>> I can say now, sw0p0 is going to cause problems. I really do suggest

>> you drop it for the moment in order to get a minimal driver

>> accepted. sw0p0 does not fit the switchdev model.

>

>Honestly, this is not the first patchset and we started without sw0p0,

>but then.... (with current LKML)

>- default vlan offloading breaks traffic reception to P0

> (Ilias saying it's fixed in next - good)

>- adding vlan to P1/P2 works, but not for P0 (again as per Ilias -fixed)

>- mcast - no way to manually add static record and include or exclude P0.

>

>

>:( above are basic functionality required.

>

>>

>>> Below I've described some tested use cases (not include full static configuration),

>>> but regarding sw0p0 - there is work done by Ivan Khoronzhuk [1] which enables

>>> adds MQPRIO and CBS Qdisc and targets AVB network features. It required to

>>> offload MQPRIO and CBS parameters on all ports including P0. In case of P0,

>>> CPDMA TX channels shapers need to be configured, and in case

>>> of sw0p1/sw0p2 internal FIFOS.

>>> sw0p0 also expected to be used to configure CPDMA interface in general -

>>> number of tx/rx channels, rates, ring sizes.

>>

>> Can this be derives from the configuration on sw0p1 and sw0p2?

>> sw0p1 has 1 tx channel, sw0p2 has 2 tx channels, so give p0 3 tx

>> channels?

>

>This not exactly what is required, sry I probably will need just repeat what Ivan

>described in https://lkml.org/lkml/2018/5/18/1135. I'd try to provide this info tomorrow.

>

>Unfortunately, I'm not sure if all this reworking and switchdev conversation would make sense

>if we will not be able to fit Ivan's work in new CPSW driver model ;..(

>and do AVB bridge.


Not sure I tracked everything, but current link example only for dual-emac mode,
where i guess no switchdev and thus we have only 2 ports having phys and no need
in some common configuration. Only common resources tx queues that are easily
shared between two ports. In case of switchdev the same, no need in port 0, at
least for cpsw implementation (not sure about others), tx queues cpdma shapers
are configured separately and can be done with any netdev presenting ports,
thus p0 can be avoided in this case also. For instance, I've created short
description, based on current switchdev RFC, showing simple configuration
commands for shapers configuration from host side and CBS for every port,
w/o p0 usage:
https://git.linaro.org/people/ivan.khoronzhuk/tsn_conf.git/tree/README_switchdev

Will add it once switchdev decision is stabilized and applied. Basically nothing
changed with dual-emac configuration except different rates set for cpdma
shapers from host side.

Probably, p0 is needed for other configurations, or for compatibility reasons
with old versions, but definitely should be created list of all cases, and on my
opinion, any common resource for both ports can be configured with any netdev
presenting ports if it doesn't conflict with ports configuration itself.

>

>>

>>> In addition there is set of global CPSW parameters (not related to P1/P2, like

>>> MAC Authorization Mode, OUI Deny Mode, crc ) which I've

>>> thought can be added to sw0p0 (using ethtool -priv-flags).

>>

>> You should describe these features, and then we can figure out how

>> best to model them. devlink might be an option if they are switch

>> global.

>

>Ok. that can be postponed.

>

>-- 

>regards,

>-grygorii


-- 
Regards,
Ivan Khoronzhuk