Message ID | 20210209151936.97382-5-olteanv@gmail.com |
---|---|
State | New |
Headers | show |
Series | Cleanup in brport flags switchdev offload for DSA | expand |
On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote: > So switchdev drivers operating in standalone mode should disable address > learning. As a matter of practicality, we can reduce code duplication in > drivers by having the bridge notify through switchdev of the initial and > final brport flags. Then, drivers can simply start up hardcoded for no > address learning (similar to how they already start up hardcoded for no > forwarding), then they only need to listen for > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no > need for special cases when the port joins or leaves the bridge etc. How are you handling the case where a port leaves a LAG that is linked to a bridge? In this case the port becomes a standalone port, but will not get this notification.
On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote: > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote: > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote: > > > So switchdev drivers operating in standalone mode should disable address > > > learning. As a matter of practicality, we can reduce code duplication in > > > drivers by having the bridge notify through switchdev of the initial and > > > final brport flags. Then, drivers can simply start up hardcoded for no > > > address learning (similar to how they already start up hardcoded for no > > > forwarding), then they only need to listen for > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no > > > need for special cases when the port joins or leaves the bridge etc. > > > > How are you handling the case where a port leaves a LAG that is linked > > to a bridge? In this case the port becomes a standalone port, but will > > not get this notification. > > Apparently the answer to that question is "I delete the code that makes > this use case work", how smart of me. Thanks. Not sure how you expect to interpret this. > > Unless you have any idea how I could move the logic into the bridge, I > guess I'm stuck with DSA and all the other switchdev drivers having this > forest of corner cases to deal with. At least I can add a comment so I'm > not tempted to delete it next time. There are too many moving pieces with stacked devices. It is not only LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better to gracefully / explicitly not handle a case rather than pretending to handle it correctly with complex / buggy code. For example, you should refuse to be enslaved to a LAG that already has upper devices such as a bridge. You are probably not handling this correctly / at all. This is easy. Just a call to netdev_has_any_upper_dev(). The reverse, during unlinking, would be to refuse unlinking if the upper has uppers of its own. netdev_upper_dev_unlink() needs to learn to return an error and callers such as team/bond need to learn to handle it, but it seems patchable.
On Wed, Feb 10, 2021 at 12:51:53AM +0200, Vladimir Oltean wrote: > On Wed, Feb 10, 2021 at 12:01:24AM +0200, Ido Schimmel wrote: > > On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote: > > > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote: > > > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote: > > > > > So switchdev drivers operating in standalone mode should disable address > > > > > learning. As a matter of practicality, we can reduce code duplication in > > > > > drivers by having the bridge notify through switchdev of the initial and > > > > > final brport flags. Then, drivers can simply start up hardcoded for no > > > > > address learning (similar to how they already start up hardcoded for no > > > > > forwarding), then they only need to listen for > > > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no > > > > > need for special cases when the port joins or leaves the bridge etc. > > > > > > > > How are you handling the case where a port leaves a LAG that is linked > > > > to a bridge? In this case the port becomes a standalone port, but will > > > > not get this notification. > > > > > > Apparently the answer to that question is "I delete the code that makes > > > this use case work", how smart of me. Thanks. > > > > Not sure how you expect to interpret this. > > Next patch (05/11) deletes that explicit notification from dsa_port_bridge_leave, > function which is called from dsa_port_lag_leave too, apparently with good reason. > > > > Unless you have any idea how I could move the logic into the bridge, I > > > guess I'm stuck with DSA and all the other switchdev drivers having this > > > forest of corner cases to deal with. At least I can add a comment so I'm > > > not tempted to delete it next time. > > > > There are too many moving pieces with stacked devices. It is not only > > LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better > > to gracefully / explicitly not handle a case rather than pretending to > > handle it correctly with complex / buggy code. > > > > For example, you should refuse to be enslaved to a LAG that already has > > upper devices such as a bridge. You are probably not handling this > > correctly / at all. This is easy. Just a call to > > netdev_has_any_upper_dev(). > > Correct, good point, in particular this means that joining a bridged LAG > will not get me any notifications of that LAG's CHANGEUPPER because that > was consumed a long time ago. An equally valid approach seems to be to > check for netdev_master_upper_dev_get_rcu in dsa_port_lag_join, and call > dsa_port_bridge_join on the upper if that is present. The bridge might already have a state you are not familiar with (e.g., FDB entry pointing to the LAG), so best to just forbid this. I think it's fair to impose such limitations (assuming they are properly communicated to user space) given it results in a much less buggy/complex code to maintain. > > > The reverse, during unlinking, would be to refuse unlinking if the upper > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to > > return an error and callers such as team/bond need to learn to handle > > it, but it seems patchable. > > Again, this was treated prior to my deletion in this series and not by > erroring out, I just really didn't think it through. > > So you're saying that if we impose that all switchdev drivers restrict > the house of cards to be constructed from the bottom up, and destructed > from the top down, then the notification of bridge port flags can stay > in the bridge layer? I actually don't think it's a good idea to have this in the bridge in any case. I understand that it makes sense for some devices where learning, flooding, etc are port attributes, but in other devices these can be {port,vlan} attributes and then you need to take care of them when a vlan is added / deleted and not only when a port is removed from the bridge. So for such devices this really won't save anything. I would thus leave it to the lower levels to decide.
On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote: > > > The reverse, during unlinking, would be to refuse unlinking if the upper > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to > > > return an error and callers such as team/bond need to learn to handle > > > it, but it seems patchable. > > > > Again, this was treated prior to my deletion in this series and not by > > erroring out, I just really didn't think it through. > > > > So you're saying that if we impose that all switchdev drivers restrict > > the house of cards to be constructed from the bottom up, and destructed > > from the top down, then the notification of bridge port flags can stay > > in the bridge layer? > > I actually don't think it's a good idea to have this in the bridge in > any case. I understand that it makes sense for some devices where > learning, flooding, etc are port attributes, but in other devices these > can be {port,vlan} attributes and then you need to take care of them > when a vlan is added / deleted and not only when a port is removed from > the bridge. So for such devices this really won't save anything. I would > thus leave it to the lower levels to decide. Just for my understanding, how are per-{port,vlan} attributes such as learning and flooding managed by the Linux bridge? How can I disable flooding only in a certain VLAN?
On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote: > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote: > > > > The reverse, during unlinking, would be to refuse unlinking if the upper > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to > > > > return an error and callers such as team/bond need to learn to handle > > > > it, but it seems patchable. > > > > > > Again, this was treated prior to my deletion in this series and not by > > > erroring out, I just really didn't think it through. > > > > > > So you're saying that if we impose that all switchdev drivers restrict > > > the house of cards to be constructed from the bottom up, and destructed > > > from the top down, then the notification of bridge port flags can stay > > > in the bridge layer? > > > > I actually don't think it's a good idea to have this in the bridge in > > any case. I understand that it makes sense for some devices where > > learning, flooding, etc are port attributes, but in other devices these > > can be {port,vlan} attributes and then you need to take care of them > > when a vlan is added / deleted and not only when a port is removed from > > the bridge. So for such devices this really won't save anything. I would > > thus leave it to the lower levels to decide. > > Just for my understanding, how are per-{port,vlan} attributes such as > learning and flooding managed by the Linux bridge? How can I disable > flooding only in a certain VLAN? You can't (currently). But it does not change the fact that in some devices these are {port,vlan} attributes and we are talking here about the interface towards these devices. Having these as {port,vlan} attributes allows you to support use cases such as a port being enslaved to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware bridge(s). Obviously you need to ensure there is no conflict between the VLANs used by the VLAN-aware bridge and the VLAN device(s).
On Thu, Feb 11, 2021 at 11:35:27AM +0200, Vladimir Oltean wrote: > On Thu, Feb 11, 2021 at 09:44:43AM +0200, Ido Schimmel wrote: > > On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote: > > > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote: > > > > > > The reverse, during unlinking, would be to refuse unlinking if the upper > > > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to > > > > > > return an error and callers such as team/bond need to learn to handle > > > > > > it, but it seems patchable. > > > > > > > > > > Again, this was treated prior to my deletion in this series and not by > > > > > erroring out, I just really didn't think it through. > > > > > > > > > > So you're saying that if we impose that all switchdev drivers restrict > > > > > the house of cards to be constructed from the bottom up, and destructed > > > > > from the top down, then the notification of bridge port flags can stay > > > > > in the bridge layer? > > > > > > > > I actually don't think it's a good idea to have this in the bridge in > > > > any case. I understand that it makes sense for some devices where > > > > learning, flooding, etc are port attributes, but in other devices these > > > > can be {port,vlan} attributes and then you need to take care of them > > > > when a vlan is added / deleted and not only when a port is removed from > > > > the bridge. So for such devices this really won't save anything. I would > > > > thus leave it to the lower levels to decide. > > > > > > Just for my understanding, how are per-{port,vlan} attributes such as > > > learning and flooding managed by the Linux bridge? How can I disable > > > flooding only in a certain VLAN? > > > > You can't (currently). But it does not change the fact that in some > > devices these are {port,vlan} attributes and we are talking here about > > the interface towards these devices. Having these as {port,vlan} > > attributes allows you to support use cases such as a port being enslaved > > to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware > > bridge(s). > > I don't think I understand the use case really. You mean something like this? > > br1 (vlan_filtering=0) > / \ > / \ > swp0.100 \ > | \ > |(vlan_filtering \ > | br0 =1) \ > | / \ \ > |/ \ \ > swp0 swp1 swp2 > > A packet received on swp0 with VLAN tag 100 will go to swp0.100 which > will be forwarded according to the FDB of br1, and will be delivered to > swp2 as untagged? Respectively in the other direction, a packet received > on swp2 will have a VLAN 100 tag pushed on egress towards swp0, even if > it is already VLAN-tagged? > > What do you even use this for? The more common use case is to have multiple VLAN-unaware bridges instead of one VLAN-aware bridge. I'm not aware of users that use the hybrid model (VLAN-aware + VLAN-unaware). But regardless, this entails treating above mentioned attributes as {port,vlan} attributes. A device that only supports them as port attributes will have problems supporting such a model. > And also: if the {port,vlan} attributes can be simulated by making the > bridge port be an 8021q upper of a physical interface, then as far as > the bridge is concerned, they still are per-port attributes, and they > are per-{port,vlan} only as far as the switch driver is concerned - > therefore I don't see why it isn't okay for the bridge to notify the > brport flags in exactly the same way for them too. Look at this hunk from the patch: @@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p) update_headroom(br, get_max_headroom(br)); netdev_reset_rx_headroom(dev); + nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING, + BR_PORT_DEFAULT_FLAGS); nbp_vlan_flush(p); br_fdb_delete_by_port(br, p, 0, 1); switchdev_deferred_process(); Devices that treat these attributes as {port,vlan} attributes will undo this change upon the call to nbp_vlan_flush() when all the VLANs are flushed. > > > Obviously you need to ensure there is no conflict between the > > VLANs used by the VLAN-aware bridge and the VLAN device(s). > > On the other hand I think I have a more real-life use case that I think > is in conflict with this last phrase. > I have a VLAN-aware bridge and I want to run PTP in VLAN 7, but I also > need to add VLAN 7 in the VLAN table of the bridge ports so that it > doesn't drop traffic. PTP is link-local, so I need to run it on VLAN > uppers of the switch ports. Like this: > > ip link add br0 type bridge vlan_filtering 1 > ip link set swp0 master br0 > ip link set swp1 master br0 > bridge vlan add dev swp0 vid 7 master > bridge vlan add dev swp1 vid 7 master > bridge vlan add dev br0 vid 7 self > ip link add link swp0 name swp0.7 type vlan id 7 > ip link add link swp1 name swp0.7 type vlan id 7 > ptp4l -i swp0.7 -i swp1.7 -m > > How can I do that considering that you recommend avoiding conflicts > between the VLAN-aware bridge and 8021q uppers? Or is that true only > when the 8021q uppers are bridged? The problem is with the statement "I also need to add VLAN 7 in the VLAN table of the bridge ports so that it doesn't drop traffic". Packets with VLAN 7 received by swp0 will be processed by swp0.7. br0 is irrelevant and configuring swp0.7 should be enough in order to enable the VLAN filter for VLAN 7 on swp0. I don't know the internals of the HW you are working with, but I imagine that you would need to create a HW bridge between {swp0, VLAN 7} and the CPU port so that all the traffic with VLAN 7 will be sent / flooded to the CPU.
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index b979005ea39c..36d77fa8f40b 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -58,6 +58,9 @@ struct br_ip_list { #define BR_MRP_LOST_CONT BIT(18) #define BR_MRP_LOST_IN_CONT BIT(19) +#define BR_PORT_DEFAULT_FLAGS (BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD | \ + BR_LEARNING) + #define BR_DEFAULT_AGEING_TIME (300 * HZ) extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *)); diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index f7d2f472ae24..f813eec986ba 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -89,6 +89,23 @@ void br_port_carrier_check(struct net_bridge_port *p, bool *notified) spin_unlock_bh(&br->lock); } +/* If @mask has multiple bits set at once, offload them one by one to + * switchdev, to allow it to reject only what it doesn't support and accept + * what it does. + */ +static void nbp_flags_notify(struct net_bridge_port *p, unsigned long flags, + unsigned long mask) +{ + int flag; + + for_each_set_bit(flag, &mask, 32) + br_switchdev_set_port_flag(p, flags & BIT(flag), + BIT(flag), NULL); + + p->flags &= ~mask; + p->flags |= flags; +} + static void br_port_set_promisc(struct net_bridge_port *p) { int err = 0; @@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p) update_headroom(br, get_max_headroom(br)); netdev_reset_rx_headroom(dev); + nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING, + BR_PORT_DEFAULT_FLAGS); nbp_vlan_flush(p); br_fdb_delete_by_port(br, p, 0, 1); switchdev_deferred_process(); @@ -428,7 +447,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br, p->path_cost = port_cost(dev); p->priority = 0x8000 >> BR_PORT_BITS; p->port_no = index; - p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD; + nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS, BR_PORT_DEFAULT_FLAGS); br_init_port(p); br_set_state(p, BR_STATE_DISABLED); br_stp_port_timer_init(p); diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c index ac8dead86bf2..1fae532cfbb1 100644 --- a/net/bridge/br_switchdev.c +++ b/net/bridge/br_switchdev.c @@ -55,8 +55,7 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p, } /* Flags that can be offloaded to hardware */ -#define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \ - BR_MCAST_FLOOD | BR_BCAST_FLOOD) +#define BR_PORT_FLAGS_HW_OFFLOAD BR_PORT_DEFAULT_FLAGS int br_switchdev_set_port_flag(struct net_bridge_port *p, unsigned long flags,