diff mbox series

[RFC,net-next] net: bridge: switchdev: expose the port hwdom as a netlink attribute

Message ID 20210812121703.3461228-1-vladimir.oltean@nxp.com
State New
Headers show
Series [RFC,net-next] net: bridge: switchdev: expose the port hwdom as a netlink attribute | expand

Commit Message

Vladimir Oltean Aug. 12, 2021, 12:17 p.m. UTC
It is useful for a user to see whether a bridge port is offloaded or
not, and sometimes this may depend on hardware capability.

For example, a switchdev driver might be able to offload bonding/team
interfaces as bridge ports, but only for certain xmit hash policies.
When running into that situation, DSA for example prints a warning
extack that the interface is not offloaded, but not all drivers do that.
In fact, since the recent bridge switchdev explicit offloading API, all
switchdev drivers should be able to fall back to software LAGs being
bridge ports without having any explicit code to handle them. So they
don't have the warning extack printed anywhere.

Or alternatively, we might want in the future to patch individual
switchdev drivers to implement something like:

	ethtool -K sw0p2 l2-fwd-offload off

since that is now possible and easy (the driver just needs to not call
switchdev_bridge_port_offload). This might be helpful when installing
deep packet inspection firewall rules on a bridge port which must be
executed in software, so the port should not forward the packets
autonomously but send them to the CPU.

With this change, the "hardware domain" concept becomes UAPI. It is a
read-only link attribute which is zero for non-offloaded bridge ports,
and has a non-zero value that is unique per bridge otherwise (i.e. two
different bridge ports, in two different bridges, might have a hwdom of
1 and they are still different hardware domains).

./ip -d link
13: sw1p3@swp2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br0
		state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 00:04:9f:0a:0b:0c brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68
    maxmtu 2021 bridge_slave state disabled priority 32 cost 100 hairpin off guard off
    root_block off fastleave off learning on flood on port_id 0x8007 port_no 0x7
    designated_port 32775 designated_cost 0 designated_bridge 8000.0:4:9f:a:b:c
    designated_root 8000.0:4:9f:a:b:c hold_timer    0.00 message_age_timer    0.00
    forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off
    proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on
    mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0
    vlan_tunnel off isolated off hwdom 2 addrgenmode none numtxqueues 8 numrxqueues 1
    gso_max_size 65536 gso_max_segs 65535 portname p3 switchid 02000000 parentbus spi
    parentdev spi2.1

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/uapi/linux/if_link.h | 1 +
 net/bridge/br_netlink.c      | 5 +++++
 net/bridge/br_private.h      | 6 ++++++
 net/bridge/br_switchdev.c    | 5 +++++
 4 files changed, 17 insertions(+)

Comments

Ido Schimmel Aug. 12, 2021, 3:35 p.m. UTC | #1
On Thu, Aug 12, 2021 at 03:17:03PM +0300, Vladimir Oltean wrote:
> It is useful for a user to see whether a bridge port is offloaded or
> not, and sometimes this may depend on hardware capability.
> 
> For example, a switchdev driver might be able to offload bonding/team
> interfaces as bridge ports, but only for certain xmit hash policies.
> When running into that situation, DSA for example prints a warning
> extack that the interface is not offloaded, but not all drivers do that.
> In fact, since the recent bridge switchdev explicit offloading API, all
> switchdev drivers should be able to fall back to software LAGs being
> bridge ports without having any explicit code to handle them. So they
> don't have the warning extack printed anywhere.

[...]

> With this change, the "hardware domain" concept becomes UAPI. It is a
> read-only link attribute which is zero for non-offloaded bridge ports,
> and has a non-zero value that is unique per bridge otherwise (i.e. two
> different bridge ports, in two different bridges, might have a hwdom of
> 1 and they are still different hardware domains).
> 
> ./ip -d link
> 13: sw1p3@swp2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue master br0
> 		state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
>     link/ether 00:04:9f:0a:0b:0c brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68
>     maxmtu 2021 bridge_slave state disabled priority 32 cost 100 hairpin off guard off
>     root_block off fastleave off learning on flood on port_id 0x8007 port_no 0x7
>     designated_port 32775 designated_cost 0 designated_bridge 8000.0:4:9f:a:b:c
>     designated_root 8000.0:4:9f:a:b:c hold_timer    0.00 message_age_timer    0.00
>     forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off
>     proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on
>     mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0
>     vlan_tunnel off isolated off hwdom 2 addrgenmode none numtxqueues 8 numrxqueues 1
>     gso_max_size 65536 gso_max_segs 65535 portname p3 switchid 02000000 parentbus spi
>     parentdev spi2.1

Makes sense to me. Gives us further insight into the offload process. I
vaguely remember discussing this with Nik in the past. The
hwdom/fwd_mark is in the tree for long enough to be considered a stable
and useful concept.

You are saying that it is useful to expose despite already having
"switchid" exposed because you can have interfaces with the same
"switchid" that are not member in the same hardware domain? E.g., the
LAG example. If so, might be worth explicitly spelling it out in the
commit message.
Vladimir Oltean Aug. 12, 2021, 4:16 p.m. UTC | #2
On Thu, Aug 12, 2021 at 06:35:15PM +0300, Ido Schimmel wrote:
> Makes sense to me. Gives us further insight into the offload process. I
> vaguely remember discussing this with Nik in the past. The
> hwdom/fwd_mark is in the tree for long enough to be considered a stable
> and useful concept.
> 
> You are saying that it is useful to expose despite already having
> "switchid" exposed because you can have interfaces with the same
> "switchid" that are not member in the same hardware domain? E.g., the
> LAG example. If so, might be worth explicitly spelling it out in the
> commit message.

Indeed, the "switchid" is static, whereas the "hwdom" depends upon the
current configuration. So it is useful as a debug feature for the
reasons you mention, but I am also a bit worried whether we should
expose this now, since I am not sure if it will impact future redesigns
of the bridge driver or switchdev (the hwdom is a pretty detailed bit of
information). Basically the only guarantee we're giving user space is
that a hwdom of zero means unoffloaded, and two non-zero and equal
integer values can forward between each other without involving the CPU.
The numbers themselves are arbitrary, mean nothing and can vary even
depending on the port join order into the bridge. That shouldn't impose
any restrictions going further, should it?
Ido Schimmel Aug. 16, 2021, 6:39 a.m. UTC | #3
On Thu, Aug 12, 2021 at 04:16:48PM +0000, Vladimir Oltean wrote:
> On Thu, Aug 12, 2021 at 06:35:15PM +0300, Ido Schimmel wrote:

> > Makes sense to me. Gives us further insight into the offload process. I

> > vaguely remember discussing this with Nik in the past. The

> > hwdom/fwd_mark is in the tree for long enough to be considered a stable

> > and useful concept.

> > 

> > You are saying that it is useful to expose despite already having

> > "switchid" exposed because you can have interfaces with the same

> > "switchid" that are not member in the same hardware domain? E.g., the

> > LAG example. If so, might be worth explicitly spelling it out in the

> > commit message.

> 

> Indeed, the "switchid" is static, whereas the "hwdom" depends upon the

> current configuration. So it is useful as a debug feature for the

> reasons you mention, but I am also a bit worried whether we should

> expose this now, since I am not sure if it will impact future redesigns

> of the bridge driver or switchdev (the hwdom is a pretty detailed bit of

> information). Basically the only guarantee we're giving user space is

> that a hwdom of zero means unoffloaded, and two non-zero and equal

> integer values can forward between each other without involving the CPU.

> The numbers themselves are arbitrary, mean nothing and can vary even

> depending on the port join order into the bridge. That shouldn't impose

> any restrictions going further, should it?


Not that I'm aware. On the other hand, I didn't see a pressing need to
expose this attribute either so we can wait a bit longer to gain more
confidence if you want.
diff mbox series

Patch

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 5310003523ce..498041ffd65f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -534,6 +534,7 @@  enum {
 	IFLA_BRPORT_MRP_IN_OPEN,
 	IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
 	IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
+	IFLA_BRPORT_HWDOM,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 8ae026fa2ad7..4299625f52dc 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -203,6 +203,7 @@  static inline size_t br_port_info_size(void)
 		+ nla_total_size(sizeof(u8))	/* IFLA_BRPORT_MRP_IN_OPEN */
 		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT */
 		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_MCAST_EHT_HOSTS_CNT */
+		+ nla_total_size(sizeof(u32))	/* IFLA_BRPORT_HWDOM */
 		+ 0;
 }
 
@@ -295,6 +296,9 @@  static int br_port_fill_attrs(struct sk_buff *skb,
 		return -EMSGSIZE;
 #endif
 
+	if (nla_put_u32(skb, IFLA_BRPORT_HWDOM, nbp_switchdev_get_hwdom(p)))
+		return -EMSGSIZE;
+
 	/* we might be called only with br->lock */
 	rcu_read_lock();
 	backup_p = rcu_dereference(p->backup_port);
@@ -829,6 +833,7 @@  static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_ISOLATED]	= { .type = NLA_U8 },
 	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
 	[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 },
+	[IFLA_BRPORT_HWDOM] = { .type = NLA_U32 },
 };
 
 /* Change the state of the port and notify spanning tree */
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 32c218aa3f36..c3f0a44abc14 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1969,6 +1969,7 @@  void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb);
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
 				  const struct sk_buff *skb);
+int nbp_switchdev_get_hwdom(const struct net_bridge_port *p);
 int br_switchdev_set_port_flag(struct net_bridge_port *p,
 			       unsigned long flags,
 			       unsigned long mask,
@@ -2035,6 +2036,11 @@  static inline bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
 	return true;
 }
 
+static inline int nbp_switchdev_get_hwdom(const struct net_bridge_port *p)
+{
+	return 0;
+}
+
 static inline int br_switchdev_set_port_flag(struct net_bridge_port *p,
 					     unsigned long flags,
 					     unsigned long mask,
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 6bf518d78f02..01858607c7e4 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -60,6 +60,11 @@  void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 		BR_INPUT_SKB_CB(skb)->src_hwdom = p->hwdom;
 }
 
+int nbp_switchdev_get_hwdom(const struct net_bridge_port *p)
+{
+	return p->hwdom;
+}
+
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
 				  const struct sk_buff *skb)
 {