diff mbox series

[net-next,v5,05/15] devlink: Support get and set state of port function

Message ID 20201215090358.240365-6-saeed@kernel.org
State Superseded
Headers show
Series Add mlx5 subfunction support | expand

Commit Message

Saeed Mahameed Dec. 15, 2020, 9:03 a.m. UTC
From: Parav Pandit <parav@nvidia.com>

devlink port function can be in active or inactive state.
Allow users to get and set port function's state.

When the port function it activated, its operational state may change
after a while when the device is created and driver binds to it.
Similarly on deactivation flow.

To clearly describe the state of the port function and its device's
operational state in the host system, define state and opstate
attributes.

Example of a PCI SF port which supports a port function:
Create a device with ID=10 and one physical port.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/devlink.h        | 23 +++++++++
 include/uapi/linux/devlink.h | 21 +++++++++
 net/core/devlink.c           | 90 +++++++++++++++++++++++++++++++++++-
 3 files changed, 133 insertions(+), 1 deletion(-)

Comments

David Ahern Dec. 16, 2020, 4:15 p.m. UTC | #1
On 12/15/20 10:15 PM, Parav Pandit wrote:
>>> + * enum devlink_port_function_opstate - indicates operational state
>>> + of port function
>>> + * @DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED: Driver is attached
>> to the
>>> + function of port,
>> This name definitely needs to be shortened.
>>
> DEVLINK_PORT_FUNCTION_OPS_ATTACHED
> Or
> DEVLINK_PF_OPS_ATTACHED 
> 
> PF - port function
>  

The devlink attribute names need to start using established short names
to find that balance between readability and ridiculously long names.

In this case PF for networking has an established link to SRIOV
'physical function'.

FUNCTION can be written as FCN.
ATTACHED can be shortened to ATTCH.

So in this case DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED (38 chars) drops
to DEVLINK_PORT_FCN_OPSTATE_ATTCH (30 chars). That is a step in the
right direction.
Jakub Kicinski Dec. 17, 2020, 12:08 a.m. UTC | #2
On Wed, 16 Dec 2020 05:15:04 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, December 16, 2020 6:08 AM
> > 
> > On Tue, 15 Dec 2020 01:03:48 -0800 Saeed Mahameed wrote:  
> > > From: Parav Pandit <parav@nvidia.com>
> > >
> > > devlink port function can be in active or inactive state.
> > > Allow users to get and set port function's state.
> > >
> > > When the port function it activated, its operational state may change
> > > after a while when the device is created and driver binds to it.
> > > Similarly on deactivation flow.  
> > 
> > So what's the flow device should implement?
> > 
> > User requests deactivated, the device sends a notification to the driver
> > bound to the device. What if the driver ignores it?
> >  
> If driver ignores it, those devices are marked unusable for new allocation.
> Device becomes usable only after it has act on the event.

But the device remains fully operational?

So if I'm an admin who wants to unplug a misbehaving "entity"[1]
the deactivate is not gonna help me, it's just a graceful hint?
Is there no need for a forceful shutdown?

[1] refer to earlier email, IDK what entity is supposed to use this

> > > + * @DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED: Driver is detached  
> > from the function of port; it is  
> > > + *					    safe to delete the port.
> > > + */
> > > +enum devlink_port_function_opstate {
> > > +	DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED,  
> > 
> > The port function must be some Mellanox speak - for the second time - I
> > have no idea what it means. Please use meaningful names.
> >  
> It is not a Mellanox term.
> Port function object is the one that represents function behind this port.
> It is not a new term. Port function already exists in devlink whose operational state attribute is defined here.

I must have missed that in review. PCI functions can host multiple
ports. So "port function" does not compute for me. Can we drop the
"function"?
Parav Pandit Dec. 17, 2020, 5:46 a.m. UTC | #3
> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Thursday, December 17, 2020 5:39 AM

> 

> On Wed, 16 Dec 2020 05:15:04 +0000 Parav Pandit wrote:

> > > From: Jakub Kicinski <kuba@kernel.org>

> > > Sent: Wednesday, December 16, 2020 6:08 AM

> > >

> > > On Tue, 15 Dec 2020 01:03:48 -0800 Saeed Mahameed wrote:

> > > > From: Parav Pandit <parav@nvidia.com>

> > > >

> > > > devlink port function can be in active or inactive state.

> > > > Allow users to get and set port function's state.

> > > >

> > > > When the port function it activated, its operational state may

> > > > change after a while when the device is created and driver binds to it.

> > > > Similarly on deactivation flow.

> > >

> > > So what's the flow device should implement?

> > >

> > > User requests deactivated, the device sends a notification to the

> > > driver bound to the device. What if the driver ignores it?

> > >

> > If driver ignores it, those devices are marked unusable for new allocation.

> > Device becomes usable only after it has act on the event.

> 

> But the device remains fully operational?

> 

> So if I'm an admin who wants to unplug a misbehaving "entity"[1] the

> deactivate is not gonna help me, it's just a graceful hint?

Right.
> Is there no need for a forceful shutdown?

In this patchset, no. I didn't add the knob for it. It is already at 15 patches.
But yes, forceful shutdown extension can be done by the admin in future patchset as,

$ devlink port del pci/0000:06:00.0/<port_index> force true
                                                                                         ^^^^^^^^
Above will be the extension in control of the admin.

> 

> [1] refer to earlier email, IDK what entity is supposed to use this

> 

While I was replying, Saeed already answered it.
> > Port function object is the one that represents function behind this port.

> > It is not a new term. Port function already exists in devlink whose

> operational state attribute is defined here.

> 

> I must have missed that in review. PCI functions can host multiple ports. 

This is exactly why I had "multiple networking ports" above to differentiate it from devlink port.
And you asked me to drop 'networking' because devlink is all networking ports, that creates this confusion.

Anyways, I will rewrite the commit message as 'function', instead of 'port function' as below.

New commit message snippet _start:
A function can be in active or inactive state. Allow users to get and set function's state.

When the function it activated, its operational state may change after a while when the device is created and driver binds to it.
Similarly on deactivation flow.

To clearly describe the state of the function and its device's operational state in the host system, define state and opstate attributes.
_end.

> So

> "port function" does not compute for me. Can we drop the "function"?

No. it is better to keep it. Because it clearly distinguishes the host facing function whose attribute (mac) and state are controlled.
But I shorten the names, enums etc in code from port_function to port_fn. So it should be readable now.
Jakub Kicinski Dec. 18, 2020, 7:51 p.m. UTC | #4
On Thu, 17 Dec 2020 05:46:45 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>

> > Sent: Thursday, December 17, 2020 5:39 AM

> > 

> > On Wed, 16 Dec 2020 05:15:04 +0000 Parav Pandit wrote:  

> > > > From: Jakub Kicinski <kuba@kernel.org>

> > > > Sent: Wednesday, December 16, 2020 6:08 AM

> > > >

> > > > On Tue, 15 Dec 2020 01:03:48 -0800 Saeed Mahameed wrote:  

> > > > > From: Parav Pandit <parav@nvidia.com>

> > > > >

> > > > > devlink port function can be in active or inactive state.

> > > > > Allow users to get and set port function's state.

> > > > >

> > > > > When the port function it activated, its operational state may

> > > > > change after a while when the device is created and driver binds to it.

> > > > > Similarly on deactivation flow.  

> > > >

> > > > So what's the flow device should implement?

> > > >

> > > > User requests deactivated, the device sends a notification to the

> > > > driver bound to the device. What if the driver ignores it?

> > > >  

> > > If driver ignores it, those devices are marked unusable for new allocation.

> > > Device becomes usable only after it has act on the event.  

> > 

> > But the device remains fully operational?

> > 

> > So if I'm an admin who wants to unplug a misbehaving "entity"[1] the

> > deactivate is not gonna help me, it's just a graceful hint?  

> Right.

> > Is there no need for a forceful shutdown?  

> In this patchset, no. I didn't add the knob for it. It is already at 15 patches.

> But yes, forceful shutdown extension can be done by the admin in future patchset as,

> 

> $ devlink port del pci/0000:06:00.0/<port_index> force true

>                                                                                          ^^^^^^^^

> Above will be the extension in control of the admin.


Can we come up with operational states that would encompass that?

The "force true" does not look too clean.

And let's document meaning of the states. We don't want the next vendor
to just "assume" the states match their own interpretation.
Parav Pandit Dec. 19, 2020, 5:06 a.m. UTC | #5
> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Saturday, December 19, 2020 1:21 AM

> 

> On Thu, 17 Dec 2020 05:46:45 +0000 Parav Pandit wrote:

> > > From: Jakub Kicinski <kuba@kernel.org>

> > > Sent: Thursday, December 17, 2020 5:39 AM

> > >

> > > On Wed, 16 Dec 2020 05:15:04 +0000 Parav Pandit wrote:

> > > > > From: Jakub Kicinski <kuba@kernel.org>

> > > > > Sent: Wednesday, December 16, 2020 6:08 AM

> > > > >

> > > > > On Tue, 15 Dec 2020 01:03:48 -0800 Saeed Mahameed wrote:

> > > > > > From: Parav Pandit <parav@nvidia.com>

> > > > > >

> > > > > > devlink port function can be in active or inactive state.

> > > > > > Allow users to get and set port function's state.

> > > > > >

> > > > > > When the port function it activated, its operational state may

> > > > > > change after a while when the device is created and driver binds to

> it.

> > > > > > Similarly on deactivation flow.

> > > > >

> > > > > So what's the flow device should implement?

> > > > >

> > > > > User requests deactivated, the device sends a notification to

> > > > > the driver bound to the device. What if the driver ignores it?

> > > > >

> > > > If driver ignores it, those devices are marked unusable for new

> allocation.

> > > > Device becomes usable only after it has act on the event.

> > >

> > > But the device remains fully operational?

> > >

> > > So if I'm an admin who wants to unplug a misbehaving "entity"[1] the

> > > deactivate is not gonna help me, it's just a graceful hint?

> > Right.

> > > Is there no need for a forceful shutdown?

> > In this patchset, no. I didn't add the knob for it. It is already at 15 patches.

> > But yes, forceful shutdown extension can be done by the admin in

> > future patchset as,

> >

> > $ devlink port del pci/0000:06:00.0/<port_index> force true

> >

> > ^^^^^^^^ Above will be the extension in control of the admin.

> 

> Can we come up with operational states that would encompass that?

> 

Operational state is read only. Adding more states will likely make user job harder, unless its absolute necessary.
Currently state and operational state definitions cover all the scenario needed.
Only exception is user doesn't have the ability of force delete.
$ devlink port shutdown pci/0000:03:00.0/port_index
Above command will attempt graceful port deletion.

$ devlink port del pci/0000:03:00.0/port_index
Above command will do force deletion.

> The "force true" does not look too clean.

>

I think notion of force to user is more intuitive than above two commands, as its exist for other parts of the system for example 'reboot'.
So no need for flag as true/false. Just force if user wish to do force removal.

> And let's document meaning of the states. We don't want the next vendor to

> just "assume" the states match their own interpretation.

Oh yes, I did document it in the UAPI header file which will be the first place for vendors to look on how to implement get/set.
But I will add this to patch_14 in the devlink 'subfunctions' section documentation.
diff mbox series

Patch

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f8cff3e402da..18a7e66b7982 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1374,6 +1374,29 @@  struct devlink_ops {
 	int (*port_function_hw_addr_set)(struct devlink *devlink, struct devlink_port *port,
 					 const u8 *hw_addr, int hw_addr_len,
 					 struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_state_get: Port function's state get function.
+	 *
+	 * Should be used by device drivers to report the state of a function
+	 * managed by the devlink port. Driver should return -EOPNOTSUPP if it
+	 * doesn't support port function handling for a particular port.
+	 */
+	int (*port_function_state_get)(struct devlink *devlink,
+				       struct devlink_port *port,
+				       enum devlink_port_function_state *state,
+				       enum devlink_port_function_opstate *opstate,
+				       struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_state_set: Port function's state set function.
+	 *
+	 * Should be used by device drivers to set the state of a function
+	 * managed by the devlink port. Driver should return -EOPNOTSUPP if it
+	 * doesn't support port function handling for a particular port.
+	 */
+	int (*port_function_state_set)(struct devlink *devlink,
+				       struct devlink_port *port,
+				       enum devlink_port_function_state state,
+				       struct netlink_ext_ack *extack);
 	/**
 	 * @port_new: Port add function.
 	 *
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6fe00f10eb3f..beeb30bb6b20 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -583,9 +583,30 @@  enum devlink_resource_unit {
 enum devlink_port_function_attr {
 	DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
 	DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,	/* binary */
+	DEVLINK_PORT_FUNCTION_ATTR_STATE,	/* u8 */
+	DEVLINK_PORT_FUNCTION_ATTR_OPSTATE,	/* u8 */
 
 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
 };
 
+enum devlink_port_function_state {
+	DEVLINK_PORT_FUNCTION_STATE_INACTIVE,
+	DEVLINK_PORT_FUNCTION_STATE_ACTIVE,
+};
+
+/**
+ * enum devlink_port_function_opstate - indicates operational state of port function
+ * @DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED: Driver is attached to the function of port, for
+ *					    gracefufl tear down of the function, after
+ *					    inactivation of the port function, user should wait
+ *					    for operational state to turn DETACHED.
+ * @DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED: Driver is detached from the function of port; it is
+ *					    safe to delete the port.
+ */
+enum devlink_port_function_opstate {
+	DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED,
+	DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED,
+};
+
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 11043707f63f..b8acb8842aa1 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -87,6 +87,9 @@  EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
 
 static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = {
 	[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY },
+	[DEVLINK_PORT_FUNCTION_ATTR_STATE] =
+		NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FUNCTION_STATE_INACTIVE,
+				 DEVLINK_PORT_FUNCTION_STATE_ACTIVE),
 };
 
 static LIST_HEAD(devlink_list);
@@ -746,6 +749,57 @@  devlink_port_function_hw_addr_fill(struct devlink *devlink, const struct devlink
 	return 0;
 }
 
+static bool
+devlink_port_function_state_valid(enum devlink_port_function_state state)
+{
+	return state == DEVLINK_PORT_FUNCTION_STATE_INACTIVE ||
+	       state == DEVLINK_PORT_FUNCTION_STATE_ACTIVE;
+}
+
+static bool
+devlink_port_function_opstate_valid(enum devlink_port_function_opstate state)
+{
+	return state == DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED ||
+	       state == DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED;
+}
+
+static int
+devlink_port_function_state_fill(struct devlink *devlink,
+				 const struct devlink_ops *ops,
+				 struct devlink_port *port, struct sk_buff *msg,
+				 struct netlink_ext_ack *extack,
+				 bool *msg_updated)
+{
+	enum devlink_port_function_opstate opstate;
+	enum devlink_port_function_state state;
+	int err;
+
+	if (!ops->port_function_state_get)
+		return 0;
+
+	err = ops->port_function_state_get(devlink, port, &state, &opstate, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+	if (!devlink_port_function_state_valid(state)) {
+		WARN_ON_ONCE(1);
+		NL_SET_ERR_MSG_MOD(extack, "Invalid state value read from driver");
+		return -EINVAL;
+	}
+	if (!devlink_port_function_opstate_valid(opstate)) {
+		WARN_ON_ONCE(1);
+		NL_SET_ERR_MSG_MOD(extack, "Invalid operational state value read from driver");
+		return -EINVAL;
+	}
+	if (nla_put_u8(msg, DEVLINK_PORT_FUNCTION_ATTR_STATE, state) ||
+	    nla_put_u8(msg, DEVLINK_PORT_FUNCTION_ATTR_OPSTATE, opstate))
+		return -EMSGSIZE;
+	*msg_updated = true;
+	return 0;
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)
@@ -762,6 +816,13 @@  devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 
 	ops = devlink->ops;
 	err = devlink_port_function_hw_addr_fill(devlink, ops, port, msg, extack, &msg_updated);
+	if (err)
+		goto out;
+	err = devlink_port_function_state_fill(devlink, ops, port, msg, extack,
+					       &msg_updated);
+	if (err)
+		goto out;
+out:
 	if (err || !msg_updated)
 		nla_nest_cancel(msg, function_attr);
 	else
@@ -1027,6 +1088,22 @@  devlink_port_function_hw_addr_set(struct devlink *devlink, struct devlink_port *
 	return ops->port_function_hw_addr_set(devlink, port, hw_addr, hw_addr_len, extack);
 }
 
+static int
+devlink_port_function_state_set(struct devlink *devlink, struct devlink_port *port,
+				const struct nlattr *attr, struct netlink_ext_ack *extack)
+{
+	enum devlink_port_function_state state;
+	const struct devlink_ops *ops;
+
+	state = nla_get_u8(attr);
+	ops = devlink->ops;
+	if (!ops->port_function_state_set) {
+		NL_SET_ERR_MSG_MOD(extack, "Port function does not support state setting");
+		return -EOPNOTSUPP;
+	}
+	return ops->port_function_state_set(devlink, port, state, extack);
+}
+
 static int
 devlink_port_function_set(struct devlink *devlink, struct devlink_port *port,
 			  const struct nlattr *attr, struct netlink_ext_ack *extack)
@@ -1042,8 +1119,19 @@  devlink_port_function_set(struct devlink *devlink, struct devlink_port *port,
 	}
 
 	attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR];
-	if (attr)
+	if (attr) {
 		err = devlink_port_function_hw_addr_set(devlink, port, attr, extack);
+		if (err)
+			return err;
+	}
+	/* Keep this as the last function attribute set, so that when
+	 * multiple port function attributes are set along with state,
+	 * Those can be applied first before activating the state.
+	 */
+	attr = tb[DEVLINK_PORT_FUNCTION_ATTR_STATE];
+	if (attr)
+		err = devlink_port_function_state_set(devlink, port, attr,
+						      extack);
 
 	if (!err)
 		devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);