mbox series

[net-next,v2,0/8] devlink: Add SF add/delete devlink ops

Message ID 20200917172020.26484-1-parav@nvidia.com
Headers show
Series devlink: Add SF add/delete devlink ops | expand

Message

Parav Pandit Sept. 17, 2020, 5:20 p.m. UTC
Hi Dave, Jakub,

Similar to PCI VF, PCI SF represents portion of the device.
PCI SF is represented using a new devlink port flavour.

This short series implements small part of the RFC described in detail at [1] and [2].

It extends
(a) devlink core to expose new devlink port flavour 'pcisf'.
(b) Expose new user interface to add/delete devlink port.
(c) Extends netdevsim driver to simulate PCI PF and SF ports
(d) Add port function state attribute

Patch summary:
Patch-1 Extends devlink to expose new PCI SF port flavour
Patch-2 Extends devlink to let user add, delete devlink Port
Patch-3 Prepare code to handle multiple port attributes
Patch-4 Extends devlink to let user get, set function state
Patch-5 Extends netdevsim driver to simulate PCI PF ports
Patch-6 Extends netdevsim driver to simulate hw_addr get/set
Patch-7 Extends netdevsim driver to simulate function state get/set
Patch-8 Extends netdevsim driver to simulate PCI SF ports

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

---
Changelog:
v1->v2:
 - Fixed extra semicolon at end of switch case reportec by coccinelle

Parav Pandit (8):
  devlink: Introduce PCI SF port flavour and port attribute
  devlink: Support add and delete devlink port
  devlink: Prepare code to fill multiple port function attributes
  devlink: Support get and set state of port function
  netdevsim: Add support for add and delete of a PCI PF port
  netdevsim: Simulate get/set hardware address of a PCI port
  netdevsim: Simulate port function state for a PCI port
  netdevsim: Add support for add and delete PCI SF port

 drivers/net/netdevsim/Makefile        |   3 +-
 drivers/net/netdevsim/dev.c           |  14 +
 drivers/net/netdevsim/netdevsim.h     |  32 ++
 drivers/net/netdevsim/port_function.c | 498 ++++++++++++++++++++++++++
 include/net/devlink.h                 |  75 ++++
 include/uapi/linux/devlink.h          |  13 +
 net/core/devlink.c                    | 230 ++++++++++--
 7 files changed, 840 insertions(+), 25 deletions(-)
 create mode 100644 drivers/net/netdevsim/port_function.c

Comments

David Ahern Sept. 17, 2020, 8:01 p.m. UTC | #1
On 9/17/20 11:20 AM, Parav Pandit wrote:
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 48b1c1ef1ebd..1edb558125b0 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {
>  	u8 external:1;
>  };
>  
> +/**
> + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes
> + * @controller: Associated controller number
> + * @pf: Associated PCI PF number for this port.
> + * @sf: Associated PCI SF for of the PCI PF for this port.
> + * @external: when set, indicates if a port is for an external controller
> + */
> +struct devlink_port_pci_sf_attrs {
> +	u32 controller;
> +	u16 pf;
> +	u32 sf;

Why a u32? Do you expect to support that many SFs? Seems like even a u16
is more than you can adequately name within an IFNAMESZ buffer.


> +	u8 external:1;
> +};
> +
>  /**
>   * struct devlink_port_attrs - devlink port object
>   * @flavour: flavour of the port


> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index e5b71f3c2d4d..fada660fd515 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -7855,6 +7889,9 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>  		n = snprintf(name, len, "pf%uvf%u",
>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
>  		break;
> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs->pci_sf.sf);
> +		break;
>  	}
>  
>  	if (n >= len)
> 

And as I noted before, this function continues to grow device names and
it is going to spill over the IFNAMESZ buffer and EINVAL is going to be
confusing. It really needs better error handling back to users (not
kernel buffers).
David Ahern Sept. 17, 2020, 8:31 p.m. UTC | #2
On 9/17/20 11:20 AM, Parav Pandit wrote:
> Simulate PCI SF ports. Allow user to create one or more PCI SF ports.
> 
> Examples:
> 
> Create a PCI PF and PCI SF port.
> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0
> $ devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0 sfnum 44
> $ devlink port show netdevsim/netdevsim10/11
> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour pcisf controller 0 pfnum 0 sfnum 44 external true splittable false
>   function:
>     hw_addr 00:00:00:00:00:00 state inactive
> 
> $ devlink port function set netdevsim/netdevsim10/11 hw_addr 00:11:22:33:44:55 state active
> 
> $ devlink port show netdevsim/netdevsim10/11 -jp
> {
>     "port": {
>         "netdevsim/netdevsim10/11": {
>             "type": "eth",
>             "netdev": "eni10npf0sf44",

I could be missing something, but it does not seem like this patch
creates the netdevice for the subfunction.


>             "flavour": "pcisf",
>             "controller": 0,
>             "pfnum": 0,
>             "sfnum": 44,
>             "external": true,
>             "splittable": false,
>             "function": {
>                 "hw_addr": "00:11:22:33:44:55",
>                 "state": "active"
>             }
>         }
>     }
> }
> 
> Delete newly added devlink port
> $ devlink port add netdevsim/netdevsim10/11
> 
> Add devlink port of flavour 'pcisf' where port index and sfnum are
> auto assigned by driver.
> $ devlink port add netdevsim/netdevsim10 flavour pcisf controller 0 pfnum 0
> 
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---
>  drivers/net/netdevsim/netdevsim.h     |  1 +
>  drivers/net/netdevsim/port_function.c | 95 +++++++++++++++++++++++++--
>  2 files changed, 92 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
> index 0ea9705eda38..c70782e444d5 100644
> --- a/drivers/net/netdevsim/netdevsim.h
> +++ b/drivers/net/netdevsim/netdevsim.h
> @@ -222,6 +222,7 @@ struct nsim_dev {
>  		struct list_head head;
>  		struct ida ida;
>  		struct ida pfnum_ida;
> +		struct ida sfnum_ida;
>  	} port_functions;
>  };
>  
> diff --git a/drivers/net/netdevsim/port_function.c b/drivers/net/netdevsim/port_function.c
> index 99581d3d15fe..e1812acd55b4 100644
> --- a/drivers/net/netdevsim/port_function.c
> +++ b/drivers/net/netdevsim/port_function.c
> @@ -13,10 +13,12 @@ struct nsim_port_function {
>  	unsigned int port_index;
>  	enum devlink_port_flavour flavour;
>  	u32 controller;
> +	u32 sfnum;
>  	u16 pfnum;
>  	struct nsim_port_function *pf_port; /* Valid only for SF port */
>  	u8 hw_addr[ETH_ALEN];
>  	u8 state; /* enum devlink_port_function_state */
> +	int refcount; /* Counts how many sf ports are bound attached to this pf port. */
>  };
>  
>  void nsim_dev_port_function_init(struct nsim_dev *nsim_dev)
> @@ -25,10 +27,13 @@ void nsim_dev_port_function_init(struct nsim_dev *nsim_dev)
>  	INIT_LIST_HEAD(&nsim_dev->port_functions.head);
>  	ida_init(&nsim_dev->port_functions.ida);
>  	ida_init(&nsim_dev->port_functions.pfnum_ida);
> +	ida_init(&nsim_dev->port_functions.sfnum_ida);
>  }
>  
>  void nsim_dev_port_function_exit(struct nsim_dev *nsim_dev)
>  {
> +	WARN_ON(!ida_is_empty(&nsim_dev->port_functions.sfnum_ida));
> +	ida_destroy(&nsim_dev->port_functions.sfnum_ida);
>  	WARN_ON(!ida_is_empty(&nsim_dev->port_functions.pfnum_ida));
>  	ida_destroy(&nsim_dev->port_functions.pfnum_ida);
>  	WARN_ON(!ida_is_empty(&nsim_dev->port_functions.ida));
> @@ -119,9 +124,24 @@ nsim_devlink_port_function_alloc(struct nsim_dev *dev, const struct devlink_port
>  			goto fn_ida_err;
>  		port->pfnum = ret;
>  		break;
> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> +		if (attrs->sfnum_valid)
> +			ret = ida_alloc_range(&dev->port_functions.sfnum_ida, attrs->sfnum,
> +					      attrs->sfnum, GFP_KERNEL);
> +		else
> +			ret = ida_alloc(&dev->port_functions.sfnum_ida, GFP_KERNEL);
> +		if (ret < 0)
> +			goto fn_ida_err;
> +		port->sfnum = ret;
> +		port->pfnum = attrs->pfnum;
> +		break;
>  	default:
>  		break;
>  	}
> +	/* refcount_t is not needed as port is protected by port_functions.mutex.
> +	 * This count is to keep track of how many SF ports are attached a PF port.
> +	 */
> +	port->refcount = 1;
>  	return port;
>  
>  fn_ida_err:
> @@ -137,6 +157,9 @@ static void nsim_devlink_port_function_free(struct nsim_dev *dev, struct nsim_po
>  	case DEVLINK_PORT_FLAVOUR_PCI_PF:
>  		ida_simple_remove(&dev->port_functions.pfnum_ida, port->pfnum);
>  		break;
> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:
> +		ida_simple_remove(&dev->port_functions.sfnum_ida, port->sfnum);
> +		break;
>  	default:
>  		break;
>  	}
> @@ -170,6 +193,11 @@ nsim_dev_port_port_exists(struct nsim_dev *nsim_dev, const struct devlink_port_n
>  		if (attrs->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF &&
>  		    tmp->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF && tmp->pfnum == attrs->pfnum)
>  			return true;
> +
> +		if (attrs->flavour == DEVLINK_PORT_FLAVOUR_PCI_SF &&
> +		    tmp->flavour == DEVLINK_PORT_FLAVOUR_PCI_SF &&
> +		    tmp->sfnum == attrs->sfnum && tmp->pfnum == attrs->pfnum)
> +			return true;
>  	}
>  	return false;
>  }
> @@ -183,21 +211,71 @@ nsim_dev_devlink_port_index_lookup(const struct nsim_dev *nsim_dev, unsigned int
>  	list_for_each_entry(port, &nsim_dev->port_functions.head, list) {
>  		if (port->port_index != port_index)
>  			continue;
> +		if (port->refcount > 1) {
> +			NL_SET_ERR_MSG_MOD(extack, "Port is in use");
> +			return ERR_PTR(-EBUSY);
> +		}
>  		return port;
>  	}
>  	NL_SET_ERR_MSG_MOD(extack, "User created port not found");
>  	return ERR_PTR(-ENOENT);
>  }
>  
> +static struct nsim_port_function *
> +pf_port_get(struct nsim_dev *nsim_dev, struct nsim_port_function *port)
> +{
> +	struct nsim_port_function *tmp;
> +
> +	/* PF port addition doesn't need a parent. */
> +	if (port->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF)
> +		return NULL;
> +
> +	list_for_each_entry(tmp, &nsim_dev->port_functions.head, list) {
> +		if (tmp->flavour != DEVLINK_PORT_FLAVOUR_PCI_PF || tmp->pfnum != port->pfnum)
> +			continue;
> +
> +		if (tmp->refcount + 1 == INT_MAX)
> +			return ERR_PTR(-ENOSPC);
> +
> +		port->pf_port = tmp;
> +		tmp->refcount++;
> +		return tmp;
> +	}
> +	return ERR_PTR(-ENOENT);
> +}
> +
> +static void pf_port_put(struct nsim_port_function *port)
> +{
> +	if (port->pf_port) {
> +		port->pf_port->refcount--;
> +		WARN_ON(port->pf_port->refcount < 0);
> +	}
> +	port->refcount--;
> +	WARN_ON(port->refcount != 0);
> +}
> +
>  static int nsim_devlink_port_function_add(struct devlink *devlink, struct nsim_dev *nsim_dev,
>  					  struct nsim_port_function *port,
>  					  struct netlink_ext_ack *extack)
>  {
> +	struct nsim_port_function *pf_port;
>  	int err;
>  
> -	list_add(&port->list, &nsim_dev->port_functions.head);
> +	/* Keep all PF ports at the start, so that when driver is unloaded
> +	 * All SF ports from the end of the list can be removed first.
> +	 */
> +	if (port->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF)
> +		list_add(&port->list, &nsim_dev->port_functions.head);
> +	else
> +		list_add_tail(&port->list, &nsim_dev->port_functions.head);
> +
> +	pf_port = pf_port_get(nsim_dev, port);
> +	if (IS_ERR(pf_port)) {
> +		NL_SET_ERR_MSG_MOD(extack, "Fail to get pf port");
> +		err = PTR_ERR(pf_port);
> +		goto pf_err;
> +	}
>  
> -	port->state = DEVLINK_PORT_FUNCTION_STATE_INACTIVE;
>  	err = devlink_port_register(devlink, &port->dl_port, port->port_index);
>  	if (err)
>  		goto reg_err;
> @@ -213,6 +291,8 @@ static int nsim_devlink_port_function_add(struct devlink *devlink, struct nsim_d
>  	devlink_port_type_clear(&port->dl_port);
>  	devlink_port_unregister(&port->dl_port);
>  reg_err:
> +	pf_port_put(port);
> +pf_err:
>  	list_del(&port->list);
>  	return err;
>  }
> @@ -224,12 +304,14 @@ static void nsim_devlink_port_function_del(struct nsim_dev *nsim_dev,
>  	unregister_netdev(port->netdev);
>  	devlink_port_unregister(&port->dl_port);
>  	list_del(&port->list);
> +	pf_port_put(port);
>  }
>  
>  static bool nsim_dev_port_flavour_supported(const struct nsim_dev *nsim_dev,
>  					    const struct devlink_port_new_attrs *attrs)
>  {
> -	return attrs->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF;
> +	return attrs->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF ||
> +	       attrs->flavour == DEVLINK_PORT_FLAVOUR_PCI_SF;
>  }
>  
>  int nsim_dev_devlink_port_new(struct devlink *devlink, const struct devlink_port_new_attrs *attrs,
> @@ -266,7 +348,11 @@ int nsim_dev_devlink_port_new(struct devlink *devlink, const struct devlink_port
>  	       nsim_dev->switch_id.id_len);
>  	port->dl_port.attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
>  
> -	devlink_port_attrs_pci_pf_set(&port->dl_port, port->controller, port->pfnum, false);
> +	if (attrs->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF)
> +		devlink_port_attrs_pci_pf_set(&port->dl_port, port->controller, port->pfnum, false);
> +	else
> +		devlink_port_attrs_pci_sf_set(&port->dl_port, port->controller, port->pfnum,
> +					      port->sfnum, false);
>  
>  	err = nsim_devlink_port_function_add(devlink, nsim_dev, port, extack);
>  	if (err)
> @@ -333,6 +419,7 @@ void nsim_dev_port_function_disable(struct nsim_dev *nsim_dev)
>  	 * ports.
>  	 */
>  
> +	/* Remove SF ports first, followed by PF ports. */
>  	list_for_each_entry_safe_reverse(port, tmp, &nsim_dev->port_functions.head, list) {
>  		nsim_devlink_port_function_del(nsim_dev, port);
>  		nsim_devlink_port_function_free(nsim_dev, port);
>
Parav Pandit Sept. 18, 2020, 3:29 a.m. UTC | #3
> From: David Ahern <dsahern@gmail.com>

> Sent: Friday, September 18, 2020 2:01 AM

> 

> On 9/17/20 11:20 AM, Parav Pandit wrote:

> > Simulate PCI SF ports. Allow user to create one or more PCI SF ports.

> >

> > Examples:

> >

> > Create a PCI PF and PCI SF port.

> > $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0 $

> > devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0 sfnum

> > 44 $ devlink port show netdevsim/netdevsim10/11

> > netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour pcisf

> controller 0 pfnum 0 sfnum 44 external true splittable false

> >   function:

> >     hw_addr 00:00:00:00:00:00 state inactive

> >

> > $ devlink port function set netdevsim/netdevsim10/11 hw_addr

> > 00:11:22:33:44:55 state active

> >

> > $ devlink port show netdevsim/netdevsim10/11 -jp {

> >     "port": {

> >         "netdevsim/netdevsim10/11": {

> >             "type": "eth",

> >             "netdev": "eni10npf0sf44",

> 

> I could be missing something, but it does not seem like this patch creates the

> netdevice for the subfunction.

>

The sf port created here is the eswitch port with a valid switch id similar to PF and physical port.
So the netdev created is the representor netdevice.
It is created uniformly for subfunction and pf port flavours.

This series enables user to add PCI PF and subfunction ports.
PF port addition (and its representor netdev) is done in patch 5 [1].
This patch for SF utilizes the same ' struct nsim_port_function' for PF and SF.
Only difference between them is flavour.
[1] https://lore.kernel.org/netdev/20200917172020.26484-6-parav@nvidia.com/

 > 

> >             "flavour": "pcisf",

> >             "controller": 0,

> >             "pfnum": 0,

> >             "sfnum": 44,

> >             "external": true,

> >             "splittable": false,

> >             "function": {

> >                 "hw_addr": "00:11:22:33:44:55",

> >                 "state": "active"

> >             }

> >         }

> >     }

> > }

> >

> > Delete newly added devlink port

> > $ devlink port add netdevsim/netdevsim10/11

> >

> > Add devlink port of flavour 'pcisf' where port index and sfnum are

> > auto assigned by driver.

> > $ devlink port add netdevsim/netdevsim10 flavour pcisf controller 0

> > pfnum 0
Parav Pandit Sept. 18, 2020, 3:35 a.m. UTC | #4
Hi Jacob,

> From: Jacob Keller <jacob.e.keller@intel.com>

> Sent: Friday, September 18, 2020 12:29 AM

> 

> 

> On 9/17/2020 10:20 AM, Parav Pandit wrote:

> > Prepare code to fill zero or more port function optional attributes.

> > Subsequent patch makes use of this to fill more port function

> > attributes.

> >

> > Signed-off-by: Parav Pandit <parav@nvidia.com>

> > Reviewed-by: Jiri Pirko <jiri@nvidia.com>

> > ---

> >  net/core/devlink.c | 53

> > +++++++++++++++++++++++++---------------------

> >  1 file changed, 29 insertions(+), 24 deletions(-)

> >

> > diff --git a/net/core/devlink.c b/net/core/devlink.c index

> > e93730065c57..d152489e48da 100644

> > --- a/net/core/devlink.c

> > +++ b/net/core/devlink.c

> > @@ -570,6 +570,31 @@ static int devlink_nl_port_attrs_put(struct sk_buff

> *msg,

> >  	return 0;

> >  }

> >

> > +static int

> > +devlink_port_function_hw_addr_fill(struct devlink *devlink, const struct

> devlink_ops *ops,

> > +				   struct devlink_port *port, struct sk_buff

> *msg,

> > +				   struct netlink_ext_ack *extack, bool

> *msg_updated) {

> > +	u8 hw_addr[MAX_ADDR_LEN];

> > +	int hw_addr_len;

> > +	int err;

> > +

> > +	if (!ops->port_function_hw_addr_get)

> > +		return 0;

> > +

> > +	err = ops->port_function_hw_addr_get(devlink, port, hw_addr,

> &hw_addr_len, extack);

> > +	if (err) {

> > +		if (err == -EOPNOTSUPP)

> > +			return 0;

> > +		return err;

> > +	}

> > +	err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,

> hw_addr_len, hw_addr);

> > +	if (err)

> > +		return err;

> > +	*msg_updated = true;

> > +	return 0;

> > +}

> > +

> >  static int

> >  devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port

> *port,

> >  				   struct netlink_ext_ack *extack) @@ -577,36

> +602,16 @@

> > devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port

> *por

> >  	struct devlink *devlink = port->devlink;

> >  	const struct devlink_ops *ops;

> >  	struct nlattr *function_attr;

> > -	bool empty_nest = true;

> > -	int err = 0;

> > +	bool msg_updated = false;

> > +	int err;

> >

> >  	function_attr = nla_nest_start_noflag(msg,

> DEVLINK_ATTR_PORT_FUNCTION);

> >  	if (!function_attr)

> >  		return -EMSGSIZE;

> >

> >  	ops = devlink->ops;

> > -	if (ops->port_function_hw_addr_get) {

> > -		int hw_addr_len;

> > -		u8 hw_addr[MAX_ADDR_LEN];

> > -

> > -		err = ops->port_function_hw_addr_get(devlink, port, hw_addr,

> &hw_addr_len, extack);

> > -		if (err == -EOPNOTSUPP) {

> > -			/* Port function attributes are optional for a port. If

> port doesn't

> > -			 * support function attribute, returning -EOPNOTSUPP is

> not an error.

> > -			 */

> 

> We lost this comment in the move it looks like. I think it's still useful to keep for

> clarity of why we're converting -EOPNOTSUPP in the return.

You are right. It is a useful comment.
However, it is already covered in include/net/devlink.h in the standard comment of the callback function.
For new driver implementation, looking there will be more useful.
So I guess its ok to drop from here.

> 

> > -			err = 0;

> > -			goto out;

> > -		} else if (err) {

> > -			goto out;

> > -		}

> > -		err = nla_put(msg,

> DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr);

> > -		if (err)

> > -			goto out;

> > -		empty_nest = false;

> > -	}

> > -

> > -out:

> > -	if (err || empty_nest)

> > +	err = devlink_port_function_hw_addr_fill(devlink, ops, port, msg,

> extack, &msg_updated);

> > +	if (err || !msg_updated)

> >  		nla_nest_cancel(msg, function_attr);

> >  	else

> >  		nla_nest_end(msg, function_attr);

> >
David Ahern Sept. 18, 2020, 3:38 a.m. UTC | #5
On 9/17/20 9:29 PM, Parav Pandit wrote:
>>> Examples:

>>>

>>> Create a PCI PF and PCI SF port.

>>> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0 $

>>> devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0 sfnum

>>> 44 $ devlink port show netdevsim/netdevsim10/11

>>> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour pcisf

>> controller 0 pfnum 0 sfnum 44 external true splittable false

>>>   function:

>>>     hw_addr 00:00:00:00:00:00 state inactive

>>>

>>> $ devlink port function set netdevsim/netdevsim10/11 hw_addr

>>> 00:11:22:33:44:55 state active

>>>

>>> $ devlink port show netdevsim/netdevsim10/11 -jp {

>>>     "port": {

>>>         "netdevsim/netdevsim10/11": {

>>>             "type": "eth",

>>>             "netdev": "eni10npf0sf44",

>>

>> I could be missing something, but it does not seem like this patch creates the

>> netdevice for the subfunction.

>>

> The sf port created here is the eswitch port with a valid switch id similar to PF and physical port.

> So the netdev created is the representor netdevice.

> It is created uniformly for subfunction and pf port flavours.


To be clear: If I run the devlink commands to create a sub-function, `ip
link show` should list a net_device that corresponds to the sub-function?
Parav Pandit Sept. 18, 2020, 3:54 a.m. UTC | #6
> From: Jacob Keller <jacob.e.keller@intel.com>

> Sent: Friday, September 18, 2020 12:00 AM

> 

> 

> On 9/17/2020 10:20 AM, Parav Pandit wrote:

> > A PCI sub-function (SF) represents a portion of the device similar to

> > PCI VF.

> >

> > In an eswitch, PCI SF may have port which is normally represented

> > using a representor netdevice.

> > To have better visibility of eswitch port, its association with SF,

> > and its representor netdevice, introduce a PCI SF port flavour.

> >

> > When devlink port flavour is PCI SF, fill up PCI SF attributes of the

> > port.

> >

> > Extend port name creation using PCI PF and SF number scheme on best

> > effort basis, so that vendor drivers can skip defining their own

> > scheme.

> 

> What does this mean? What's the scheme used? 

>

Scheme used is equivalent as what is used for PCI VF ports. pfNvfM.
It is pfNsfM.
Below example shows the representor netdevice name as 'eni10npf0sf44' built by systemd/udev using phys_port_name.

> Do drivers still have the option to make their own scheme? If so, why?

Today we have two types of drivers (mlx5_core, netdevsim) which uses devlink core which creates the name.
Or other drivers (bnxt, nfp) which doesn't yet migrated to use devlink infra for PCI PF, VF ports.
Such drivers are phys_port_name and other ndos.
It is not the role of this patch to block those drivers, but any new implementation doesn't need to hand code switch_id and phys_port_name related ndos for SF.
For example, bnxt_vf_rep_get_phys_port_name().

> It's not obvious to me in this patch where the numbering scheme comes from. It

> looks like it's still up to the caller to set the numbers.

>

Naming scheme for PCI PF and PCI VF port flavours already exist.
Scheme is equivalent for PCI SF flavour.

I thought example is good enough to show that, but I will update commit message to describe this scheme to make it clear. pfNsfM.
 
> >

> > An example view of a PCI SF port.

> >

> > $ devlink port show netdevsim/netdevsim10/2

> > netdevsim/netdevsim10/2: type eth netdev eni10npf0sf44 flavour pcisf

> controller 0 pfnum 0 sfnum 44 external false splittable false

> >   function:

> >     hw_addr 00:00:00:00:00:00

> >

> > devlink port show netdevsim/netdevsim10/2 -jp {

> >     "port": {

> >         "netdevsim/netdevsim10/2": {

> >             "type": "eth",

> >             "netdev": "eni10npf0sf44",

> >             "flavour": "pcisf",

> >             "controller": 0,

> >             "pfnum": 0,

> >             "sfnum": 44,

> >             "external": false,

> >             "splittable": false,

> >             "function": {

> >                 "hw_addr": "00:00:00:00:00:00"

> >             }

> >         }

> >     }

> > }

> >

> > Signed-off-by: Parav Pandit <parav@nvidia.com>

> > Reviewed-by: Jiri Pirko <jiri@nvidia.com>

> > ---

> >  include/net/devlink.h        | 17 +++++++++++++++++

> >  include/uapi/linux/devlink.h |  7 +++++++

> >  net/core/devlink.c           | 37 ++++++++++++++++++++++++++++++++++++

> >  3 files changed, 61 insertions(+)

> >



> >  static int __devlink_port_phys_port_name_get(struct devlink_port

> *devlink_port,

> >  					     char *name, size_t len)

> >  {

> > @@ -7855,6 +7889,9 @@ static int

> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,

> >  		n = snprintf(name, len, "pf%uvf%u",

> >  			     attrs->pci_vf.pf, attrs->pci_vf.vf);

> >  		break;

> > +	case DEVLINK_PORT_FLAVOUR_PCI_SF:

> > +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-

> >pci_sf.sf);

> > +		break;

> >  	}

> >

This is where the naming scheme is done, like pcipf and pcivf port flavours.

> >  	if (n >= len)

> >
Parav Pandit Sept. 18, 2020, 4:18 a.m. UTC | #7
> From: David Ahern <dsahern@gmail.com>

> Sent: Friday, September 18, 2020 1:32 AM

> 

> On 9/17/20 11:20 AM, Parav Pandit wrote:

> > diff --git a/include/net/devlink.h b/include/net/devlink.h index

> > 48b1c1ef1ebd..1edb558125b0 100644

> > --- a/include/net/devlink.h

> > +++ b/include/net/devlink.h

> > @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {

> >  	u8 external:1;

> >  };

> >

> > +/**

> > + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF

> > +attributes

> > + * @controller: Associated controller number

> > + * @pf: Associated PCI PF number for this port.

> > + * @sf: Associated PCI SF for of the PCI PF for this port.

> > + * @external: when set, indicates if a port is for an external

> > +controller  */ struct devlink_port_pci_sf_attrs {

> > +	u32 controller;

> > +	u16 pf;

> > +	u32 sf;

> 

> Why a u32? Do you expect to support that many SFs? Seems like even a u16 is

> more than you can adequately name within an IFNAMESZ buffer.

>

I think u16 is likely enough, which let use creates 64K SF ports which is a lot. :-)
Was little concerned that it shouldn't fall short in few years. So picked u32. 
Users will be able to make use of alternative names so just because IFNAMESZ is 16 characters, do not want to limit sfnum size.
What do you think?

> 

> > +	u8 external:1;

> > +};

> > +

> >  /**

> >   * struct devlink_port_attrs - devlink port object

> >   * @flavour: flavour of the port

> 

> 

> > diff --git a/net/core/devlink.c b/net/core/devlink.c index

> > e5b71f3c2d4d..fada660fd515 100644

> > --- a/net/core/devlink.c

> > +++ b/net/core/devlink.c

> > @@ -7855,6 +7889,9 @@ static int

> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,

> >  		n = snprintf(name, len, "pf%uvf%u",

> >  			     attrs->pci_vf.pf, attrs->pci_vf.vf);

> >  		break;

> > +	case DEVLINK_PORT_FLAVOUR_PCI_SF:

> > +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-

> >pci_sf.sf);

> > +		break;

> >  	}

> >

> >  	if (n >= len)

> >

> 

> And as I noted before, this function continues to grow device names and it is

> going to spill over the IFNAMESZ buffer and EINVAL is going to be confusing. It

> really needs better error handling back to users (not kernel buffers).

Alternative names [1] should help to overcome the limitation of IFNAMESZ.
For error code EINVAL, should it be ENOSPC?
If so, should I insert a pre-patch in this series?

[1] ip link property add dev DEVICE [ altname NAME .. ]
Parav Pandit Sept. 18, 2020, 4:41 a.m. UTC | #8
Hi David,

> From: David Ahern <dsahern@gmail.com>

> Sent: Friday, September 18, 2020 9:08 AM

> 

> On 9/17/20 9:29 PM, Parav Pandit wrote:

> >>> Examples:

> >>>

> >>> Create a PCI PF and PCI SF port.

> >>> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0 $

> >>> devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0

> >>> sfnum

> >>> 44 $ devlink port show netdevsim/netdevsim10/11

> >>> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour

> >>> pcisf

> >> controller 0 pfnum 0 sfnum 44 external true splittable false

> >>>   function:

> >>>     hw_addr 00:00:00:00:00:00 state inactive

> >>>

> >>> $ devlink port function set netdevsim/netdevsim10/11 hw_addr

> >>> 00:11:22:33:44:55 state active

> >>>

> >>> $ devlink port show netdevsim/netdevsim10/11 -jp {

> >>>     "port": {

> >>>         "netdevsim/netdevsim10/11": {

> >>>             "type": "eth",

> >>>             "netdev": "eni10npf0sf44",

> >>

> >> I could be missing something, but it does not seem like this patch

> >> creates the netdevice for the subfunction.

> >>

> > The sf port created here is the eswitch port with a valid switch id similar to PF

> and physical port.

> > So the netdev created is the representor netdevice.

> > It is created uniformly for subfunction and pf port flavours.

> 

> To be clear: If I run the devlink commands to create a sub-function, `ip link

> show` should list a net_device that corresponds to the sub-function?


In this series only representor netdevice corresponds to sub-function will be visible in ip link show, i.e. eni10npf0sf44.

Netdevsim is only simulating the eswitch side or control path at present for pf/vf/sf ports.
So other end of this port (netdevice/rdma device/vdpa device) are not yet created.

Subfunction will be anchored on virtbus described in RFC [1], which is not yet in-kernel yet.
Grep for "every SF a device is created on virtbus" to jump to this part of the long RFC.

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
Samudrala, Sridhar Sept. 18, 2020, 4:53 a.m. UTC | #9
On 9/17/2020 9:41 PM, Parav Pandit wrote:
> Hi David,
> 
>> From: David Ahern <dsahern@gmail.com>
>> Sent: Friday, September 18, 2020 9:08 AM
>>
>> On 9/17/20 9:29 PM, Parav Pandit wrote:
>>>>> Examples:
>>>>>
>>>>> Create a PCI PF and PCI SF port.
>>>>> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0 $
>>>>> devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0
>>>>> sfnum
>>>>> 44 $ devlink port show netdevsim/netdevsim10/11
>>>>> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour
>>>>> pcisf
>>>> controller 0 pfnum 0 sfnum 44 external true splittable false
>>>>>    function:
>>>>>      hw_addr 00:00:00:00:00:00 state inactive
>>>>>
>>>>> $ devlink port function set netdevsim/netdevsim10/11 hw_addr
>>>>> 00:11:22:33:44:55 state active
>>>>>
>>>>> $ devlink port show netdevsim/netdevsim10/11 -jp {
>>>>>      "port": {
>>>>>          "netdevsim/netdevsim10/11": {
>>>>>              "type": "eth",
>>>>>              "netdev": "eni10npf0sf44",
>>>>
>>>> I could be missing something, but it does not seem like this patch
>>>> creates the netdevice for the subfunction.
>>>>
>>> The sf port created here is the eswitch port with a valid switch id similar to PF
>> and physical port.
>>> So the netdev created is the representor netdevice.
>>> It is created uniformly for subfunction and pf port flavours.
>>
>> To be clear: If I run the devlink commands to create a sub-function, `ip link
>> show` should list a net_device that corresponds to the sub-function?
> 
> In this series only representor netdevice corresponds to sub-function will be visible in ip link show, i.e. eni10npf0sf44.

This should be OK if the eSwitch mode is changed to switchdev.
But i think it should be possible to create a subfunction even in legacy 
mode where representor netdev is not created.

> 
> Netdevsim is only simulating the eswitch side or control path at present for pf/vf/sf ports.
> So other end of this port (netdevice/rdma device/vdpa device) are not yet created.
> 
> Subfunction will be anchored on virtbus described in RFC [1], which is not yet in-kernel yet.
> Grep for "every SF a device is created on virtbus" to jump to this part of the long RFC.
> 
> [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
>
Parav Pandit Sept. 18, 2020, 5:10 a.m. UTC | #10
> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>

> Sent: Friday, September 18, 2020 10:23 AM

> 

> 

> On 9/17/2020 9:41 PM, Parav Pandit wrote:

> > Hi David,

> >

> >> From: David Ahern <dsahern@gmail.com>

> >> Sent: Friday, September 18, 2020 9:08 AM

> >>

> >> On 9/17/20 9:29 PM, Parav Pandit wrote:

> >>>>> Examples:

> >>>>>

> >>>>> Create a PCI PF and PCI SF port.

> >>>>> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0

> >>>>> $ devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0

> >>>>> sfnum

> >>>>> 44 $ devlink port show netdevsim/netdevsim10/11

> >>>>> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour

> >>>>> pcisf

> >>>> controller 0 pfnum 0 sfnum 44 external true splittable false

> >>>>>    function:

> >>>>>      hw_addr 00:00:00:00:00:00 state inactive

> >>>>>

> >>>>> $ devlink port function set netdevsim/netdevsim10/11 hw_addr

> >>>>> 00:11:22:33:44:55 state active

> >>>>>

> >>>>> $ devlink port show netdevsim/netdevsim10/11 -jp {

> >>>>>      "port": {

> >>>>>          "netdevsim/netdevsim10/11": {

> >>>>>              "type": "eth",

> >>>>>              "netdev": "eni10npf0sf44",

> >>>>

> >>>> I could be missing something, but it does not seem like this patch

> >>>> creates the netdevice for the subfunction.

> >>>>

> >>> The sf port created here is the eswitch port with a valid switch id

> >>> similar to PF

> >> and physical port.

> >>> So the netdev created is the representor netdevice.

> >>> It is created uniformly for subfunction and pf port flavours.

> >>

> >> To be clear: If I run the devlink commands to create a sub-function,

> >> `ip link show` should list a net_device that corresponds to the sub-function?

> >

> > In this series only representor netdevice corresponds to sub-function will be

> visible in ip link show, i.e. eni10npf0sf44.

> 

> This should be OK if the eSwitch mode is changed to switchdev.

> But i think it should be possible to create a subfunction even in legacy mode

> where representor netdev is not created.


switch_id is optional attribute of the devlink port.
It is applicable only when it is eswitch port in switchdev mode.

> 

> >

> > Netdevsim is only simulating the eswitch side or control path at present for

> pf/vf/sf ports.

> > So other end of this port (netdevice/rdma device/vdpa device) are not yet

> created.

> >

> > Subfunction will be anchored on virtbus described in RFC [1], which is not yet

> in-kernel yet.

> > Grep for "every SF a device is created on virtbus" to jump to this part of the

> long RFC.

> >

> > [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

> >
David Ahern Sept. 18, 2020, 3:15 p.m. UTC | #11
On 9/17/20 10:18 PM, Parav Pandit wrote:
> 

>> From: David Ahern <dsahern@gmail.com>

>> Sent: Friday, September 18, 2020 1:32 AM

>>

>> On 9/17/20 11:20 AM, Parav Pandit wrote:

>>> diff --git a/include/net/devlink.h b/include/net/devlink.h index

>>> 48b1c1ef1ebd..1edb558125b0 100644

>>> --- a/include/net/devlink.h

>>> +++ b/include/net/devlink.h

>>> @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {

>>>  	u8 external:1;

>>>  };

>>>

>>> +/**

>>> + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF

>>> +attributes

>>> + * @controller: Associated controller number

>>> + * @pf: Associated PCI PF number for this port.

>>> + * @sf: Associated PCI SF for of the PCI PF for this port.

>>> + * @external: when set, indicates if a port is for an external

>>> +controller  */ struct devlink_port_pci_sf_attrs {

>>> +	u32 controller;

>>> +	u16 pf;

>>> +	u32 sf;

>>

>> Why a u32? Do you expect to support that many SFs? Seems like even a u16 is

>> more than you can adequately name within an IFNAMESZ buffer.

>>

> I think u16 is likely enough, which let use creates 64K SF ports which is a lot. :-)

> Was little concerned that it shouldn't fall short in few years. So picked u32. 

> Users will be able to make use of alternative names so just because IFNAMESZ is 16 characters, do not want to limit sfnum size.

> What do you think?

> 

>>

>>> +	u8 external:1;

>>> +};

>>> +

>>>  /**

>>>   * struct devlink_port_attrs - devlink port object

>>>   * @flavour: flavour of the port

>>

>>

>>> diff --git a/net/core/devlink.c b/net/core/devlink.c index

>>> e5b71f3c2d4d..fada660fd515 100644

>>> --- a/net/core/devlink.c

>>> +++ b/net/core/devlink.c

>>> @@ -7855,6 +7889,9 @@ static int

>> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,

>>>  		n = snprintf(name, len, "pf%uvf%u",

>>>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);

>>>  		break;

>>> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:

>>> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-

>>> pci_sf.sf);

>>> +		break;

>>>  	}

>>>

>>>  	if (n >= len)

>>>

>>

>> And as I noted before, this function continues to grow device names and it is

>> going to spill over the IFNAMESZ buffer and EINVAL is going to be confusing. It

>> really needs better error handling back to users (not kernel buffers).

> Alternative names [1] should help to overcome the limitation of IFNAMESZ.

> For error code EINVAL, should it be ENOSPC?

> If so, should I insert a pre-patch in this series?

> 

> [1] ip link property add dev DEVICE [ altname NAME .. ]

> 


You keep adding patches that extend the template based names. Those are
going to cause odd EINVAL failures (the absolute worst kind of
configuration failure) with no way for a user to understand why the
command is failing, and you need to handle that. Returning an extack
message back to the user is preferred.

Yes, the altnames provides a solution after the the netdevice has been
created, but I do not see how that works when the netdevice is created
as part of devlink commands using the template names approach.
David Ahern Sept. 18, 2020, 3:23 p.m. UTC | #12
On 9/17/20 10:41 PM, Parav Pandit wrote:
> Hi David,
> 
>> From: David Ahern <dsahern@gmail.com>
>> Sent: Friday, September 18, 2020 9:08 AM
>>
>> On 9/17/20 9:29 PM, Parav Pandit wrote:
>>>>> Examples:
>>>>>
>>>>> Create a PCI PF and PCI SF port.
>>>>> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0 $
>>>>> devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0
>>>>> sfnum
>>>>> 44 $ devlink port show netdevsim/netdevsim10/11
>>>>> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour
>>>>> pcisf
>>>> controller 0 pfnum 0 sfnum 44 external true splittable false
>>>>>   function:
>>>>>     hw_addr 00:00:00:00:00:00 state inactive
>>>>>
>>>>> $ devlink port function set netdevsim/netdevsim10/11 hw_addr
>>>>> 00:11:22:33:44:55 state active
>>>>>
>>>>> $ devlink port show netdevsim/netdevsim10/11 -jp {
>>>>>     "port": {
>>>>>         "netdevsim/netdevsim10/11": {
>>>>>             "type": "eth",
>>>>>             "netdev": "eni10npf0sf44",
>>>>
>>>> I could be missing something, but it does not seem like this patch
>>>> creates the netdevice for the subfunction.
>>>>
>>> The sf port created here is the eswitch port with a valid switch id similar to PF
>> and physical port.
>>> So the netdev created is the representor netdevice.
>>> It is created uniformly for subfunction and pf port flavours.
>>
>> To be clear: If I run the devlink commands to create a sub-function, `ip link
>> show` should list a net_device that corresponds to the sub-function?
> 
> In this series only representor netdevice corresponds to sub-function will be visible in ip link show, i.e. eni10npf0sf44.
> 
> Netdevsim is only simulating the eswitch side or control path at present for pf/vf/sf ports.
> So other end of this port (netdevice/rdma device/vdpa device) are not yet created.
> 
> Subfunction will be anchored on virtbus described in RFC [1], which is not yet in-kernel yet.
> Grep for "every SF a device is created on virtbus" to jump to this part of the long RFC.
> 
> [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
> 

Thanks for the reference. I have seen that. I am interested in this idea
of creating netdevs for 'slices' of an asic or nic, but it is not clear
to me how it connects end to end. Will you be able to create a
sub-function based netdevice, assign it limited resources from the nic
and then assign that netdevice to a container for example?
Parav Pandit Sept. 18, 2020, 3:51 p.m. UTC | #13
> From: David Ahern <dsahern@gmail.com>

> Sent: Friday, September 18, 2020 8:54 PM

> 

> On 9/17/20 10:41 PM, Parav Pandit wrote:

> > Hi David,

> >

> >> From: David Ahern <dsahern@gmail.com>

> >> Sent: Friday, September 18, 2020 9:08 AM

> >>

> >> On 9/17/20 9:29 PM, Parav Pandit wrote:

> >>>>> Examples:

> >>>>>

> >>>>> Create a PCI PF and PCI SF port.

> >>>>> $ devlink port add netdevsim/netdevsim10/10 flavour pcipf pfnum 0

> >>>>> $ devlink port add netdevsim/netdevsim10/11 flavour pcisf pfnum 0

> >>>>> sfnum

> >>>>> 44 $ devlink port show netdevsim/netdevsim10/11

> >>>>> netdevsim/netdevsim10/11: type eth netdev eni10npf0sf44 flavour

> >>>>> pcisf

> >>>> controller 0 pfnum 0 sfnum 44 external true splittable false

> >>>>>   function:

> >>>>>     hw_addr 00:00:00:00:00:00 state inactive

> >>>>>

> >>>>> $ devlink port function set netdevsim/netdevsim10/11 hw_addr

> >>>>> 00:11:22:33:44:55 state active

> >>>>>

> >>>>> $ devlink port show netdevsim/netdevsim10/11 -jp {

> >>>>>     "port": {

> >>>>>         "netdevsim/netdevsim10/11": {

> >>>>>             "type": "eth",

> >>>>>             "netdev": "eni10npf0sf44",

> >>>>

> >>>> I could be missing something, but it does not seem like this patch

> >>>> creates the netdevice for the subfunction.

> >>>>

> >>> The sf port created here is the eswitch port with a valid switch id

> >>> similar to PF

> >> and physical port.

> >>> So the netdev created is the representor netdevice.

> >>> It is created uniformly for subfunction and pf port flavours.

> >>

> >> To be clear: If I run the devlink commands to create a sub-function,

> >> `ip link show` should list a net_device that corresponds to the sub-

> function?

> >

> > In this series only representor netdevice corresponds to sub-function will

> be visible in ip link show, i.e. eni10npf0sf44.

> >

> > Netdevsim is only simulating the eswitch side or control path at present for

> pf/vf/sf ports.

> > So other end of this port (netdevice/rdma device/vdpa device) are not yet

> created.

> >

> > Subfunction will be anchored on virtbus described in RFC [1], which is not

> yet in-kernel yet.

> > Grep for "every SF a device is created on virtbus" to jump to this part of the

> long RFC.

> >

> > [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

> >

> 

> Thanks for the reference. I have seen that. I am interested in this idea of

> creating netdevs for 'slices' of an asic or nic, but it is not clear to me how it

> connects end to end. Will you be able to create a sub-function based

> netdevice, assign it limited resources from the nic and then assign that

> netdevice to a container for example?

Yep. You described is precisely well.
This short series creates eswitch representor for the moment.
Once virtbus is in_kernel, will extend to create actual netdevice of the subfunction too.
Parav Pandit Sept. 18, 2020, 4:13 p.m. UTC | #14
Hi David,

> From: David Ahern <dsahern@gmail.com>

> Sent: Friday, September 18, 2020 8:45 PM

> 

> On 9/17/20 10:18 PM, Parav Pandit wrote:

> >

> >> From: David Ahern <dsahern@gmail.com>

> >> Sent: Friday, September 18, 2020 1:32 AM

> >>

> >> On 9/17/20 11:20 AM, Parav Pandit wrote:

> >>> diff --git a/include/net/devlink.h b/include/net/devlink.h index

> >>> 48b1c1ef1ebd..1edb558125b0 100644

> >>> --- a/include/net/devlink.h

> >>> +++ b/include/net/devlink.h

> >>> @@ -83,6 +83,20 @@ struct devlink_port_pci_vf_attrs {

> >>>  	u8 external:1;

> >>>  };

> >>>

> >>> +/**

> >>> + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF

> >>> +attributes

> >>> + * @controller: Associated controller number

> >>> + * @pf: Associated PCI PF number for this port.

> >>> + * @sf: Associated PCI SF for of the PCI PF for this port.

> >>> + * @external: when set, indicates if a port is for an external

> >>> +controller  */ struct devlink_port_pci_sf_attrs {

> >>> +	u32 controller;

> >>> +	u16 pf;

> >>> +	u32 sf;

> >>

> >> Why a u32? Do you expect to support that many SFs? Seems like even a

> >> u16 is more than you can adequately name within an IFNAMESZ buffer.

> >>

> > I think u16 is likely enough, which let use creates 64K SF ports which

> > is a lot. :-) Was little concerned that it shouldn't fall short in few years. So

> picked u32.

> > Users will be able to make use of alternative names so just because

> IFNAMESZ is 16 characters, do not want to limit sfnum size.

> > What do you think?

> >

> >>

> >>> +	u8 external:1;

> >>> +};

> >>> +

> >>>  /**

> >>>   * struct devlink_port_attrs - devlink port object

> >>>   * @flavour: flavour of the port

> >>

> >>

> >>> diff --git a/net/core/devlink.c b/net/core/devlink.c index

> >>> e5b71f3c2d4d..fada660fd515 100644

> >>> --- a/net/core/devlink.c

> >>> +++ b/net/core/devlink.c

> >>> @@ -7855,6 +7889,9 @@ static int

> >> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,

> >>>  		n = snprintf(name, len, "pf%uvf%u",

> >>>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);

> >>>  		break;

> >>> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:

> >>> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-

> >>> pci_sf.sf);

> >>> +		break;

> >>>  	}

> >>>

> >>>  	if (n >= len)

> >>>

> >>

> >> And as I noted before, this function continues to grow device names

> >> and it is going to spill over the IFNAMESZ buffer and EINVAL is going

> >> to be confusing. It really needs better error handling back to users (not

> kernel buffers).

> > Alternative names [1] should help to overcome the limitation of IFNAMESZ.

> > For error code EINVAL, should it be ENOSPC?

> > If so, should I insert a pre-patch in this series?

> >

> > [1] ip link property add dev DEVICE [ altname NAME .. ]

> >

> 

> You keep adding patches that extend the template based names. Those are

> going to cause odd EINVAL failures (the absolute worst kind of configuration

> failure) with no way for a user to understand why the command is failing, and

> you need to handle that. Returning an extack message back to the user is

> preferred.

Sure, make sense.
I will run one short series after this one, to return extack in below code flow and other where extack return is possible.
In sysfs flow it is irrelevant anyway.
rtnl_getlink()
  rtnl_phys_port_name_fill()
     dev_get_phys_port_name()
       ndo_phys_port_name()

is that ok?
This series is not really making phys_port_name any worse except that sfnum field width is bit larger than vfnum.

Now coming back to phys_port_name not fitting in 15 characters which can trigger -EINVAL error is very slim in most sane cases.
Let's take an example.
A controller in valid range of 0 to 16,
PF in range of 0 to 7,
VF in range of 0 to 255,
SF in range of 0 to 65536,

Will generate VF phys_port_name=c16pf7vf255 (11 chars + 1 null)
SF phys_port name = c16pf7sf65535 (13 chars + 1 null)
So yes, eth dev name won't fit in but phys_port_name failure is almost nil.

> 

> Yes, the altnames provides a solution after the the netdevice has been

> created, but I do not see how that works when the netdevice is created as

> part of devlink commands using the template names approach.

systemd/udev version v245 understands the parent PCI device of netdev and phys_port_name to construct a unique netdevice name.

This is the example from this patch series for PF port.

udevadm test-builtin net_id /sys/class/net/eth0
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=eni10npf0
Unload module index
Unloaded link configuration context.

And for SF port,

udevadm test-builtin net_id /sys/class/net/eth1
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=eni10npf0sf44
Unload module index
Unloaded link configuration context.

Similar VF naming in places for I think more than a year now.
Jakub Kicinski Sept. 18, 2020, 4:52 p.m. UTC | #15
On Thu, 17 Sep 2020 20:20:12 +0300 Parav Pandit wrote:
> Hi Dave, Jakub,
> 
> Similar to PCI VF, PCI SF represents portion of the device.
> PCI SF is represented using a new devlink port flavour.
> 
> This short series implements small part of the RFC described in detail at [1] and [2].
> 
> It extends
> (a) devlink core to expose new devlink port flavour 'pcisf'.
> (b) Expose new user interface to add/delete devlink port.
> (c) Extends netdevsim driver to simulate PCI PF and SF ports
> (d) Add port function state attribute

Is this an RFC? It doesn't add any in-tree users.
Parav Pandit Sept. 18, 2020, 5:08 p.m. UTC | #16
> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Friday, September 18, 2020 10:22 PM

> 

> On Thu, 17 Sep 2020 20:20:12 +0300 Parav Pandit wrote:

> > Hi Dave, Jakub,

> >

> > Similar to PCI VF, PCI SF represents portion of the device.

> > PCI SF is represented using a new devlink port flavour.

> >

> > This short series implements small part of the RFC described in detail at [1]

> and [2].

> >

> > It extends

> > (a) devlink core to expose new devlink port flavour 'pcisf'.

> > (b) Expose new user interface to add/delete devlink port.

> > (c) Extends netdevsim driver to simulate PCI PF and SF ports

> > (d) Add port function state attribute

> 

> Is this an RFC? It doesn't add any in-tree users.

It is not an RFC.
devlink + mlx5 + netdevsim is crossing 25+ patches on eswitch side.
So splitting it to logical piece as devlink + netdevsim.
After which mlx5 eswitch side come close to 15 + 4 patches which can run as two separate patchset.

What do you suggest?
Jakub Kicinski Sept. 18, 2020, 5:37 p.m. UTC | #17
On Fri, 18 Sep 2020 17:08:15 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, September 18, 2020 10:22 PM
> > 
> > On Thu, 17 Sep 2020 20:20:12 +0300 Parav Pandit wrote:  
> > > Hi Dave, Jakub,
> > >
> > > Similar to PCI VF, PCI SF represents portion of the device.
> > > PCI SF is represented using a new devlink port flavour.
> > >
> > > This short series implements small part of the RFC described in detail at [1]  
> > and [2].  
> > >
> > > It extends
> > > (a) devlink core to expose new devlink port flavour 'pcisf'.
> > > (b) Expose new user interface to add/delete devlink port.
> > > (c) Extends netdevsim driver to simulate PCI PF and SF ports
> > > (d) Add port function state attribute  
> > 
> > Is this an RFC? It doesn't add any in-tree users.  
> It is not an RFC.
> devlink + mlx5 + netdevsim is crossing 25+ patches on eswitch side.
> So splitting it to logical piece as devlink + netdevsim.
> After which mlx5 eswitch side come close to 15 + 4 patches which can
> run as two separate patchset.
> 
> What do you suggest?

Start with real patches, not netdevsim.
Parav Pandit Sept. 18, 2020, 5:47 p.m. UTC | #18
> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Friday, September 18, 2020 11:07 PM

> 

> On Fri, 18 Sep 2020 17:08:15 +0000 Parav Pandit wrote:

> > > From: Jakub Kicinski <kuba@kernel.org>

> > > Sent: Friday, September 18, 2020 10:22 PM

> > >

> > > On Thu, 17 Sep 2020 20:20:12 +0300 Parav Pandit wrote:

> > > > Hi Dave, Jakub,

> > > >

> > > > Similar to PCI VF, PCI SF represents portion of the device.

> > > > PCI SF is represented using a new devlink port flavour.

> > > >

> > > > This short series implements small part of the RFC described in

> > > > detail at [1]

> > > and [2].

> > > >

> > > > It extends

> > > > (a) devlink core to expose new devlink port flavour 'pcisf'.

> > > > (b) Expose new user interface to add/delete devlink port.

> > > > (c) Extends netdevsim driver to simulate PCI PF and SF ports

> > > > (d) Add port function state attribute

> > >

> > > Is this an RFC? It doesn't add any in-tree users.

> > It is not an RFC.

> > devlink + mlx5 + netdevsim is crossing 25+ patches on eswitch side.

> > So splitting it to logical piece as devlink + netdevsim.

> > After which mlx5 eswitch side come close to 15 + 4 patches which can

> > run as two separate patchset.

> >

> > What do you suggest?

> 

> Start with real patches, not netdevsim.


Hmm. Shall I split the series below, would that be ok?

First patchset,
(a) devlink piece to add/delete port
(b) mlx5 counter part

Second patchset,
(a) devlink piece to set the state
(b) mlx5 counter part

Follow on patchset to create/delete sf's netdev on virtbus in mlx5 + devlink plumbing.
Netdevsim after that.
Jakub Kicinski Sept. 18, 2020, 6:28 p.m. UTC | #19
On Fri, 18 Sep 2020 17:47:24 +0000 Parav Pandit wrote:
> > > What do you suggest?  
> > 
> > Start with real patches, not netdevsim.  
> 
> Hmm. Shall I split the series below, would that be ok?
> 
> First patchset,
> (a) devlink piece to add/delete port
> (b) mlx5 counter part
> 
> Second patchset,
> (a) devlink piece to set the state
> (b) mlx5 counter part
> 
> Follow on patchset to create/delete sf's netdev on virtbus in mlx5 + devlink plumbing.
> Netdevsim after that.
 
I'd start from the virtbus part so we can see what's being created.
Parav Pandit Sept. 18, 2020, 8:09 p.m. UTC | #20
> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Friday, September 18, 2020 11:58 PM

> 

> On Fri, 18 Sep 2020 17:47:24 +0000 Parav Pandit wrote:

> > > > What do you suggest?

> > >

> > > Start with real patches, not netdevsim.

> >

> > Hmm. Shall I split the series below, would that be ok?

> >

> > First patchset,

> > (a) devlink piece to add/delete port

> > (b) mlx5 counter part

> >

> > Second patchset,

> > (a) devlink piece to set the state

> > (b) mlx5 counter part

> >

> > Follow on patchset to create/delete sf's netdev on virtbus in mlx5 + devlink

> plumbing.

> > Netdevsim after that.

> 

> I'd start from the virtbus part so we can see what's being created.


How do you reach there without a user interface?
Keller, Jacob E Sept. 18, 2020, 10:53 p.m. UTC | #21
On 9/17/2020 8:35 PM, Parav Pandit wrote:
> Hi Jacob,
> 
>> From: Jacob Keller <jacob.e.keller@intel.com>
>> Sent: Friday, September 18, 2020 12:29 AM
>>
>>
>> We lost this comment in the move it looks like. I think it's still useful to keep for
>> clarity of why we're converting -EOPNOTSUPP in the return.
> You are right. It is a useful comment.
> However, it is already covered in include/net/devlink.h in the standard comment of the callback function.
> For new driver implementation, looking there will be more useful.
> So I guess its ok to drop from here.
> 

Yea if it's still in the header I don't think it's too important to keep
here.

Thanks!
-Jake
Keller, Jacob E Sept. 18, 2020, 11:04 p.m. UTC | #22
On 9/17/2020 8:54 PM, Parav Pandit wrote:
> 

> 

>> From: Jacob Keller <jacob.e.keller@intel.com>

>> Sent: Friday, September 18, 2020 12:00 AM

>>

>>

>> On 9/17/2020 10:20 AM, Parav Pandit wrote:

>>> A PCI sub-function (SF) represents a portion of the device similar to

>>> PCI VF.

>>>

>>> In an eswitch, PCI SF may have port which is normally represented

>>> using a representor netdevice.

>>> To have better visibility of eswitch port, its association with SF,

>>> and its representor netdevice, introduce a PCI SF port flavour.

>>>

>>> When devlink port flavour is PCI SF, fill up PCI SF attributes of the

>>> port.

>>>

>>> Extend port name creation using PCI PF and SF number scheme on best

>>> effort basis, so that vendor drivers can skip defining their own

>>> scheme.

>>

>> What does this mean? What's the scheme used? 

>>

> Scheme used is equivalent as what is used for PCI VF ports. pfNvfM.

> It is pfNsfM.

> Below example shows the representor netdevice name as 'eni10npf0sf44' built by systemd/udev using phys_port_name.

> 

>> Do drivers still have the option to make their own scheme? If so, why?

> Today we have two types of drivers (mlx5_core, netdevsim) which uses devlink core which creates the name.

> Or other drivers (bnxt, nfp) which doesn't yet migrated to use devlink infra for PCI PF, VF ports.

> Such drivers are phys_port_name and other ndos.

> It is not the role of this patch to block those drivers, but any new implementation doesn't need to hand code switch_id and phys_port_name related ndos for SF.

> For example, bnxt_vf_rep_get_phys_port_name().

> 



Ok, thanks for the explanation.

>> It's not obvious to me in this patch where the numbering scheme comes from. It

>> looks like it's still up to the caller to set the numbers.

>>

> Naming scheme for PCI PF and PCI VF port flavours already exist.

> Scheme is equivalent for PCI SF flavour.

> 

> I thought example is good enough to show that, but I will update commit message to describe this scheme to make it clear. pfNsfM.

>  


I think I just hadn't quite moved from "sf number" to "name of the
netdevice" and was thinking of scheme for how the sf number is selected,
which isn't really what the statement was about.

>>>>>> An example view of a PCI SF port.

>>>

>>> $ devlink port show netdevsim/netdevsim10/2

>>> netdevsim/netdevsim10/2: type eth netdev eni10npf0sf44 flavour pcisf

>> controller 0 pfnum 0 sfnum 44 external false splittable false

>>>   function:

>>>     hw_addr 00:00:00:00:00:00

>>>

>>> devlink port show netdevsim/netdevsim10/2 -jp {

>>>     "port": {

>>>         "netdevsim/netdevsim10/2": {

>>>             "type": "eth",

>>>             "netdev": "eni10npf0sf44",

>>>             "flavour": "pcisf",

>>>             "controller": 0,

>>>             "pfnum": 0,

>>>             "sfnum": 44,

>>>             "external": false,

>>>             "splittable": false,

>>>             "function": {

>>>                 "hw_addr": "00:00:00:00:00:00"

>>>             }

>>>         }

>>>     }

>>> }

>>>

>>> Signed-off-by: Parav Pandit <parav@nvidia.com>

>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>

>>> ---

>>>  include/net/devlink.h        | 17 +++++++++++++++++

>>>  include/uapi/linux/devlink.h |  7 +++++++

>>>  net/core/devlink.c           | 37 ++++++++++++++++++++++++++++++++++++

>>>  3 files changed, 61 insertions(+)

>>>

> 

> 

>>>  static int __devlink_port_phys_port_name_get(struct devlink_port

>> *devlink_port,

>>>  					     char *name, size_t len)

>>>  {

>>> @@ -7855,6 +7889,9 @@ static int

>> __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,

>>>  		n = snprintf(name, len, "pf%uvf%u",

>>>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);

>>>  		break;

>>> +	case DEVLINK_PORT_FLAVOUR_PCI_SF:

>>> +		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs-

>>> pci_sf.sf);

>>> +		break;

>>>  	}

>>>

> This is where the naming scheme is done, like pcipf and pcivf port flavours.

> 

>>>  	if (n >= len)

>>>
David Ahern Sept. 19, 2020, 4:49 a.m. UTC | #23
On 9/18/20 10:13 AM, Parav Pandit wrote:
>> You keep adding patches that extend the template based names. Those are

>> going to cause odd EINVAL failures (the absolute worst kind of configuration

>> failure) with no way for a user to understand why the command is failing, and

>> you need to handle that. Returning an extack message back to the user is

>> preferred.

> Sure, make sense.

> I will run one short series after this one, to return extack in below code flow and other where extack return is possible.

> In sysfs flow it is irrelevant anyway.


yes, sysfs is a different problem.

> rtnl_getlink()

>   rtnl_phys_port_name_fill()

>      dev_get_phys_port_name()

>        ndo_phys_port_name()

> 

> is that ok?


sure. The overflow is not going to happen today with 1-10 SFs; but you
are pushing ever close to the limit hence the push.


> This series is not really making phys_port_name any worse except that sfnum field width is bit larger than vfnum.

> 

> Now coming back to phys_port_name not fitting in 15 characters which can trigger -EINVAL error is very slim in most sane cases.

> Let's take an example.

> A controller in valid range of 0 to 16,

> PF in range of 0 to 7,

> VF in range of 0 to 255,

> SF in range of 0 to 65536,

> 

> Will generate VF phys_port_name=c16pf7vf255 (11 chars + 1 null)

> SF phys_port name = c16pf7sf65535 (13 chars + 1 null)

> So yes, eth dev name won't fit in but phys_port_name failure is almost nil.

> 


You lost me on that last sentence. Per your example for SF, adding 'eni'
or 'eth' as a prefix (which you commonly use in examples and which are
common prefixes) to that name and you overflow the IFNAMESZ limit.


(understand about the systemd integration and appreciate the forward
thinking about that).
Parav Pandit Sept. 19, 2020, 5:35 a.m. UTC | #24
> From: David Ahern <dsahern@gmail.com>

> Sent: Saturday, September 19, 2020 10:19 AM

> 

> On 9/18/20 10:13 AM, Parav Pandit wrote:

> >> You keep adding patches that extend the template based names. Those

> >> are going to cause odd EINVAL failures (the absolute worst kind of

> >> configuration

> >> failure) with no way for a user to understand why the command is

> >> failing, and you need to handle that. Returning an extack message

> >> back to the user is preferred.

> > Sure, make sense.

> > I will run one short series after this one, to return extack in below code flow

> and other where extack return is possible.

> > In sysfs flow it is irrelevant anyway.

> 

> yes, sysfs is a different problem.

> 

> > rtnl_getlink()

> >   rtnl_phys_port_name_fill()

> >      dev_get_phys_port_name()

> >        ndo_phys_port_name()

> >

> > is that ok?

> 

> sure. The overflow is not going to happen today with 1-10 SFs; but you are

> pushing ever close to the limit hence the push.

> 

Ok. yeah. got it.
> 

> > This series is not really making phys_port_name any worse except that sfnum

> field width is bit larger than vfnum.

> >

> > Now coming back to phys_port_name not fitting in 15 characters which can

> trigger -EINVAL error is very slim in most sane cases.

> > Let's take an example.

> > A controller in valid range of 0 to 16, PF in range of 0 to 7, VF in

> > range of 0 to 255, SF in range of 0 to 65536,

> >

> > Will generate VF phys_port_name=c16pf7vf255 (11 chars + 1 null) SF

> > phys_port name = c16pf7sf65535 (13 chars + 1 null) So yes, eth dev

> > name won't fit in but phys_port_name failure is almost nil.

> >

> 

> You lost me on that last sentence. Per your example for SF, adding 'eni'

> or 'eth' as a prefix (which you commonly use in examples and which are

> common prefixes) to that name and you overflow the IFNAMESZ limit.

>

Right. Got you.
Will run short series for returning extack.
 
> 

> (understand about the systemd integration and appreciate the forward thinking

> about that).

Thank you David.
Parav Pandit Sept. 19, 2020, 5:41 a.m. UTC | #25
> From: Jacob Keller <jacob.e.keller@intel.com>

> Sent: Saturday, September 19, 2020 4:24 AM

> 

> 

> On 9/17/2020 8:35 PM, Parav Pandit wrote:

> > Hi Jacob,

> >

> >> From: Jacob Keller <jacob.e.keller@intel.com>

> >> Sent: Friday, September 18, 2020 12:29 AM

> >>

> >>

> >> We lost this comment in the move it looks like. I think it's still

> >> useful to keep for clarity of why we're converting -EOPNOTSUPP in the

> return.

> > You are right. It is a useful comment.

> > However, it is already covered in include/net/devlink.h in the standard

> comment of the callback function.

> > For new driver implementation, looking there will be more useful.

> > So I guess its ok to drop from here.

> >

> 

> Yea if it's still in the header I don't think it's too important to keep here.

>

Alright. Will keep in the header.
Thanks.
 
> Thanks!

> -Jake
Jakub Kicinski Sept. 21, 2020, 10:02 p.m. UTC | #26
On Fri, 18 Sep 2020 20:09:24 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, September 18, 2020 11:58 PM
> > 
> > On Fri, 18 Sep 2020 17:47:24 +0000 Parav Pandit wrote:  
> > > > > What do you suggest?  
> > > >
> > > > Start with real patches, not netdevsim.  
> > >
> > > Hmm. Shall I split the series below, would that be ok?
> > >
> > > First patchset,
> > > (a) devlink piece to add/delete port
> > > (b) mlx5 counter part
> > >
> > > Second patchset,
> > > (a) devlink piece to set the state
> > > (b) mlx5 counter part
> > >
> > > Follow on patchset to create/delete sf's netdev on virtbus in mlx5 + devlink  
> > plumbing.  
> > > Netdevsim after that.  
> > 
> > I'd start from the virtbus part so we can see what's being created.  
> 
> How do you reach there without a user interface?

Well.. why do you have a user interface which doesn't cause anything to
happen (devices won't get created)? You're splitting the submission,
it's obvious the implementation won't be complete after the first one.

My expectation is that your implementation of the devlink commands will
just hand them off to FW, so there won't be anything interesting there
to review. 

Start with the hard / risky parts - I consider the virtbus to be that,
since the conversation there includes multiple vendors, use cases and
subsystems.
Parav Pandit Sept. 22, 2020, 4:37 a.m. UTC | #27
> From: Jakub Kicinski <kuba@kernel.org>

> Sent: Tuesday, September 22, 2020 3:32 AM

> 

> On Fri, 18 Sep 2020 20:09:24 +0000 Parav Pandit wrote:

> > > From: Jakub Kicinski <kuba@kernel.org>

> > > Sent: Friday, September 18, 2020 11:58 PM

> > >

> > > On Fri, 18 Sep 2020 17:47:24 +0000 Parav Pandit wrote:

> > > > > > What do you suggest?

> > > > >

> > > > > Start with real patches, not netdevsim.

> > > >

> > > > Hmm. Shall I split the series below, would that be ok?

> > > >

> > > > First patchset,

> > > > (a) devlink piece to add/delete port

> > > > (b) mlx5 counter part

> > > >

> > > > Second patchset,

> > > > (a) devlink piece to set the state

> > > > (b) mlx5 counter part

> > > >

> > > > Follow on patchset to create/delete sf's netdev on virtbus in mlx5

> > > > + devlink

> > > plumbing.

> > > > Netdevsim after that.

> > >

> > > I'd start from the virtbus part so we can see what's being created.

> >

> > How do you reach there without a user interface?

> 

> Well.. why do you have a user interface which doesn't cause anything to happen

> (devices won't get created)? You're splitting the submission, it's obvious the

> implementation won't be complete after the first one.

> 

> My expectation is that your implementation of the devlink commands will just

> hand them off to FW, so there won't be anything interesting there to review.

Correct. Since handing off to firmware and processing event from firmware is creating fairly more patches,
In first series I will just go inline where devlink command would create/remove virtbus device on state active/inactive.

This way it should be possible to have minimal working series under 20 patches.
This way in one patchset, I prefer to cover
(a) devlink plumbing for port add/del, port state
(b) mlx5 eswitch refactor and devlink callback handling
(c) devlink plumbing around
(d) virtbus device and their netdev creation

This also gives complete view from user interface to netdev device creation.
Post this series, will improve the internals of mlx5 via events etc which doesn't need multi-vendor, virtbus and devlink involvement.
Will differ netdevsim to later date.

> 

> Start with the hard / risky parts - I consider the virtbus to be that, since the

> conversation there includes multiple vendors, use cases and subsystems.