From patchwork Fri Jan 22 19:36:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369194 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2E5CC433E0 for ; Fri, 22 Jan 2021 23:13:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6741023B17 for ; Fri, 22 Jan 2021 23:13:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729301AbhAVXNM (ORCPT ); Fri, 22 Jan 2021 18:13:12 -0500 Received: from mail.kernel.org ([198.145.29.99]:60832 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729213AbhAVThs (ORCPT ); Fri, 22 Jan 2021 14:37:48 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id C94BF23B03; Fri, 22 Jan 2021 19:37:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344226; bh=eSYLmvCypJgMnlM449jFAOgQeum1+s5F1uGrryXaYDQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EIptk5vOiPsnz3JapYKP209SUxpNpb+FZ6QQsLTd96ZEZUPBqBZKKzC7B3Cj60Z0W a8T8t715zkRDnb45qjGfGuzLqG6D2LgxfbUtSPda/70CgEApPaDBIsl2XL81h0ASyw zYfNxHrPNKXiysmiAGW+C83vCYeWvr9CCIhZ62ciCpSBFDSk5fi5l8GDytiBzWMK5k YpXdo0b2mXvLGkBaKyfgbHy+5xm2XijR7fr1ZMKRA/VkKf+wuVvBh2jUFl3ws8qJmD NmNGXaBJh54IN4lHNwRj/bw3hsuasUICNO7JTBk6CYr7fvhiNZTOpAzSQkDG+Aojjt xB4XucnmoMjCA== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Parav Pandit , Jiri Pirko , Vu Pham , Saeed Mahameed Subject: [net-next V10 02/14] devlink: Introduce PCI SF port flavour and port attribute Date: Fri, 22 Jan 2021 11:36:46 -0800 Message-Id: <20210122193658.282884-3-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Parav Pandit A PCI sub-function (SF) represents a portion of the device similar to PCI VF. In an eswitch, PCI SF may have port which is normally represented using a representor netdevice. To have better visibility of eswitch port, its association with SF, and its representor netdevice, introduce a PCI SF port flavour. When devlink port flavour is PCI SF, fill up PCI SF attributes of the port. Extend port name creation using PCI PF and SF number scheme on best effort basis, so that vendor drivers can skip defining their own scheme. This is done as cApfNSfM, where A, N and M are controller, PCI PF and PCI SF number respectively. This is similar to existing naming for PCI PF and PCI VF ports. An example view of a PCI SF port: $ devlink port show pci/0000:06:00.0/32768 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:88:88 state active opstate attached $ devlink port show pci/0000:06:00.0/32768 -jp { "port": { "pci/0000:06:00.0/32768": { "type": "eth", "netdev": "ens2f0npf0sf88", "flavour": "pcisf", "controller": 0, "pfnum": 0, "sfnum": 88, "splittable": false, "function": { "hw_addr": "00:00:00:00:88:88", "state": "active", "opstate": "attached" } } } } Signed-off-by: Parav Pandit Reviewed-by: Jiri Pirko Reviewed-by: Vu Pham Signed-off-by: Saeed Mahameed --- include/net/devlink.h | 16 +++++++++++++++ include/uapi/linux/devlink.h | 5 +++++ net/core/devlink.c | 39 ++++++++++++++++++++++++++++++++++++ 3 files changed, 60 insertions(+) diff --git a/include/net/devlink.h b/include/net/devlink.h index f466819cc477..dc3bf8000082 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -93,6 +93,18 @@ struct devlink_port_pci_vf_attrs { u8 external:1; }; +/** + * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes + * @controller: Associated controller number + * @sf: Associated PCI SF for of the PCI PF for this port. + * @pf: Associated PCI PF number for this port. + */ +struct devlink_port_pci_sf_attrs { + u32 controller; + u32 sf; + u16 pf; +}; + /** * struct devlink_port_attrs - devlink port object * @flavour: flavour of the port @@ -103,6 +115,7 @@ struct devlink_port_pci_vf_attrs { * @phys: physical port attributes * @pci_pf: PCI PF port attributes * @pci_vf: PCI VF port attributes + * @pci_sf: PCI SF port attributes */ struct devlink_port_attrs { u8 split:1, @@ -114,6 +127,7 @@ struct devlink_port_attrs { struct devlink_port_phys_attrs phys; struct devlink_port_pci_pf_attrs pci_pf; struct devlink_port_pci_vf_attrs pci_vf; + struct devlink_port_pci_sf_attrs pci_sf; }; }; @@ -1404,6 +1418,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro u16 pf, bool external); void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller, u16 pf, u16 vf, bool external); +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, + u32 controller, u16 pf, u32 sf); int devlink_sb_register(struct devlink *devlink, unsigned int sb_index, u32 size, u16 ingress_pools_count, u16 egress_pools_count, u16 ingress_tc_count, diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index cf89c318f2ac..1a241b09a7f8 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -200,6 +200,10 @@ enum devlink_port_flavour { DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but * is not used in any way. */ + DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port + * for the PCI SF. It is an internal + * port that faces the PCI SF. + */ }; enum devlink_param_cmode { @@ -529,6 +533,7 @@ enum devlink_attr { DEVLINK_ATTR_RELOAD_ACTION_INFO, /* nested */ DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */ + DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */ /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/net/core/devlink.c b/net/core/devlink.c index c39496311b71..4cbc02fb602d 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -690,6 +690,15 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg, if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external)) return -EMSGSIZE; break; + case DEVLINK_PORT_FLAVOUR_PCI_SF: + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, + attrs->pci_sf.controller) || + nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, + attrs->pci_sf.pf) || + nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER, + attrs->pci_sf.sf)) + return -EMSGSIZE; + break; case DEVLINK_PORT_FLAVOUR_PHYSICAL: case DEVLINK_PORT_FLAVOUR_CPU: case DEVLINK_PORT_FLAVOUR_DSA: @@ -8374,6 +8383,32 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro } EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set); +/** + * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes + * + * @devlink_port: devlink port + * @controller: associated controller number for the devlink port instance + * @pf: associated PF for the devlink port instance + * @sf: associated SF of a PF for the devlink port instance + */ +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, + u16 pf, u32 sf) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + int ret; + + if (WARN_ON(devlink_port->registered)) + return; + ret = __devlink_port_attrs_set(devlink_port, + DEVLINK_PORT_FLAVOUR_PCI_SF); + if (ret) + return; + attrs->pci_sf.controller = controller; + attrs->pci_sf.pf = pf; + attrs->pci_sf.sf = sf; +} +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); + static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, char *name, size_t len) { @@ -8422,6 +8457,10 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, n = snprintf(name, len, "pf%uvf%u", attrs->pci_vf.pf, attrs->pci_vf.vf); break; + case DEVLINK_PORT_FLAVOUR_PCI_SF: + n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, + attrs->pci_sf.sf); + break; } if (n >= len) From patchwork Fri Jan 22 19:36:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369193 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9876EC433DB for ; Fri, 22 Jan 2021 23:13:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6A16B23B17 for ; Fri, 22 Jan 2021 23:13:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728526AbhAVXNA (ORCPT ); Fri, 22 Jan 2021 18:13:00 -0500 Received: from mail.kernel.org ([198.145.29.99]:60850 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729281AbhAVThs (ORCPT ); Fri, 22 Jan 2021 14:37:48 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id A94CA23B05; Fri, 22 Jan 2021 19:37:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344227; bh=ePJK/CbZQ05Hho4hxro8iInBID8HaCaVfWRaAtsamrA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PvUMmk2R0+z3KhonmJ2cZ6dNzr4QCP1ZwqBkXNwN3fqHnHQ1HC37KRr2HomTqAxEm XG0Dgd6pukFcHUE9wZEOvp+UVi9ymKq3rkGlGRx5sH45HM0niFWkfVkPVlbRTxE0nv swDpWCUH+iyd1i0I65MgaFgUSIXhJWdyBLFiJsZhxPFzA71ylKDg7ZF4ZzwueZjpzB aKwEQdWGSzJ9KMfiJ81D+OzghBPtR1vMbo18O1t/nnw/08kgJg3ci3x+fOzM5xslNh t4v4+MPBWDMLJSkNMbQC2dzvep8Q+Q/8S9UzIA3QaSIKyfmD2Il8dg61tt+ZDSG1pI Xi9+4/LoY8Kmg== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Parav Pandit , Vu Pham , Saeed Mahameed Subject: [net-next V10 03/14] devlink: Support add and delete devlink port Date: Fri, 22 Jan 2021 11:36:47 -0800 Message-Id: <20210122193658.282884-4-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Parav Pandit Extended devlink interface for the user to add and delete a port. Extend devlink to connect user requests to driver to add/delete a port in the device. Driver routines are invoked without holding devlink instance lock. This enables driver to perform several devlink objects registration, unregistration such as (port, health reporter, resource etc) by using existing devlink APIs. This also helps to uniformly use the code for port unregistration during driver unload and during port deletion initiated by user. Examples of add, show and delete commands: $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show pci/0000:06:00.0/32768 pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ udevadm test-builtin net_id /sys/class/net/eth6 Load module index Parsed configuration file /usr/lib/systemd/network/99-default.link Created link configuration context. Using default interface naming scheme 'v245'. ID_NET_NAMING_SCHEME=v245 ID_NET_NAME_PATH=enp6s0f0npf0sf88 ID_NET_NAME_SLOT=ens2f0npf0sf88 Unload module index Unloaded link configuration context. Signed-off-by: Parav Pandit Reviewed-by: Vu Pham Signed-off-by: Saeed Mahameed --- include/net/devlink.h | 52 ++++++++++++++++++ net/core/devlink.c | 121 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 173 insertions(+) diff --git a/include/net/devlink.h b/include/net/devlink.h index dc3bf8000082..d8edd9a10907 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -152,6 +152,17 @@ struct devlink_port { struct mutex reporters_lock; /* Protects reporter_list */ }; +struct devlink_port_new_attrs { + enum devlink_port_flavour flavour; + unsigned int port_index; + u32 controller; + u32 sfnum; + u16 pfnum; + u8 port_index_valid:1, + controller_valid:1, + sfnum_valid:1; +}; + struct devlink_sb_pool_info { enum devlink_sb_pool_type pool_type; u32 size; @@ -1362,6 +1373,47 @@ struct devlink_ops { int (*port_function_hw_addr_set)(struct devlink *devlink, struct devlink_port *port, const u8 *hw_addr, int hw_addr_len, struct netlink_ext_ack *extack); + /** + * port_new() - Add a new port function of a specified flavor + * @devlink: Devlink instance + * @attrs: attributes of the new port + * @extack: extack for reporting error messages + * @new_port_index: index of the new port + * + * Devlink core will call this device driver function upon user request + * to create a new port function of a specified flavor and optional + * attributes + * + * Notes: + * - Called without devlink instance lock being held. Drivers must + * implement own means of synchronization + * - On success, drivers must register a port with devlink core + * + * Return: 0 on success, negative value otherwise. + */ + int (*port_new)(struct devlink *devlink, + const struct devlink_port_new_attrs *attrs, + struct netlink_ext_ack *extack, + unsigned int *new_port_index); + /** + * port_del() - Delete a port function + * @devlink: Devlink instance + * @port_index: port function index to delete + * @extack: extack for reporting error messages + * + * Devlink core will call this device driver function upon user request + * to delete a previously created port function + * + * Notes: + * - Called without devlink instance lock being held. Drivers must + * implement own means of synchronization + * - On success, drivers must unregister the corresponding devlink + * port + * + * Return: 0 on success, negative value otherwise. + */ + int (*port_del)(struct devlink *devlink, unsigned int port_index, + struct netlink_ext_ack *extack); }; static inline void *devlink_priv(struct devlink *devlink) diff --git a/net/core/devlink.c b/net/core/devlink.c index 4cbc02fb602d..541b5f549274 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -1147,6 +1147,111 @@ static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb, return devlink_port_unsplit(devlink, port_index, info->extack); } +static int devlink_port_new_notifiy(struct devlink *devlink, + unsigned int port_index, + struct genl_info *info) +{ + struct devlink_port *devlink_port; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + mutex_lock(&devlink->lock); + devlink_port = devlink_port_get_by_index(devlink, port_index); + if (!devlink_port) { + err = -ENODEV; + goto out; + } + + err = devlink_nl_port_fill(msg, devlink, devlink_port, + DEVLINK_CMD_NEW, info->snd_portid, + info->snd_seq, 0, NULL); + if (err) + goto out; + + err = genlmsg_reply(msg, info); + mutex_unlock(&devlink->lock); + return err; + +out: + mutex_unlock(&devlink->lock); + nlmsg_free(msg); + return err; +} + +static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink_port_new_attrs new_attrs = {}; + struct devlink *devlink = info->user_ptr[0]; + unsigned int new_port_index; + int err; + + if (!devlink->ops->port_new || !devlink->ops->port_del) + return -EOPNOTSUPP; + + if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] || + !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) { + NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified"); + return -EINVAL; + } + new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]); + new_attrs.pfnum = + nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]); + + if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + /* Port index of the new port being created by driver. */ + new_attrs.port_index = + nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + new_attrs.port_index_valid = true; + } + if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) { + new_attrs.controller = + nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]); + new_attrs.controller_valid = true; + } + if (new_attrs.flavour == DEVLINK_PORT_FLAVOUR_PCI_SF && + info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) { + new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]); + new_attrs.sfnum_valid = true; + } + + err = devlink->ops->port_new(devlink, &new_attrs, extack, + &new_port_index); + if (err) + return err; + + err = devlink_port_new_notifiy(devlink, new_port_index, info); + if (err && err != -ENODEV) { + /* Fail to send the response; destroy newly created port. */ + devlink->ops->port_del(devlink, new_port_index, extack); + } + return err; +} + +static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + unsigned int port_index; + + if (!devlink->ops->port_del) + return -EOPNOTSUPP; + + if (!info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + NL_SET_ERR_MSG_MOD(extack, "Port index is not specified"); + return -EINVAL; + } + port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + + return devlink->ops->port_del(devlink, port_index, extack); +} + static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink, struct devlink_sb *devlink_sb, enum devlink_command cmd, u32 portid, @@ -7605,6 +7710,10 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { [DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, DEVLINK_RELOAD_ACTION_MAX), [DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK), + [DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 }, + [DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 }, + [DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 }, + [DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 }, }; static const struct genl_small_ops devlink_nl_ops[] = { @@ -7644,6 +7753,18 @@ static const struct genl_small_ops devlink_nl_ops[] = { .flags = GENL_ADMIN_PERM, .internal_flags = DEVLINK_NL_FLAG_NO_LOCK, }, + { + .cmd = DEVLINK_CMD_PORT_NEW, + .doit = devlink_nl_cmd_port_new_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NO_LOCK, + }, + { + .cmd = DEVLINK_CMD_PORT_DEL, + .doit = devlink_nl_cmd_port_del_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NO_LOCK, + }, { .cmd = DEVLINK_CMD_SB_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, From patchwork Fri Jan 22 19:36:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369196 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1F19C433E0 for ; Fri, 22 Jan 2021 23:05:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C693223B17 for ; Fri, 22 Jan 2021 23:05:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728403AbhAVXFH (ORCPT ); Fri, 22 Jan 2021 18:05:07 -0500 Received: from mail.kernel.org ([198.145.29.99]:34168 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730061AbhAVTk2 (ORCPT ); Fri, 22 Jan 2021 14:40:28 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id DEBB823B19; Fri, 22 Jan 2021 19:37:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344231; bh=1/OkBEeVRnJGXA+NaqwKTgjw4sa5OO6eacs1KTvvueY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=k+GJUk27LYFq8blQQht8t/mwuMYKuu8M2PhVL4ec7IXhRQu3quUQiekSnuBEg+bbp Cb30xpjzl2yThdWCmefJOY+ArUuTsI9uK6/2i7haqGtpv7PW8VE9eoGB2Iu9KJE2wr 7MzKCMYa0N1h1fjaffmVD5me4Z+9rTvA8AM6FeU5G+ZW6l6OXEvqCs0JbiSz3mdjs+ fi7o3Wk+Y6bJdnblpiYbeyuSeXAIZiF90KBTnnoEfJahlBXMKTpkUNoX8y64AdyP0Q mvQdutBOgTkC+km4EacgBk9m74O/BbE9sI1//3KiTNtBDXzc+z1+NpS/nv5T58zz75 XQmHo2zfdevTQ== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Vu Pham , Parav Pandit , Roi Dayan , Saeed Mahameed Subject: [net-next V10 08/14] net/mlx5: E-switch, Prepare eswitch to handle SF vport Date: Fri, 22 Jan 2021 11:36:52 -0800 Message-Id: <20210122193658.282884-9-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Vu Pham Prepare eswitch to handle SF vport during (a) querying eswitch functions (b) egress ACL creation (c) account for SF vports in total vports calculation Assign a dedicated placeholder for SFs vports and their representors. They are placed after VFs vports and before ECPF vports as below: [PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK]. Change functions to map SF's vport numbers to indices when accessing the vports or representors arrays, and vice versa. Signed-off-by: Vu Pham Signed-off-by: Parav Pandit Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- .../mellanox/mlx5/core/esw/acl/egress_ofld.c | 2 +- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 11 +++- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 50 +++++++++++++++++++ .../mellanox/mlx5/core/eswitch_offloads.c | 11 ++++ .../net/ethernet/mellanox/mlx5/core/vport.c | 3 +- 5 files changed, 73 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c index 4c74e2690d57..26b37a0f8762 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c @@ -150,7 +150,7 @@ static void esw_acl_egress_ofld_groups_destroy(struct mlx5_vport *vport) static bool esw_acl_egress_needed(const struct mlx5_eswitch *esw, u16 vport_num) { - return mlx5_eswitch_is_vf_vport(esw, vport_num); + return mlx5_eswitch_is_vf_vport(esw, vport_num) || mlx5_esw_is_sf_vport(esw, vport_num); } int esw_acl_egress_ofld_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 876e6449edb3..55ebd9474d97 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1365,9 +1365,15 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev) { int outlen = MLX5_ST_SZ_BYTES(query_esw_functions_out); u32 in[MLX5_ST_SZ_DW(query_esw_functions_in)] = {}; + u16 max_sf_vports; u32 *out; int err; + max_sf_vports = mlx5_sf_max_functions(dev); + /* Device interface is array of 64-bits */ + if (max_sf_vports) + outlen += DIV_ROUND_UP(max_sf_vports, BITS_PER_TYPE(__be64)) * sizeof(__be64); + out = kvzalloc(outlen, GFP_KERNEL); if (!out) return ERR_PTR(-ENOMEM); @@ -1375,7 +1381,7 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev) MLX5_SET(query_esw_functions_in, in, opcode, MLX5_CMD_OP_QUERY_ESW_FUNCTIONS); - err = mlx5_cmd_exec_inout(dev, query_esw_functions, in, out); + err = mlx5_cmd_exec(dev, in, sizeof(in), out, outlen); if (!err) return out; @@ -1898,7 +1904,8 @@ static bool is_port_function_supported(const struct mlx5_eswitch *esw, u16 vport_num) { return vport_num == MLX5_VPORT_PF || - mlx5_eswitch_is_vf_vport(esw, vport_num); + mlx5_eswitch_is_vf_vport(esw, vport_num) || + mlx5_esw_is_sf_vport(esw, vport_num); } int mlx5_devlink_port_function_hw_addr_get(struct devlink *devlink, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index cf87de94418f..4e3ed878ff03 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -43,6 +43,7 @@ #include #include "lib/mpfs.h" #include "lib/fs_chains.h" +#include "sf/sf.h" #include "en/tc_ct.h" #ifdef CONFIG_MLX5_ESWITCH @@ -499,6 +500,40 @@ static inline u16 mlx5_eswitch_first_host_vport_num(struct mlx5_core_dev *dev) MLX5_VPORT_PF : MLX5_VPORT_FIRST_VF; } +static inline int mlx5_esw_sf_start_idx(const struct mlx5_eswitch *esw) +{ + /* PF and VF vports indices start from 0 to max_vfs */ + return MLX5_VPORT_PF_PLACEHOLDER + mlx5_core_max_vfs(esw->dev); +} + +static inline int mlx5_esw_sf_end_idx(const struct mlx5_eswitch *esw) +{ + return mlx5_esw_sf_start_idx(esw) + mlx5_sf_max_functions(esw->dev); +} + +static inline int +mlx5_esw_sf_vport_num_to_index(const struct mlx5_eswitch *esw, u16 vport_num) +{ + return vport_num - mlx5_sf_start_function_id(esw->dev) + + MLX5_VPORT_PF_PLACEHOLDER + mlx5_core_max_vfs(esw->dev); +} + +static inline u16 +mlx5_esw_sf_vport_index_to_num(const struct mlx5_eswitch *esw, int idx) +{ + return mlx5_sf_start_function_id(esw->dev) + idx - + (MLX5_VPORT_PF_PLACEHOLDER + mlx5_core_max_vfs(esw->dev)); +} + +static inline bool +mlx5_esw_is_sf_vport(const struct mlx5_eswitch *esw, u16 vport_num) +{ + return mlx5_sf_supported(esw->dev) && + vport_num >= mlx5_sf_start_function_id(esw->dev) && + (vport_num < (mlx5_sf_start_function_id(esw->dev) + + mlx5_sf_max_functions(esw->dev))); +} + static inline bool mlx5_eswitch_is_funcs_handler(const struct mlx5_core_dev *dev) { return mlx5_core_is_ecpf_esw_manager(dev); @@ -527,6 +562,10 @@ static inline int mlx5_eswitch_vport_num_to_index(struct mlx5_eswitch *esw, if (vport_num == MLX5_VPORT_UPLINK) return mlx5_eswitch_uplink_idx(esw); + if (mlx5_esw_is_sf_vport(esw, vport_num)) + return mlx5_esw_sf_vport_num_to_index(esw, vport_num); + + /* PF and VF vports start from 0 to max_vfs */ return vport_num; } @@ -540,6 +579,12 @@ static inline u16 mlx5_eswitch_index_to_vport_num(struct mlx5_eswitch *esw, if (index == mlx5_eswitch_uplink_idx(esw)) return MLX5_VPORT_UPLINK; + /* SF vports indices are after VFs and before ECPF */ + if (mlx5_sf_supported(esw->dev) && + index > mlx5_core_max_vfs(esw->dev)) + return mlx5_esw_sf_vport_index_to_num(esw, index); + + /* PF and VF vports start from 0 to max_vfs */ return index; } @@ -625,6 +670,11 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw); for ((vport) = (nvfs); \ (vport) >= (esw)->first_host_vport; (vport)--) +#define mlx5_esw_for_each_sf_rep(esw, i, rep) \ + for ((i) = mlx5_esw_sf_start_idx(esw); \ + (rep) = &(esw)->offloads.vport_reps[(i)], \ + (i) < mlx5_esw_sf_end_idx(esw); (i++)) + struct mlx5_eswitch *mlx5_devlink_eswitch_get(struct devlink *devlink); struct mlx5_vport *__must_check mlx5_eswitch_get_vport(struct mlx5_eswitch *esw, u16 vport_num); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 2f6a0ae20650..2d241f7351b5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -1800,11 +1800,22 @@ static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw, esw->offloads.rep_ops[rep_type]->unload(rep); } +static void __unload_reps_sf_vport(struct mlx5_eswitch *esw, u8 rep_type) +{ + struct mlx5_eswitch_rep *rep; + int i; + + mlx5_esw_for_each_sf_rep(esw, i, rep) + __esw_offloads_unload_rep(esw, rep, rep_type); +} + static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type) { struct mlx5_eswitch_rep *rep; int i; + __unload_reps_sf_vport(esw, rep_type); + mlx5_esw_for_each_vf_rep_reverse(esw, i, rep, esw->esw_funcs.num_vfs) __esw_offloads_unload_rep(esw, rep, rep_type); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c index bdafc85fd874..ba78e0660523 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c @@ -36,6 +36,7 @@ #include #include #include "mlx5_core.h" +#include "sf/sf.h" /* Mutex to hold while enabling or disabling RoCE */ static DEFINE_MUTEX(mlx5_roce_en_lock); @@ -1160,6 +1161,6 @@ EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid); */ u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev) { - return MLX5_SPECIAL_VPORTS(dev) + mlx5_core_max_vfs(dev); + return MLX5_SPECIAL_VPORTS(dev) + mlx5_core_max_vfs(dev) + mlx5_sf_max_functions(dev); } EXPORT_SYMBOL_GPL(mlx5_eswitch_get_total_vports); From patchwork Fri Jan 22 19:36:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369198 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03578C433E6 for ; Fri, 22 Jan 2021 23:01:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AAF0B23B17 for ; Fri, 22 Jan 2021 23:00:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728163AbhAVXAr (ORCPT ); Fri, 22 Jan 2021 18:00:47 -0500 Received: from mail.kernel.org ([198.145.29.99]:34182 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730850AbhAVTk3 (ORCPT ); Fri, 22 Jan 2021 14:40:29 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 7808823B1C; Fri, 22 Jan 2021 19:37:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344234; bh=A7KEFmxN3DVmc8kkZcblDlmaJu5jjoRnmTnpEgGbV8s=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=neKHU+ETHevgBgzCqwfbtPaIZuRKJO5ybXfvd4xoqdPjMpHktUUMCuYMw4sstg7m+ y/cELY157ucLayvHnZUA3K4X1TqA0wWBxh7klcJ1rXSkM7ZRQerivsVwnLKzV8L+UF roQdUPgmfUK7i5umZeo5NE2TpDLC6W+Kh/vZivyvgqOk2Wjg74nEwbRhdMeLKxYno7 hlengNG5YZrbppxQyIcZ7L27KA6IxzS25jT1wdTdYh/0fjzNZ16ulDG+xtRcKp+9gN kyZY5BZ/D464tqGfv/o92AIqQvb2YuOKNu3He0YdquqxSTFQ1oUwr6KQQAz60em0/z tpnYj+7sl0Esw== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Parav Pandit , Vu Pham , Saeed Mahameed Subject: [net-next V10 11/14] net/mlx5: SF, Port function state change support Date: Fri, 22 Jan 2021 11:36:55 -0800 Message-Id: <20210122193658.282884-12-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Parav Pandit Support changing the state of the SF port's function through devlink. When activating the SF port's function, enable the hca in the device followed by adding its auxiliary device. When deactivating the SF port's function, delete its auxiliary device followed by disabling the vHCA. Port function attributes get/set callbacks are invoked with devlink instance lock held. Such callbacks need to synchronize with sf port table getting disabled either via sriov sysfs callback. Such callbacks synchronize with table disable context holding table refcount. $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:88:88 state inactive opstate detached $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active $ devlink port show ens2f0npf0sf88 -jp { "port": { "pci/0000:06:00.0/32768": { "type": "eth", "netdev": "ens2f0npf0sf88", "flavour": "pcisf", "controller": 0, "pfnum": 0, "sfnum": 88, "external": false, "splittable": false, "function": { "hw_addr": "00:00:00:00:88:88", "state": "active", "opstate": "attached" } } } } On port function activation, an auxiliary device is created in below example. $ devlink dev show devlink dev show auxiliary/mlx5_core.sf.4 $ devlink port show auxiliary/mlx5_core.sf.4/1 auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false Signed-off-by: Parav Pandit Reviewed-by: Vu Pham Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 2 + .../net/ethernet/mellanox/mlx5/core/main.c | 10 + .../net/ethernet/mellanox/mlx5/core/sf/cmd.c | 22 ++ .../ethernet/mellanox/mlx5/core/sf/devlink.c | 284 ++++++++++++++++-- .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 116 ++++++- .../net/ethernet/mellanox/mlx5/core/sf/priv.h | 4 + .../net/ethernet/mellanox/mlx5/core/sf/sf.h | 20 +- 7 files changed, 431 insertions(+), 27 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index d4c0cdf5edd9..7712311e8684 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -195,6 +195,8 @@ static const struct devlink_ops mlx5_devlink_ops = { #ifdef CONFIG_MLX5_SF_MANAGER .port_new = mlx5_devlink_sf_port_new, .port_del = mlx5_devlink_sf_port_del, + .port_fn_state_get = mlx5_devlink_sf_port_fn_state_get, + .port_fn_state_set = mlx5_devlink_sf_port_fn_state_set, #endif .flash_update = mlx5_devlink_flash_update, .info_get = mlx5_devlink_info_get, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index d6bd09dd7490..604ac8bdebe0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -75,6 +75,7 @@ #include "diag/rsc_dump.h" #include "sf/vhca_event.h" #include "sf/dev/dev.h" +#include "sf/sf.h" MODULE_AUTHOR("Eli Cohen "); MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver"); @@ -1161,6 +1162,12 @@ static int mlx5_load(struct mlx5_core_dev *dev) mlx5_vhca_event_start(dev); + err = mlx5_sf_hw_table_create(dev); + if (err) { + mlx5_core_err(dev, "sf table create failed %d\n", err); + goto err_vhca; + } + err = mlx5_ec_init(dev); if (err) { mlx5_core_err(dev, "Failed to init embedded CPU\n"); @@ -1180,6 +1187,8 @@ static int mlx5_load(struct mlx5_core_dev *dev) err_sriov: mlx5_ec_cleanup(dev); err_ec: + mlx5_sf_hw_table_destroy(dev); +err_vhca: mlx5_vhca_event_stop(dev); mlx5_cleanup_fs(dev); err_fs: @@ -1209,6 +1218,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev) mlx5_sf_dev_table_destroy(dev); mlx5_sriov_detach(dev); mlx5_ec_cleanup(dev); + mlx5_sf_hw_table_destroy(dev); mlx5_vhca_event_stop(dev); mlx5_cleanup_fs(dev); mlx5_accel_ipsec_cleanup(dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c index 0bc3075f34fa..a8d75c2f0275 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c @@ -25,3 +25,25 @@ int mlx5_cmd_dealloc_sf(struct mlx5_core_dev *dev, u16 function_id) return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } + +int mlx5_cmd_sf_enable_hca(struct mlx5_core_dev *dev, u16 func_id) +{ + u32 out[MLX5_ST_SZ_DW(enable_hca_out)] = {}; + u32 in[MLX5_ST_SZ_DW(enable_hca_in)] = {}; + + MLX5_SET(enable_hca_in, in, opcode, MLX5_CMD_OP_ENABLE_HCA); + MLX5_SET(enable_hca_in, in, function_id, func_id); + MLX5_SET(enable_hca_in, in, embedded_cpu_function, 0); + return mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out)); +} + +int mlx5_cmd_sf_disable_hca(struct mlx5_core_dev *dev, u16 func_id) +{ + u32 out[MLX5_ST_SZ_DW(disable_hca_out)] = {}; + u32 in[MLX5_ST_SZ_DW(disable_hca_in)] = {}; + + MLX5_SET(disable_hca_in, in, opcode, MLX5_CMD_OP_DISABLE_HCA); + MLX5_SET(disable_hca_in, in, function_id, func_id); + MLX5_SET(enable_hca_in, in, embedded_cpu_function, 0); + return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c index 7ad0d210ec30..c2ba41bb7a70 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c @@ -4,11 +4,17 @@ #include #include "eswitch.h" #include "priv.h" +#include "sf/dev/dev.h" +#include "mlx5_ifc_vhca_event.h" +#include "vhca_event.h" +#include "ecpf.h" struct mlx5_sf { struct devlink_port dl_port; unsigned int port_index; u16 id; + u16 hw_fn_id; + u16 hw_state; }; struct mlx5_sf_table { @@ -16,7 +22,10 @@ struct mlx5_sf_table { struct xarray port_indices; /* port index based lookup. */ refcount_t refcount; struct completion disable_complete; + struct mutex sf_state_lock; /* Serializes sf state among user cmds & vhca event handler. */ struct notifier_block esw_nb; + struct notifier_block vhca_nb; + u8 ecpu: 1; }; static struct mlx5_sf * @@ -25,6 +34,19 @@ mlx5_sf_lookup_by_index(struct mlx5_sf_table *table, unsigned int port_index) return xa_load(&table->port_indices, port_index); } +static struct mlx5_sf * +mlx5_sf_lookup_by_function_id(struct mlx5_sf_table *table, unsigned int fn_id) +{ + unsigned long index; + struct mlx5_sf *sf; + + xa_for_each(&table->port_indices, index, sf) { + if (sf->hw_fn_id == fn_id) + return sf; + } + return NULL; +} + static int mlx5_sf_id_insert(struct mlx5_sf_table *table, struct mlx5_sf *sf) { return xa_insert(&table->port_indices, sf->port_index, sf, GFP_KERNEL); @@ -59,6 +81,8 @@ mlx5_sf_alloc(struct mlx5_sf_table *table, u32 sfnum, struct netlink_ext_ack *ex hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, sf->id); dl_port_index = mlx5_esw_vport_to_devlink_port_index(table->dev, hw_fn_id); sf->port_index = dl_port_index; + sf->hw_fn_id = hw_fn_id; + sf->hw_state = MLX5_VHCA_STATE_ALLOCATED; err = mlx5_sf_id_insert(table, sf); if (err) @@ -99,6 +123,146 @@ static void mlx5_sf_table_put(struct mlx5_sf_table *table) complete(&table->disable_complete); } +static enum devlink_port_fn_state mlx5_sf_to_devlink_state(u8 hw_state) +{ + switch (hw_state) { + case MLX5_VHCA_STATE_ACTIVE: + case MLX5_VHCA_STATE_IN_USE: + case MLX5_VHCA_STATE_TEARDOWN_REQUEST: + return DEVLINK_PORT_FN_STATE_ACTIVE; + case MLX5_VHCA_STATE_INVALID: + case MLX5_VHCA_STATE_ALLOCATED: + default: + return DEVLINK_PORT_FN_STATE_INACTIVE; + } +} + +static enum devlink_port_fn_opstate mlx5_sf_to_devlink_opstate(u8 hw_state) +{ + switch (hw_state) { + case MLX5_VHCA_STATE_IN_USE: + case MLX5_VHCA_STATE_TEARDOWN_REQUEST: + return DEVLINK_PORT_FN_OPSTATE_ATTACHED; + case MLX5_VHCA_STATE_INVALID: + case MLX5_VHCA_STATE_ALLOCATED: + case MLX5_VHCA_STATE_ACTIVE: + default: + return DEVLINK_PORT_FN_OPSTATE_DETACHED; + } +} + +static bool mlx5_sf_is_active(const struct mlx5_sf *sf) +{ + return sf->hw_state == MLX5_VHCA_STATE_ACTIVE || sf->hw_state == MLX5_VHCA_STATE_IN_USE; +} + +int mlx5_devlink_sf_port_fn_state_get(struct devlink *devlink, struct devlink_port *dl_port, + enum devlink_port_fn_state *state, + enum devlink_port_fn_opstate *opstate, + struct netlink_ext_ack *extack) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + struct mlx5_sf_table *table; + struct mlx5_sf *sf; + int err = 0; + + table = mlx5_sf_table_try_get(dev); + if (!table) + return -EOPNOTSUPP; + + sf = mlx5_sf_lookup_by_index(table, dl_port->index); + if (!sf) { + err = -EOPNOTSUPP; + goto sf_err; + } + mutex_lock(&table->sf_state_lock); + *state = mlx5_sf_to_devlink_state(sf->hw_state); + *opstate = mlx5_sf_to_devlink_opstate(sf->hw_state); + mutex_unlock(&table->sf_state_lock); +sf_err: + mlx5_sf_table_put(table); + return err; +} + +static int mlx5_sf_activate(struct mlx5_core_dev *dev, struct mlx5_sf *sf) +{ + int err; + + if (mlx5_sf_is_active(sf)) + return 0; + if (sf->hw_state != MLX5_VHCA_STATE_ALLOCATED) + return -EINVAL; + + err = mlx5_cmd_sf_enable_hca(dev, sf->hw_fn_id); + if (err) + return err; + + sf->hw_state = MLX5_VHCA_STATE_ACTIVE; + return 0; +} + +static int mlx5_sf_deactivate(struct mlx5_core_dev *dev, struct mlx5_sf *sf) +{ + int err; + + if (!mlx5_sf_is_active(sf)) + return 0; + + err = mlx5_cmd_sf_disable_hca(dev, sf->hw_fn_id); + if (err) + return err; + + sf->hw_state = MLX5_VHCA_STATE_TEARDOWN_REQUEST; + return 0; +} + +static int mlx5_sf_state_set(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, + struct mlx5_sf *sf, + enum devlink_port_fn_state state) +{ + int err = 0; + + mutex_lock(&table->sf_state_lock); + if (state == mlx5_sf_to_devlink_state(sf->hw_state)) + goto out; + if (state == DEVLINK_PORT_FN_STATE_ACTIVE) + err = mlx5_sf_activate(dev, sf); + else if (state == DEVLINK_PORT_FN_STATE_INACTIVE) + err = mlx5_sf_deactivate(dev, sf); + else + err = -EINVAL; +out: + mutex_unlock(&table->sf_state_lock); + return err; +} + +int mlx5_devlink_sf_port_fn_state_set(struct devlink *devlink, struct devlink_port *dl_port, + enum devlink_port_fn_state state, + struct netlink_ext_ack *extack) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + struct mlx5_sf_table *table; + struct mlx5_sf *sf; + int err; + + table = mlx5_sf_table_try_get(dev); + if (!table) { + NL_SET_ERR_MSG_MOD(extack, + "Port state set is only supported in eswitch switchdev mode or SF ports are disabled."); + return -EOPNOTSUPP; + } + sf = mlx5_sf_lookup_by_index(table, dl_port->index); + if (!sf) { + err = -ENODEV; + goto out; + } + + err = mlx5_sf_state_set(dev, table, sf, state); +out: + mlx5_sf_table_put(table); + return err; +} + static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, const struct devlink_port_new_attrs *new_attr, struct netlink_ext_ack *extack, @@ -125,16 +289,6 @@ static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, return err; } -static void mlx5_sf_del(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, struct mlx5_sf *sf) -{ - struct mlx5_eswitch *esw = dev->priv.eswitch; - u16 hw_fn_id; - - hw_fn_id = mlx5_sf_sw_to_hw_id(dev, sf->id); - mlx5_esw_offloads_sf_vport_disable(esw, hw_fn_id); - mlx5_sf_free(table, sf); -} - static int mlx5_sf_new_check_attr(struct mlx5_core_dev *dev, const struct devlink_port_new_attrs *new_attr, struct netlink_ext_ack *extack) @@ -188,10 +342,30 @@ int mlx5_devlink_sf_port_new(struct devlink *devlink, return err; } +static void mlx5_sf_dealloc(struct mlx5_sf_table *table, struct mlx5_sf *sf) +{ + if (sf->hw_state == MLX5_VHCA_STATE_ALLOCATED) { + mlx5_sf_free(table, sf); + } else if (mlx5_sf_is_active(sf)) { + /* Even if its active, it is treated as in_use because by the time, + * it is disabled here, it may getting used. So it is safe to + * always look for the event to ensure that it is recycled only after + * firmware gives confirmation that it is detached by the driver. + */ + mlx5_cmd_sf_disable_hca(table->dev, sf->hw_fn_id); + mlx5_sf_hw_table_sf_deferred_free(table->dev, sf->id); + kfree(sf); + } else { + mlx5_sf_hw_table_sf_deferred_free(table->dev, sf->id); + kfree(sf); + } +} + int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index, struct netlink_ext_ack *extack) { struct mlx5_core_dev *dev = devlink_priv(devlink); + struct mlx5_eswitch *esw = dev->priv.eswitch; struct mlx5_sf_table *table; struct mlx5_sf *sf; int err = 0; @@ -208,20 +382,58 @@ int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index, goto sf_err; } - mlx5_sf_del(dev, table, sf); + mlx5_esw_offloads_sf_vport_disable(esw, sf->hw_fn_id); + mlx5_sf_id_erase(table, sf); + + mutex_lock(&table->sf_state_lock); + mlx5_sf_dealloc(table, sf); + mutex_unlock(&table->sf_state_lock); sf_err: mlx5_sf_table_put(table); return err; } -static void mlx5_sf_destroy_all(struct mlx5_sf_table *table) +static bool mlx5_sf_state_update_check(const struct mlx5_sf *sf, u8 new_state) { - struct mlx5_core_dev *dev = table->dev; - unsigned long index; + if (sf->hw_state == MLX5_VHCA_STATE_ACTIVE && new_state == MLX5_VHCA_STATE_IN_USE) + return true; + + if (sf->hw_state == MLX5_VHCA_STATE_IN_USE && new_state == MLX5_VHCA_STATE_ACTIVE) + return true; + + if (sf->hw_state == MLX5_VHCA_STATE_TEARDOWN_REQUEST && + new_state == MLX5_VHCA_STATE_ALLOCATED) + return true; + + return false; +} + +static int mlx5_sf_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data) +{ + struct mlx5_sf_table *table = container_of(nb, struct mlx5_sf_table, vhca_nb); + const struct mlx5_vhca_state_event *event = data; + bool update = false; struct mlx5_sf *sf; - xa_for_each(&table->port_indices, index, sf) - mlx5_sf_del(dev, table, sf); + table = mlx5_sf_table_try_get(table->dev); + if (!table) + return 0; + + mutex_lock(&table->sf_state_lock); + sf = mlx5_sf_lookup_by_function_id(table, event->function_id); + if (!sf) + goto sf_err; + + /* When driver is attached or detached to a function, an event + * notifies such state change. + */ + update = mlx5_sf_state_update_check(sf, event->new_vhca_state); + if (update) + sf->hw_state = event->new_vhca_state; +sf_err: + mutex_unlock(&table->sf_state_lock); + mlx5_sf_table_put(table); + return 0; } static void mlx5_sf_table_enable(struct mlx5_sf_table *table) @@ -233,6 +445,22 @@ static void mlx5_sf_table_enable(struct mlx5_sf_table *table) refcount_set(&table->refcount, 1); } +static void mlx5_sf_deactivate_all(struct mlx5_sf_table *table) +{ + struct mlx5_eswitch *esw = table->dev->priv.eswitch; + unsigned long index; + struct mlx5_sf *sf; + + /* At this point, no new user commands can start and no vhca event can + * arrive. It is safe to destroy all user created SFs. + */ + xa_for_each(&table->port_indices, index, sf) { + mlx5_esw_offloads_sf_vport_disable(esw, sf->hw_fn_id); + mlx5_sf_id_erase(table, sf); + mlx5_sf_dealloc(table, sf); + } +} + static void mlx5_sf_table_disable(struct mlx5_sf_table *table) { if (!mlx5_sf_max_functions(table->dev)) @@ -241,14 +469,13 @@ static void mlx5_sf_table_disable(struct mlx5_sf_table *table) if (!refcount_read(&table->refcount)) return; - /* Balances with refcount_set; drop the reference so that new user cmd cannot start. */ + /* Balances with refcount_set; drop the reference so that new user cmd cannot start + * and new vhca event handler cannnot run. + */ mlx5_sf_table_put(table); wait_for_completion(&table->disable_complete); - /* At this point, no new user commands can start. - * It is safe to destroy all user created SFs. - */ - mlx5_sf_destroy_all(table); + mlx5_sf_deactivate_all(table); } static int mlx5_sf_esw_event(struct notifier_block *nb, unsigned long event, void *data) @@ -280,23 +507,34 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev) struct mlx5_sf_table *table; int err; - if (!mlx5_sf_table_supported(dev)) + if (!mlx5_sf_table_supported(dev) || !mlx5_vhca_event_supported(dev)) return 0; table = kzalloc(sizeof(*table), GFP_KERNEL); if (!table) return -ENOMEM; + mutex_init(&table->sf_state_lock); table->dev = dev; xa_init(&table->port_indices); dev->priv.sf_table = table; + refcount_set(&table->refcount, 0); table->esw_nb.notifier_call = mlx5_sf_esw_event; err = mlx5_esw_event_notifier_register(dev->priv.eswitch, &table->esw_nb); if (err) goto reg_err; + + table->vhca_nb.notifier_call = mlx5_sf_vhca_event; + err = mlx5_vhca_event_notifier_register(table->dev, &table->vhca_nb); + if (err) + goto vhca_err; + return 0; +vhca_err: + mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb); reg_err: + mutex_destroy(&table->sf_state_lock); kfree(table); dev->priv.sf_table = NULL; return err; @@ -309,8 +547,10 @@ void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev) if (!table) return; + mlx5_vhca_event_notifier_unregister(table->dev, &table->vhca_nb); mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb); WARN_ON(refcount_read(&table->refcount)); + mutex_destroy(&table->sf_state_lock); WARN_ON(!xa_empty(&table->port_indices)); kfree(table); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c index c7757f399e8a..58b6be0b03d7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c @@ -4,11 +4,14 @@ #include "vhca_event.h" #include "priv.h" #include "sf.h" +#include "mlx5_ifc_vhca_event.h" +#include "vhca_event.h" #include "ecpf.h" struct mlx5_sf_hw { u32 usr_sfnum; u8 allocated: 1; + u8 pending_delete: 1; }; struct mlx5_sf_hw_table { @@ -16,6 +19,8 @@ struct mlx5_sf_hw_table { struct mlx5_sf_hw *sfs; int max_local_functions; u8 ecpu: 1; + struct mutex table_lock; /* Serializes sf deletion and vhca state change handler. */ + struct notifier_block vhca_nb; }; u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id) @@ -23,6 +28,11 @@ u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id) return sw_id + mlx5_sf_start_function_id(dev); } +static u16 mlx5_sf_hw_to_sw_id(const struct mlx5_core_dev *dev, u16 hw_id) +{ + return hw_id - mlx5_sf_start_function_id(dev); +} + int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum) { struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; @@ -34,10 +44,13 @@ int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum) if (!table->max_local_functions) return -EOPNOTSUPP; + mutex_lock(&table->table_lock); /* Check if sf with same sfnum already exists or not. */ for (i = 0; i < table->max_local_functions; i++) { - if (table->sfs[i].allocated && table->sfs[i].usr_sfnum == usr_sfnum) - return -EEXIST; + if (table->sfs[i].allocated && table->sfs[i].usr_sfnum == usr_sfnum) { + err = -EEXIST; + goto exist_err; + } } /* Find the free entry and allocate the entry from the array */ @@ -63,16 +76,19 @@ int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum) if (err) goto vhca_err; + mutex_unlock(&table->table_lock); return sw_id; vhca_err: mlx5_cmd_dealloc_sf(table->dev, hw_fn_id); err: table->sfs[i].allocated = false; +exist_err: + mutex_unlock(&table->table_lock); return err; } -void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id) +static void _mlx5_sf_hw_id_free(struct mlx5_core_dev *dev, u16 id) { struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; u16 hw_fn_id; @@ -80,6 +96,50 @@ void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id) hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, id); mlx5_cmd_dealloc_sf(table->dev, hw_fn_id); table->sfs[id].allocated = false; + table->sfs[id].pending_delete = false; +} + +void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id) +{ + struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; + + mutex_lock(&table->table_lock); + _mlx5_sf_hw_id_free(dev, id); + mutex_unlock(&table->table_lock); +} + +void mlx5_sf_hw_table_sf_deferred_free(struct mlx5_core_dev *dev, u16 id) +{ + struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; + u32 out[MLX5_ST_SZ_DW(query_vhca_state_out)] = {}; + u16 hw_fn_id; + u8 state; + int err; + + hw_fn_id = mlx5_sf_sw_to_hw_id(dev, id); + mutex_lock(&table->table_lock); + err = mlx5_cmd_query_vhca_state(dev, hw_fn_id, table->ecpu, out, sizeof(out)); + if (err) + goto err; + state = MLX5_GET(query_vhca_state_out, out, vhca_state_context.vhca_state); + if (state == MLX5_VHCA_STATE_ALLOCATED) { + mlx5_cmd_dealloc_sf(table->dev, hw_fn_id); + table->sfs[id].allocated = false; + } else { + table->sfs[id].pending_delete = true; + } +err: + mutex_unlock(&table->table_lock); +} + +static void mlx5_sf_hw_dealloc_all(struct mlx5_sf_hw_table *table) +{ + int i; + + for (i = 0; i < table->max_local_functions; i++) { + if (table->sfs[i].allocated) + _mlx5_sf_hw_id_free(table->dev, i); + } } int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) @@ -88,7 +148,7 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) struct mlx5_sf_hw *sfs; int max_functions; - if (!mlx5_sf_supported(dev)) + if (!mlx5_sf_supported(dev) || !mlx5_vhca_event_supported(dev)) return 0; max_functions = mlx5_sf_max_functions(dev); @@ -100,6 +160,7 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) if (!sfs) goto table_err; + mutex_init(&table->table_lock); table->dev = dev; table->sfs = sfs; table->max_local_functions = max_functions; @@ -120,6 +181,53 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev) if (!table) return; + mutex_destroy(&table->table_lock); kfree(table->sfs); kfree(table); } + +static int mlx5_sf_hw_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data) +{ + struct mlx5_sf_hw_table *table = container_of(nb, struct mlx5_sf_hw_table, vhca_nb); + const struct mlx5_vhca_state_event *event = data; + struct mlx5_sf_hw *sf_hw; + u16 sw_id; + + if (event->new_vhca_state != MLX5_VHCA_STATE_ALLOCATED) + return 0; + + sw_id = mlx5_sf_hw_to_sw_id(table->dev, event->function_id); + sf_hw = &table->sfs[sw_id]; + + mutex_lock(&table->table_lock); + /* SF driver notified through firmware that SF is finally detached. + * Hence recycle the sf hardware id for reuse. + */ + if (sf_hw->allocated && sf_hw->pending_delete) + _mlx5_sf_hw_id_free(table->dev, sw_id); + mutex_unlock(&table->table_lock); + return 0; +} + +int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev) +{ + struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; + + if (!table) + return 0; + + table->vhca_nb.notifier_call = mlx5_sf_hw_vhca_event; + return mlx5_vhca_event_notifier_register(table->dev, &table->vhca_nb); +} + +void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) +{ + struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table; + + if (!table) + return; + + mlx5_vhca_event_notifier_unregister(table->dev, &table->vhca_nb); + /* Dealloc SFs whose firmware event has been missed. */ + mlx5_sf_hw_dealloc_all(table); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h index 7f3622375a9c..cb02a51d0986 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h @@ -9,9 +9,13 @@ int mlx5_cmd_alloc_sf(struct mlx5_core_dev *dev, u16 function_id); int mlx5_cmd_dealloc_sf(struct mlx5_core_dev *dev, u16 function_id); +int mlx5_cmd_sf_enable_hca(struct mlx5_core_dev *dev, u16 func_id); +int mlx5_cmd_sf_disable_hca(struct mlx5_core_dev *dev, u16 func_id); + u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id); int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum); void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id); +void mlx5_sf_hw_table_sf_deferred_free(struct mlx5_core_dev *dev, u16 id); #endif diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h index 31278dc42e72..0b6aea1e6a94 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h @@ -47,6 +47,9 @@ static inline u16 mlx5_sf_max_functions(const struct mlx5_core_dev *dev) int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev); void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev); +int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev); +void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev); + int mlx5_sf_table_init(struct mlx5_core_dev *dev); void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev); @@ -56,7 +59,13 @@ int mlx5_devlink_sf_port_new(struct devlink *devlink, unsigned int *new_port_index); int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index, struct netlink_ext_ack *extack); - +int mlx5_devlink_sf_port_fn_state_get(struct devlink *devlink, struct devlink_port *dl_port, + enum devlink_port_fn_state *state, + enum devlink_port_fn_opstate *opstate, + struct netlink_ext_ack *extack); +int mlx5_devlink_sf_port_fn_state_set(struct devlink *devlink, struct devlink_port *dl_port, + enum devlink_port_fn_state state, + struct netlink_ext_ack *extack); #else static inline int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev) @@ -68,6 +77,15 @@ static inline void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev) { } +static inline int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev) +{ + return 0; +} + +static inline void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev) +{ +} + static inline int mlx5_sf_table_init(struct mlx5_core_dev *dev) { return 0; From patchwork Fri Jan 22 19:36:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369197 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 758DFC433DB for ; Fri, 22 Jan 2021 23:01:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3D7D223AF8 for ; Fri, 22 Jan 2021 23:01:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728236AbhAVXBo (ORCPT ); Fri, 22 Jan 2021 18:01:44 -0500 Received: from mail.kernel.org ([198.145.29.99]:34190 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729545AbhAVTk2 (ORCPT ); Fri, 22 Jan 2021 14:40:28 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 6055523B00; Fri, 22 Jan 2021 19:37:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344235; bh=scLCCxJVoGOvT2w5RbwYecYIAeA90dNW03bxA9MWhWA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=gElx5nGM0iDqJ2JKsdZTLLSZ5rKJ3RzcyP8AWcLoT7GC+iL5KWlgU27+eYx76JGdy lrF2fNulH2EwAG8BOAwiUA6U7WPQ5SPbhUpnrC+YDlQmJzlwWay5A+aum+ePRa42Tk U8e9bi7X7vNjynI1NnN4YYF/4ZIj7LXwRa2TotYZA9e/sIcXICND/kxDCCGhVFELP4 PTOuH46UGr5mSdDuBJQX4OBhJeiA3PVY/XPjiiKJwqm8iOXkkSi3sv6td7vl5TbtEh BXbLaQ0Qh6riHrzXOpjeFj/eDmqI5oP0mR/8zERq5q9r49SBkEK4/XLyCgw7kurUpw Fnh8uGzzIF3WA== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Parav Pandit , Jiri Pirko , Saeed Mahameed Subject: [net-next V10 12/14] devlink: Add devlink port documentation Date: Fri, 22 Jan 2021 11:36:56 -0800 Message-Id: <20210122193658.282884-13-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Parav Pandit Added documentation for devlink port and port function related commands. Signed-off-by: Parav Pandit Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Saeed Mahameed --- .../networking/devlink/devlink-port.rst | 118 ++++++++++++++++++ Documentation/networking/devlink/index.rst | 1 + 2 files changed, 119 insertions(+) create mode 100644 Documentation/networking/devlink/devlink-port.rst diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst new file mode 100644 index 000000000000..c564b557e757 --- /dev/null +++ b/Documentation/networking/devlink/devlink-port.rst @@ -0,0 +1,118 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _devlink_port: + +============ +Devlink Port +============ + +``devlink-port`` is a port that exists on the device. It has a logically +separate ingress/egress point of the device. A devlink port can be any one +of many flavours. A devlink port flavour along with port attributes +describe what a port represents. + +A device driver that intends to publish a devlink port sets the +devlink port attributes and registers the devlink port. + +Devlink port flavours are described below. + +.. list-table:: List of devlink port flavours + :widths: 33 90 + + * - Flavour + - Description + * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL`` + - Any kind of physical port. This can be an eswitch physical port or any + other physical port on the device. + * - ``DEVLINK_PORT_FLAVOUR_DSA`` + - This indicates a DSA interconnect port. + * - ``DEVLINK_PORT_FLAVOUR_CPU`` + - This indicates a CPU port applicable only to DSA. + * - ``DEVLINK_PORT_FLAVOUR_PCI_PF`` + - This indicates an eswitch port representing a port of PCI + physical function (PF). + * - ``DEVLINK_PORT_FLAVOUR_PCI_VF`` + - This indicates an eswitch port representing a port of PCI + virtual function (VF). + * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL`` + - This indicates a virtual port for the PCI virtual function. + +Devlink port can have a different type based on the link layer described below. + +.. list-table:: List of devlink port types + :widths: 23 90 + + * - Type + - Description + * - ``DEVLINK_PORT_TYPE_ETH`` + - Driver should set this port type when a link layer of the port is + Ethernet. + * - ``DEVLINK_PORT_TYPE_IB`` + - Driver should set this port type when a link layer of the port is + InfiniBand. + * - ``DEVLINK_PORT_TYPE_AUTO`` + - This type is indicated by the user when driver should detect the port + type automatically. + +PCI controllers +--------------- +In most cases a PCI device has only one controller. A controller consists of +potentially multiple physical and virtual functions. A function consists +of one or more ports. This port is represented by the devlink eswitch port. + +A PCI device connected to multiple CPUs or multiple PCI root complexes or a +SmartNIC, however, may have multiple controllers. For a device with multiple +controllers, each controller is distinguished by a unique controller number. +An eswitch is on the PCI device which supports ports of multiple controllers. + +An example view of a system with two controllers:: + + --------------------------------------------------------- + | | + | --------- --------- ------- ------- | + ----------- | | vf(s) | | sf(s) | |vf(s)| |sf(s)| | + | server | | ------- ----/---- ---/----- ------- ---/--- ---/--- | + | pci rc |=== | pf0 |______/________/ | pf1 |___/_______/ | + | connect | | ------- ------- | + ----------- | | controller_num=1 (no eswitch) | + ------|-------------------------------------------------- + (internal wire) + | + --------------------------------------------------------- + | devlink eswitch ports and reps | + | ----------------------------------------------------- | + | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | | + | |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | | + | ----------------------------------------------------- | + | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | | + | |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | | + | ----------------------------------------------------- | + | | + | | + ----------- | --------- --------- ------- ------- | + | smartNIC| | | vf(s) | | sf(s) | |vf(s)| |sf(s)| | + | pci rc |==| ------- ----/---- ---/----- ------- ---/--- ---/--- | + | connect | | | pf0 |______/________/ | pf1 |___/_______/ | + ----------- | ------- ------- | + | | + | local controller_num=0 (eswitch) | + --------------------------------------------------------- + +In the above example, the external controller (identified by controller number = 1) +doesn't have the eswitch. Local controller (identified by controller number = 0) +has the eswitch. The Devlink instance on the local controller has eswitch +devlink ports for both the controllers. + +Function configuration +====================== + +A user can configure the function attribute before enumerating the PCI +function. Usually it means, user should configure function attribute +before a bus specific device for the function is created. However, when +SRIOV is enabled, virtual function devices are created on the PCI bus. +Hence, function attribute should be configured before binding virtual +function device to the driver. + +A user may set the hardware address of the function using +'devlink port function set hw_addr' command. For Ethernet port function +this means a MAC address. diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst index d82874760ae2..aab79667f97b 100644 --- a/Documentation/networking/devlink/index.rst +++ b/Documentation/networking/devlink/index.rst @@ -18,6 +18,7 @@ general. devlink-info devlink-flash devlink-params + devlink-port devlink-region devlink-resource devlink-reload From patchwork Fri Jan 22 19:36:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369199 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 273ABC433E0 for ; Fri, 22 Jan 2021 23:00:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDC6723AC8 for ; Fri, 22 Jan 2021 23:00:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728520AbhAVW77 (ORCPT ); Fri, 22 Jan 2021 17:59:59 -0500 Received: from mail.kernel.org ([198.145.29.99]:34192 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730851AbhAVTkj (ORCPT ); Fri, 22 Jan 2021 14:40:39 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 37BC023B04; Fri, 22 Jan 2021 19:37:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344235; bh=mzbkx4nxlpJy3TS+rp4LjZaF+nfej0USFd0t/5xR6g0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ImBNKnYwRpbqDucPS2fGMBDhRdceHcjDrfUwvTNG4/zXmiOJuY2aOXS4p6VwCXVHN WjYjFVfQxaTN9EhTguaFxus4aB6GXvkOsHfG33LEMb4zKKHst9gh+rO3tSndfua3qm oS2pUGswGUL7a/UBoA8H7faXLwkG6EvEIq7oc6IuIdBggmXVZD3ZjmG4JGkSngTrVt 3yPECH5UT296RIX/nW/hF+2Z3d4C8AN/Eum9qp0YY5fbQZ/mhghnBuwCyilOlRttZV Oj8919SNrjP6wVnnB4Mq6JVTmACtxVnHUR3mU5yb+OiGy6tUzc1AV9wmF59iYmxJXP k67Rks+F6uXlQ== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Parav Pandit , Saeed Mahameed Subject: [net-next V10 13/14] devlink: Extend devlink port documentation for subfunctions Date: Fri, 22 Jan 2021 11:36:57 -0800 Message-Id: <20210122193658.282884-14-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Parav Pandit Add devlink port documentation for subfunction management. Signed-off-by: Parav Pandit Signed-off-by: Saeed Mahameed --- Documentation/driver-api/auxiliary_bus.rst | 2 + .../networking/devlink/devlink-port.rst | 87 ++++++++++++++++++- 2 files changed, 86 insertions(+), 3 deletions(-) diff --git a/Documentation/driver-api/auxiliary_bus.rst b/Documentation/driver-api/auxiliary_bus.rst index 2312506b0674..fff96c7ba7a8 100644 --- a/Documentation/driver-api/auxiliary_bus.rst +++ b/Documentation/driver-api/auxiliary_bus.rst @@ -1,5 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0-only +.. _auxiliary_bus: + ============= Auxiliary Bus ============= diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index c564b557e757..e99b41599465 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -34,6 +34,9 @@ Devlink port flavours are described below. * - ``DEVLINK_PORT_FLAVOUR_PCI_VF`` - This indicates an eswitch port representing a port of PCI virtual function (VF). + * - ``DEVLINK_PORT_FLAVOUR_PCI_SF`` + - This indicates an eswitch port representing a port of PCI + subfunction (SF). * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL`` - This indicates a virtual port for the PCI virtual function. @@ -57,8 +60,9 @@ Devlink port can have a different type based on the link layer described below. PCI controllers --------------- In most cases a PCI device has only one controller. A controller consists of -potentially multiple physical and virtual functions. A function consists -of one or more ports. This port is represented by the devlink eswitch port. +potentially multiple physical, virtual functions and subfunctions. A function +consists of one or more ports. This port is represented by the devlink eswitch +port. A PCI device connected to multiple CPUs or multiple PCI root complexes or a SmartNIC, however, may have multiple controllers. For a device with multiple @@ -111,8 +115,85 @@ function. Usually it means, user should configure function attribute before a bus specific device for the function is created. However, when SRIOV is enabled, virtual function devices are created on the PCI bus. Hence, function attribute should be configured before binding virtual -function device to the driver. +function device to the driver. For subfunctions, this means user should +configure port function attribute before activating the port function. A user may set the hardware address of the function using 'devlink port function set hw_addr' command. For Ethernet port function this means a MAC address. + +Subfunction +============ + +Subfunction is a lightweight function that has a parent PCI function on which +it is deployed. Subfunction is created and deployed in unit of 1. Unlike +SRIOV VFs, a subfunction doesn't require its own PCI virtual function. +A subfunction communicates with the hardware through the parent PCI function. + +To use a subfunction, 3 steps setup sequence is followed. +(1) create - create a subfunction; +(2) configure - configure subfunction attributes; +(3) deploy - deploy the subfunction; + +Subfunction management is done using devlink port user interface. +User performs setup on the subfunction management device. + +(1) Create +---------- +A subfunction is created using a devlink port interface. A user adds the +subfunction by adding a devlink port of subfunction flavour. The devlink +kernel code calls down to subfunction management driver (devlink ops) and asks +it to create a subfunction devlink port. Driver then instantiates the +subfunction port and any associated objects such as health reporters and +representor netdevice. + +(2) Configure +------------- +A subfunction devlink port is created but it is not active yet. That means the +entities are created on devlink side, the e-switch port representor is created, +but the subfunction device itself it not created. A user might use e-switch port +representor to do settings, putting it into bridge, adding TC rules, etc. A user +might as well configure the hardware address (such as MAC address) of the +subfunction while subfunction is inactive. + +(3) Deploy +---------- +Once a subfunction is configured, user must activate it to use it. Upon +activation, subfunction management driver asks the subfunction management +device to instantiate the subfunction device on particular PCI function. +A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst `. +At this point a matching subfunction driver binds to the subfunction's auxiliary device. + +Terms and Definitions +===================== + +.. list-table:: Terms and Definitions + :widths: 22 90 + + * - Term + - Definitions + * - ``PCI device`` + - A physical PCI device having one or more PCI bus consists of one or + more PCI controllers. + * - ``PCI controller`` + - A controller consists of potentially multiple physical functions, + virtual functions and subfunctions. + * - ``Port function`` + - An object to manage the function of a port. + * - ``Subfunction`` + - A lightweight function that has parent PCI function on which it is + deployed. + * - ``Subfunction device`` + - A bus device of the subfunction, usually on a auxiliary bus. + * - ``Subfunction driver`` + - A device driver for the subfunction auxiliary device. + * - ``Subfunction management device`` + - A PCI physical function that supports subfunction management. + * - ``Subfunction management driver`` + - A device driver for PCI physical function that supports + subfunction management using devlink port interface. + * - ``Subfunction host driver`` + - A device driver for PCI physical function that hosts subfunction + devices. In most cases it is same as subfunction management driver. When + subfunction is used on external controller, subfunction management and + host drivers are different. From patchwork Fri Jan 22 19:36:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Saeed Mahameed X-Patchwork-Id: 369195 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4A46C433DB for ; Fri, 22 Jan 2021 23:08:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7031B23B52 for ; Fri, 22 Jan 2021 23:08:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728525AbhAVXH7 (ORCPT ); Fri, 22 Jan 2021 18:07:59 -0500 Received: from mail.kernel.org ([198.145.29.99]:32868 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728588AbhAVTkP (ORCPT ); Fri, 22 Jan 2021 14:40:15 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 0735423B23; Fri, 22 Jan 2021 19:37:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1611344236; bh=FZUTy6l6DtkXWDUvMlBb593cpHNCaucfYeLH7pkK/KA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fsF6X1M+9F5PwFCAVuG0qZfVyDjeSLece/Bw5/Ddci9MUfbDE8NSvu297+Wes9N5B 0Ttu4ceGMbHEVpoS/ID1plD72R6+2XTRhhaX+JfiU6eeceXpv9FBrPicsuZKKykH1+ +d8pJRGFVBo2xezgKZ3MPhhMUCpRxujOMMUX42iTrQ7OxYBxWNSOFZQeh+wA68Ewn9 Y93q4YTRMnP0djJ8HUlAhq4NI+hMDWyza0rBCZM36GPegwpeGo2zgEa5vnFvSOrHE7 7yCws4G4QN1IhnkgCrc0nVfreK4jos1tgIdbqbIgAhNfhlx8H6E1vcc+iq95vK/kTc oXDcMbIdt8qKA== From: Saeed Mahameed To: "David S. Miller" , Jakub Kicinski , Jason Gunthorpe Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, alexander.duyck@gmail.com, sridhar.samudrala@intel.com, edwin.peer@broadcom.com, dsahern@kernel.org, kiran.patil@intel.com, jacob.e.keller@intel.com, david.m.ertman@intel.com, dan.j.williams@intel.com, Parav Pandit , Saeed Mahameed Subject: [net-next V10 14/14] net/mlx5: Add devlink subfunction port documentation Date: Fri, 22 Jan 2021 11:36:58 -0800 Message-Id: <20210122193658.282884-15-saeed@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210122193658.282884-1-saeed@kernel.org> References: <20210122193658.282884-1-saeed@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Parav Pandit Add documentation for subfunction management using devlink port. Signed-off-by: Parav Pandit Signed-off-by: Saeed Mahameed --- .../device_drivers/ethernet/mellanox/mlx5.rst | 210 ++++++++++++++++++ 1 file changed, 210 insertions(+) diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst index a5eb22793bb9..a1b32fcd0d76 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst @@ -12,6 +12,8 @@ Contents - `Enabling the driver and kconfig options`_ - `Devlink info`_ - `Devlink parameters`_ +- `mlx5 subfunction`_ +- `mlx5 port function`_ - `Devlink health reporters`_ - `mlx5 tracepoints`_ @@ -181,6 +183,214 @@ User command examples: values: cmode driverinit value true +mlx5 subfunction +================ +mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst `) interface. + +A Subfunction has its own function capabilities and its own resources. This +means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These +queues are neither shared nor stolen from the parent PCI function. + +When a subfunction is RDMA capable, it has its own QP1, GID table and rdma +resources neither shared nor stolen from the parent PCI function. + +A subfunction has a dedicated window in PCI BAR space that is not shared +with ther other subfunctions or the parent PCI function. This ensures that all +devices (netdev, rdma, vdpa etc.) of the subfunction accesses only assigned +PCI BAR space. + +A Subfunction supports eswitch representation through which it supports tc +offloads. The user configures eswitch to send/receive packets from/to +the subfunction port. + +Subfunctions share PCI level resources such as PCI MSI-X IRQs with +other subfunctions and/or with its parent PCI function. + +Example mlx5 software, system and device view:: + + _______ + | admin | + | user |---------- + |_______| | + | | + ____|____ __|______ _________________ + | | | | | | + | devlink | | tc tool | | user | + | tool | |_________| | applications | + |_________| | |_________________| + | | | | + | | | | Userspace + +---------|-------------|-------------------|----------|--------------------+ + | | +----------+ +----------+ Kernel + | | | netdev | | rdma dev | + | | +----------+ +----------+ + (devlink port add/del | ^ ^ + port function set) | | | + | | +---------------| + _____|___ | | _______|_______ + | | | | | mlx5 class | + | devlink | +------------+ | | drivers | + | kernel | | rep netdev | | |(mlx5_core,ib) | + |_________| +------------+ | |_______________| + | | | ^ + (devlink ops) | | (probe/remove) + _________|________ | | ____|________ + | subfunction | | +---------------+ | subfunction | + | management driver|----- | subfunction |---| driver | + | (mlx5_core) | | auxiliary dev | | (mlx5_core) | + |__________________| +---------------+ |_____________| + | ^ + (sf add/del, vhca events) | + | (device add/del) + _____|____ ____|________ + | | | subfunction | + | PCI NIC |---- activate/deactive events---->| host driver | + |__________| | (mlx5_core) | + |_____________| + +Subfunction is created using devlink port interface. + +- Change device to switchdev mode:: + + $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev + +- Add a devlink port of subfunction flaovur:: + + $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 + pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false + function: + hw_addr 00:00:00:00:00:00 state inactive opstate detached + +- Show a devlink port of the subfunction:: + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:00:00 state inactive opstate detached + +- Delete a devlink port of subfunction after use:: + + $ devlink port del pci/0000:06:00.0/32768 + +mlx5 function attributes +======================== +The mlx5 driver provides a mechanism to setup PCI VF/SF function attributes in +a unified way for SmartNIC and non-SmartNIC. + +This is supported only when the eswitch mode is set to switchdev. Port function +configuration of the PCI VF/SF is supported through devlink eswitch port. + +Port function attributes should be set before PCI VF/SF is enumerated by the +driver. + +MAC address setup +----------------- +mlx5 driver provides mechanism to setup the MAC address of the PCI VF/SF. + +The configured MAC address of the PCI VF/SF will be used by netdevice and rdma +device created for the PCI VF/SF. + +- Get the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 + +- Set the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55 + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:11:22:33:44:55 + +- Get the MAC address of the SF identified by its unique devlink port index:: + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:00:00 + +- Set the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcivf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:88:88 + +SF state setup +-------------- +To use the SF, the user must active the SF using the SF function state +attribute. + +- Get the state of the SF identified by its unique devlink port index:: + + $ devlink port show ens2f0npf0sf88 + pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false + function: + hw_addr 00:00:00:00:88:88 state inactive opstate detached + +- Activate the function and verify its state is active:: + + $ devlink port function set ens2f0npf0sf88 state active + + $ devlink port show ens2f0npf0sf88 + pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false + function: + hw_addr 00:00:00:00:88:88 state active opstate detached + +Upon function activation, the PF driver instance gets the event from the device +that a particular SF was activated. It's the cue to put the device on bus, probe +it and instantiate the devlink instance and class specific auxiliary devices +for it. + +- Show the auxiliary device and port of the subfunction:: + + $ devlink dev show + devlink dev show auxiliary/mlx5_core.sf.4 + + $ devlink port show auxiliary/mlx5_core.sf.4/1 + auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false + + $ rdma link show mlx5_0/1 + link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88 + + $ rdma dev show + 8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112 + 13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112 + +- Subfunction auxiliary device and class device hierarchy:: + + mlx5_core.sf.4 + (subfunction auxiliary device) + /\ + / \ + / \ + / \ + / \ + mlx5_core.eth.4 mlx5_core.rdma.4 + (sf eth aux dev) (sf rdma aux dev) + | | + | | + p0sf88 mlx5_0 + (sf netdev) (sf rdma device) + +Additionally, the SF port also gets the event when the driver attaches to the +auxiliary device of the subfunction. This results in changing the operational +state of the function. This provides visiblity to the user to decide when is it +safe to delete the SF port for graceful termination of the subfunction. + +- Show the SF port operational state:: + + $ devlink port show ens2f0npf0sf88 + pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false + function: + hw_addr 00:00:00:00:88:88 state active opstate attached + Devlink health reporters ========================