[mlx5-next,v2,0/5] Dynamically assign MSI-X vectors count

Message ID 20210114103140.866141-1-leon@kernel.org

Message

Leon Romanovsky Jan. 14, 2021, 10:31 a.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

Changelog
v2:
 * Patch 1:
  * Renamed vf_msix_vec sysfs knob to be sriov_vf_msix_count
  * Added PF and VF device locks during set MSI-X call to protect from parallel
    driver bind/unbind operations.
  * Removed extra checks when reading sriov_vf_msix, because users will
    be able to distinguish between supported/not supported by looking on
    sriov_vf_total_msix count.
  * Changed all occurrences of "numb" to be "count"
  * Changed returned error from EOPNOTSUPP to be EBUSY if user tries to set
    MSI-X count after driver already bound to the VF.
  * Added extra comment in pci_set_msix_vec_count() to emphasize that driver
    should not be bound.
 * Patch 2:
  * Changed vf_total_msix from int to be u32 and updated function signatures
    accordingly.
  * Improved patch title
v1: https://lore.kernel.org/linux-pci/20210110150727.1965295-1-leon@kernel.org
 * Improved wording and commit messages of first PCI patch
 * Added extra PCI patch to provide total number of MSI-X vectors
 * Prohibited read of vf_msix_vec sysfs file if driver doesn't support write
 * Removed extra function definition in pci.h
v0: https://lore.kernel.org/linux-pci/20210103082440.34994-1-leon@kernel.org

--------------------------------------------------------------------
Hi,

The number of MSI-X vectors is PCI property visible through lspci, that
field is read-only and configured by the device.

The static assignment of an amount of MSI-X vectors doesn't allow utilize
the newly created VF because it is not known to the device the future load
and configuration where that VF will be used.

The VFs are created on the hypervisor and forwarded to the VMs that have
different properties (for example number of CPUs).

To overcome the inefficiency in the spread of such MSI-X vectors, we
allow the kernel to instruct the device with the needed number of such
vectors, before VF is initialized and bounded to the driver.

Before this series:
[root@server ~]# lspci -vs 0000:08:00.2
08:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
....
        Capabilities: [9c] MSI-X: Enable- Count=12 Masked-

Configuration script:
1. Start fresh
echo 0 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_numvfs
modprobe -q -r mlx5_ib mlx5_core
2. Ensure that driver doesn't run and it is safe to change MSI-X
echo 0 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_drivers_autoprobe
3. Load driver for the PF
modprobe mlx5_core
4. Configure one of the VFs with new number
echo 2 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_numvfs
echo 21 > /sys/bus/pci/devices/0000\:08\:00.2/sriov_vf_msix_count

After this series:
[root@server ~]# lspci -vs 0000:08:00.2
08:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
....
        Capabilities: [9c] MSI-X: Enable- Count=21 Masked-

Thanks

Leon Romanovsky (5):
  PCI: Add sysfs callback to allow MSI-X table size change of SR-IOV VFs
  PCI: Add SR-IOV sysfs entry to read total number of dynamic MSI-X
    vectors
  net/mlx5: Add dynamic MSI-X capabilities bits
  net/mlx5: Dynamically assign MSI-X vectors count
  net/mlx5: Allow to the users to configure number of MSI-X vectors

 Documentation/ABI/testing/sysfs-bus-pci       | 34 +++++++
 .../net/ethernet/mellanox/mlx5/core/main.c    |  5 ++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  6 ++
 .../net/ethernet/mellanox/mlx5/core/pci_irq.c | 62 +++++++++++++
 .../net/ethernet/mellanox/mlx5/core/sriov.c   | 52 ++++++++++-
 drivers/pci/iov.c                             | 89 +++++++++++++++++++
 drivers/pci/msi.c                             | 47 ++++++++++
 drivers/pci/pci-sysfs.c                       |  1 +
 drivers/pci/pci.h                             |  5 ++
 include/linux/mlx5/mlx5_ifc.h                 | 11 ++-
 include/linux/pci.h                           |  5 ++
 11 files changed, 314 insertions(+), 3 deletions(-)

--
2.29.2

Comments

Jakub Kicinski Jan. 14, 2021, 5:51 p.m. UTC | #1
On Thu, 14 Jan 2021 12:31:35 +0200 Leon Romanovsky wrote:
> The number of MSI-X vectors is PCI property visible through lspci, that
> field is read-only and configured by the device.
> 
> The static assignment of an amount of MSI-X vectors doesn't allow utilize
> the newly created VF because it is not known to the device the future load
> and configuration where that VF will be used.
> 
> The VFs are created on the hypervisor and forwarded to the VMs that have
> different properties (for example number of CPUs).
> 
> To overcome the inefficiency in the spread of such MSI-X vectors, we
> allow the kernel to instruct the device with the needed number of such
> vectors, before VF is initialized and bounded to the driver.


Hi Leon!

Looks like you got some missing kdoc here, check out the test in
patchwork so we don't need to worry about this later:

https://patchwork.kernel.org/project/netdevbpf/list/?series=414497
Alex Williamson Jan. 15, 2021, 12:05 a.m. UTC | #2
On Thu, 14 Jan 2021 12:31:37 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Some SR-IOV capable devices provide an ability to configure specific
> number of MSI-X vectors on their VF prior driver is probed on that VF.
> 
> In order to make management easy, provide new read-only sysfs file that
> returns a total number of possible to configure MSI-X vectors.
> 
> cat /sys/bus/pci/devices/.../sriov_vf_total_msix
>   = 0 - feature is not supported
>   > 0 - total number of MSI-X vectors to consume by the VFs  
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-pci | 14 +++++++++++
>  drivers/pci/iov.c                       | 31 +++++++++++++++++++++++++
>  drivers/pci/pci.h                       |  3 +++
>  include/linux/pci.h                     |  2 ++
>  4 files changed, 50 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index 34a8c6bcde70..530c249cc3da 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -395,3 +395,17 @@ Description:
>  		The file is writable if the PF is bound to a driver that
>  		set sriov_vf_total_msix > 0 and there is no driver bound
>  		to the VF.
> +
> +What:		/sys/bus/pci/devices/.../sriov_vf_total_msix
> +Date:		January 2021
> +Contact:	Leon Romanovsky <leonro@nvidia.com>
> +Description:
> +		This file is associated with the SR-IOV PFs.
> +		It returns a total number of possible to configure MSI-X
> +		vectors on the enabled VFs.
> +
> +		The values returned are:
> +		 * > 0 - this will be total number possible to consume by VFs,
> +		 * = 0 - feature is not supported

As with previous, why expose it if not supported?

This seems pretty challenging for userspace to use; aiui they would
need to iterate all the VFs to learn how many vectors are already
allocated, subtract that number from this value, all while hoping they
aren't racing someone else doing the same.  Would it be more useful if
this reported the number of surplus vectors available?

How would a per VF limit be exposed?  Do we expect users to know the
absolutely MSI-X vector limit or the device specific limit?  Thanks,

Alex

> +
> +		If no SR-IOV VFs are enabled, this value will return 0.
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 5bc496f8ffa3..f9dc31947dbc 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -394,12 +394,22 @@ static ssize_t sriov_drivers_autoprobe_store(struct device *dev,
>  	return count;
>  }
> 
> +static ssize_t sriov_vf_total_msix_show(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	return sprintf(buf, "%u\n", pdev->sriov->vf_total_msix);
> +}
> +
>  static DEVICE_ATTR_RO(sriov_totalvfs);
>  static DEVICE_ATTR_RW(sriov_numvfs);
>  static DEVICE_ATTR_RO(sriov_offset);
>  static DEVICE_ATTR_RO(sriov_stride);
>  static DEVICE_ATTR_RO(sriov_vf_device);
>  static DEVICE_ATTR_RW(sriov_drivers_autoprobe);
> +static DEVICE_ATTR_RO(sriov_vf_total_msix);
> 
>  static struct attribute *sriov_dev_attrs[] = {
>  	&dev_attr_sriov_totalvfs.attr,
> @@ -408,6 +418,7 @@ static struct attribute *sriov_dev_attrs[] = {
>  	&dev_attr_sriov_stride.attr,
>  	&dev_attr_sriov_vf_device.attr,
>  	&dev_attr_sriov_drivers_autoprobe.attr,
> +	&dev_attr_sriov_vf_total_msix.attr,
>  	NULL,
>  };
> 
> @@ -654,6 +665,7 @@ static void sriov_disable(struct pci_dev *dev)
>  		sysfs_remove_link(&dev->dev.kobj, "dep_link");
> 
>  	iov->num_VFs = 0;
> +	iov->vf_total_msix = 0;
>  	pci_iov_set_numvfs(dev, 0);
>  }
> 
> @@ -1112,6 +1124,25 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> 
> +/**
> + * pci_sriov_set_vf_total_msix - set total number of MSI-X vectors for the VFs
> + * @dev: the PCI PF device
> + * @count: the total number of MSI-X vector to consume by the VFs
> + *
> + * Sets the number of MSI-X vectors that is possible to consume by the VFs.
> + * This interface is complimentary part of the pci_set_msix_vec_count()
> + * that will be used to configure the required number on the VF.
> + */
> +void pci_sriov_set_vf_total_msix(struct pci_dev *dev, u32 count)
> +{
> +	if (!dev->is_physfn || !dev->driver ||
> +	    !dev->driver->sriov_set_msix_vec_count)
> +		return;
> +
> +	dev->sriov->vf_total_msix = count;
> +}
> +EXPORT_SYMBOL_GPL(pci_sriov_set_vf_total_msix);
> +
>  /**
>   * pci_sriov_configure_simple - helper to configure SR-IOV
>   * @dev: the PCI device
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index dbbfa9e73ea8..604e1f9172c2 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -327,6 +327,9 @@ struct pci_sriov {
>  	u16		subsystem_device; /* VF subsystem device */
>  	resource_size_t	barsz[PCI_SRIOV_NUM_BARS];	/* VF BAR size */
>  	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
> +	u32		vf_total_msix;  /* Total number of MSI-X vectors the VFs
> +					 * can consume
> +					 */
>  };
> 
>  /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 6be96d468eda..c950513558b8 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2075,6 +2075,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev);
>  int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
>  resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
>  void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
> +void pci_sriov_set_vf_total_msix(struct pci_dev *dev, u32 count);
> 
>  /* Arch may override these (weak) */
>  int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs);
> @@ -2115,6 +2116,7 @@ static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
>  { return 0; }
>  static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }
> +static inline void pci_sriov_set_vf_total_msix(struct pci_dev *dev, u32 count) {}
>  #endif
> 
>  #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
> --
> 2.29.2
>
Leon Romanovsky Jan. 16, 2021, 8:23 a.m. UTC | #3
On Thu, Jan 14, 2021 at 05:05:43PM -0700, Alex Williamson wrote:
> On Thu, 14 Jan 2021 12:31:36 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
>
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Extend PCI sysfs interface with a new callback that allows configure
> > the number of MSI-X vectors for specific SR-IO VF. This is needed
> > to optimize the performance of newly bound devices by allocating
> > the number of vectors based on the administrator knowledge of targeted VM.
> >
> > This function is applicable for SR-IOV VF because such devices allocate
> > their MSI-X table before they will run on the VMs and HW can't guess the
> > right number of vectors, so the HW allocates them statically and equally.
> >
> > The newly added /sys/bus/pci/devices/.../sriov_vf_msix_count file will be seen
> > for the VFs and it is writable as long as a driver is not bounded to the VF.
> >
> > The values accepted are:
> >  * > 0 - this will be number reported by the VF's MSI-X capability
> >  * < 0 - not valid
> >  * = 0 - will reset to the device default value
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-pci | 20 +++++++++
> >  drivers/pci/iov.c                       | 58 +++++++++++++++++++++++++
> >  drivers/pci/msi.c                       | 47 ++++++++++++++++++++
> >  drivers/pci/pci-sysfs.c                 |  1 +
> >  drivers/pci/pci.h                       |  2 +
> >  include/linux/pci.h                     |  3 ++
> >  6 files changed, 131 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > index 25c9c39770c6..34a8c6bcde70 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -375,3 +375,23 @@ Description:
> >  		The value comes from the PCI kernel device state and can be one
> >  		of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
> >  		The file is read only.
> > +
> > +What:		/sys/bus/pci/devices/.../sriov_vf_msix_count
> > +Date:		December 2020
> > +Contact:	Leon Romanovsky <leonro@nvidia.com>
> > +Description:
> > +		This file is associated with the SR-IOV VFs.
> > +		It allows configuration of the number of MSI-X vectors for
> > +		the VF. This is needed to optimize performance of newly bound
> > +		devices by allocating the number of vectors based on the
> > +		administrator knowledge of targeted VM.
> > +
> > +		The values accepted are:
> > +		 * > 0 - this will be number reported by the VF's MSI-X
> > +			 capability
> > +		 * < 0 - not valid
> > +		 * = 0 - will reset to the device default value
> > +
> > +		The file is writable if the PF is bound to a driver that
> > +		set sriov_vf_total_msix > 0 and there is no driver bound
> > +		to the VF.
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 4afd4ee4f7f0..5bc496f8ffa3 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -31,6 +31,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
> >  	return (dev->devfn + dev->sriov->offset +
> >  		dev->sriov->stride * vf_id) & 0xff;
> >  }
> > +EXPORT_SYMBOL(pci_iov_virtfn_devfn);
> >
> >  /*
> >   * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
> > @@ -426,6 +427,63 @@ const struct attribute_group sriov_dev_attr_group = {
> >  	.is_visible = sriov_attrs_are_visible,
> >  };
> >
> > +#ifdef CONFIG_PCI_MSI
> > +static ssize_t sriov_vf_msix_count_show(struct device *dev,
> > +					struct device_attribute *attr,
> > +					char *buf)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	int count = pci_msix_vec_count(pdev);
> > +
> > +	if (count < 0)
> > +		return count;
> > +
> > +	return sprintf(buf, "%d\n", count);
> > +}
> > +
> > +static ssize_t sriov_vf_msix_count_store(struct device *dev,
> > +					 struct device_attribute *attr,
> > +					 const char *buf, size_t count)
> > +{
> > +	struct pci_dev *vf_dev = to_pci_dev(dev);
> > +	int val, ret;
> > +
> > +	ret = kstrtoint(buf, 0, &val);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = pci_set_msix_vec_count(vf_dev, val);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return count;
> > +}
> > +static DEVICE_ATTR_RW(sriov_vf_msix_count);
> > +#endif
> > +
> > +static struct attribute *sriov_vf_dev_attrs[] = {
> > +#ifdef CONFIG_PCI_MSI
> > +	&dev_attr_sriov_vf_msix_count.attr,
> > +#endif
> > +	NULL,
> > +};
> > +
> > +static umode_t sriov_vf_attrs_are_visible(struct kobject *kobj,
> > +					  struct attribute *a, int n)
> > +{
> > +	struct device *dev = kobj_to_dev(kobj);
> > +
> > +	if (dev_is_pf(dev))
> > +		return 0;
>
> Wouldn't it be cleaner to also hide this on VFs where
> pci_msix_vec_count() returns an error or where the PF driver doesn't
> implement .sriov_set_msix_vec_count()?  IOW, expose it only where it
> could actually work.

I wasn't sure about the policy in PCI/core, but sure will change.

>
> > +
> > +	return a->mode;
> > +}
> > +
> > +const struct attribute_group sriov_vf_dev_attr_group = {
> > +	.attrs = sriov_vf_dev_attrs,
> > +	.is_visible = sriov_vf_attrs_are_visible,
> > +};
> > +
> >  int __weak pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
> >  {
> >  	return 0;
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 3162f88fe940..5a40200343c9 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -991,6 +991,53 @@ int pci_msix_vec_count(struct pci_dev *dev)
> >  }
> >  EXPORT_SYMBOL(pci_msix_vec_count);
> >
> > +/**
> > + * pci_set_msix_vec_count - change the reported number of MSI-X vectors
> > + * This function is applicable for SR-IOV VF because such devices allocate
> > + * their MSI-X table before they will run on the VMs and HW can't guess the
> > + * right number of vectors, so the HW allocates them statically and equally.
>
> Nit, this is an assumption of the VF usage and conjecture of the
> implementation.

This is one of the possible implementations.

>
> > + * @dev: VF device that is going to be changed
> > + * @count amount of MSI-X vectors
> > + **/
> > +int pci_set_msix_vec_count(struct pci_dev *dev, int count)
>
> pci_vf_set_msix_vec_count()?  Long, I know, but if it's limited to VFs
> name it accordingly.  Thanks,

I'll do.

>
> Alex
>
> > +{
> > +	struct pci_dev *pdev = pci_physfn(dev);
> > +	int ret;
> > +
> > +	if (!dev->msix_cap || !pdev->msix_cap || count < 0)
> > +		/*
> > +		 * We don't support negative numbers for now,
> > +		 * but maybe in the future it will make sense.
> > +		 */
> > +		return -EINVAL;
> > +
> > +	device_lock(&pdev->dev);
> > +	if (!pdev->driver || !pdev->driver->sriov_set_msix_vec_count) {
> > +		ret = -EOPNOTSUPP;
> > +		goto err_pdev;
> > +	}
> > +
> > +	device_lock(&dev->dev);
> > +	if (dev->driver) {
> > +		/*
> > +		 * Driver already probed this VF and configured itself
> > +		 * based on previously configured (or default) MSI-X vector
> > +		 * count. It is too late to change this field for this
> > +		 * specific VF.
> > +		 */
> > +		ret = -EBUSY;
> > +		goto err_dev;
> > +	}
> > +
> > +	ret = pdev->driver->sriov_set_msix_vec_count(dev, count);
> > +
> > +err_dev:
> > +	device_unlock(&dev->dev);
> > +err_pdev:
> > +	device_unlock(&pdev->dev);
> > +	return ret;
> > +}
> > +
> >  static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
> >  			     int nvec, struct irq_affinity *affd, int flags)
> >  {
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index fb072f4b3176..0af2222643c2 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -1557,6 +1557,7 @@ static const struct attribute_group *pci_dev_attr_groups[] = {
> >  	&pci_dev_hp_attr_group,
> >  #ifdef CONFIG_PCI_IOV
> >  	&sriov_dev_attr_group,
> > +	&sriov_vf_dev_attr_group,
> >  #endif
> >  	&pci_bridge_attr_group,
> >  	&pcie_dev_attr_group,
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 5c59365092fa..dbbfa9e73ea8 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -183,6 +183,7 @@ extern unsigned int pci_pm_d3hot_delay;
> >
> >  #ifdef CONFIG_PCI_MSI
> >  void pci_no_msi(void);
> > +int pci_set_msix_vec_count(struct pci_dev *dev, int count);
> >  #else
> >  static inline void pci_no_msi(void) { }
> >  #endif
> > @@ -502,6 +503,7 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
> >  void pci_restore_iov_state(struct pci_dev *dev);
> >  int pci_iov_bus_range(struct pci_bus *bus);
> >  extern const struct attribute_group sriov_dev_attr_group;
> > +extern const struct attribute_group sriov_vf_dev_attr_group;
> >  #else
> >  static inline int pci_iov_init(struct pci_dev *dev)
> >  {
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index b32126d26997..6be96d468eda 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -856,6 +856,8 @@ struct module;
> >   *		e.g. drivers/net/e100.c.
> >   * @sriov_configure: Optional driver callback to allow configuration of
> >   *		number of VFs to enable via sysfs "sriov_numvfs" file.
> > + * @sriov_set_msix_vec_count: Driver callback to change number of MSI-X vectors
> > + *              exposed by the sysfs "vf_msix_vec" entry.
> >   * @err_handler: See Documentation/PCI/pci-error-recovery.rst
> >   * @groups:	Sysfs attribute groups.
> >   * @driver:	Driver model structure.
> > @@ -871,6 +873,7 @@ struct pci_driver {
> >  	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
> >  	void (*shutdown)(struct pci_dev *dev);
> >  	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
> > +	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
> >  	const struct pci_error_handlers *err_handler;
> >  	const struct attribute_group **groups;
> >  	struct device_driver	driver;
> > --
> > 2.29.2
> >
>
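
As a reading aid for the error paths in the quoted pci_set_msix_vec_count(), the sketch below maps the errno that a userspace tool might see after a failed write to sriov_vf_msix_count onto the condition that triggers it in this version of the patch. The helper name explain_msix_count_error() is hypothetical and not part of the series, and the PF driver callback may return codes that this table does not cover.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not part of the series): translate the errno from a
 * failed write to sriov_vf_msix_count into the condition that produces it
 * in the v2 code quoted above.
 */
const char *explain_msix_count_error(int err)
{
	assert(err > 0);	/* errno values are positive */

	switch (err) {
	case EINVAL:
		/* Unparsable or negative count, or VF/PF lacks an MSI-X capability. */
		return "invalid count or no MSI-X capability";
	case EOPNOTSUPP:
		/* No driver bound to the PF, or it lacks sriov_set_msix_vec_count. */
		return "PF driver does not support changing the count";
	case EBUSY:
		/* A driver has already probed the VF. */
		return "VF is bound to a driver; unbind it first";
	default:
		/* For example ERANGE from kstrtoint(), or a PF driver error. */
		return "other error from input parsing or the PF driver";
	}
}
```

A management tool would typically call this with errno right after the failed write(), before any other library call can overwrite it.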
Leon Romanovsky Jan. 16, 2021, 8:36 a.m. UTC | #4
On Thu, Jan 14, 2021 at 05:05:36PM -0700, Alex Williamson wrote:
> On Thu, 14 Jan 2021 12:31:37 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
>
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Some SR-IOV capable devices provide an ability to configure specific
> > number of MSI-X vectors on their VF prior driver is probed on that VF.
> >
> > In order to make management easy, provide new read-only sysfs file that
> > returns a total number of possible to configure MSI-X vectors.
> >
> > cat /sys/bus/pci/devices/.../sriov_vf_total_msix
> >   = 0 - feature is not supported
> >   > 0 - total number of MSI-X vectors to consume by the VFs
> >
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-pci | 14 +++++++++++
> >  drivers/pci/iov.c                       | 31 +++++++++++++++++++++++++
> >  drivers/pci/pci.h                       |  3 +++
> >  include/linux/pci.h                     |  2 ++
> >  4 files changed, 50 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > index 34a8c6bcde70..530c249cc3da 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -395,3 +395,17 @@ Description:
> >  		The file is writable if the PF is bound to a driver that
> >  		set sriov_vf_total_msix > 0 and there is no driver bound
> >  		to the VF.
> > +
> > +What:		/sys/bus/pci/devices/.../sriov_vf_total_msix
> > +Date:		January 2021
> > +Contact:	Leon Romanovsky <leonro@nvidia.com>
> > +Description:
> > +		This file is associated with the SR-IOV PFs.
> > +		It returns a total number of possible to configure MSI-X
> > +		vectors on the enabled VFs.
> > +
> > +		The values returned are:
> > +		 * > 0 - this will be total number possible to consume by VFs,
> > +		 * = 0 - feature is not supported
>
> As with previous, why expose it if not supported?

It is much simpler for users to implement logic that acts on this value
than to rely on whether the file exists, and they would have to handle 0
anyway to be on the safe side.

>
> This seems pretty challenging for userspace to use; aiui they would
> need to iterate all the VFs to learn how many vectors are already
> allocated, subtract that number from this value, all while hoping they
> aren't racing someone else doing the same.  Would it be more useful if
> this reported the number of surplus vectors available?

Only privileged users are allowed to do it, so it is unlikely that we
will have more than one entity managing PF/VF assignments.

Users already count the number of CPUs they give to the VMs, so counting
resources is not new to them.

I didn't do the counting in the kernel because it would require users to
treat "0" specially, as the sign that the pool is depleted. So they would
need to know the maximum size of the pool anyway.

Unless we want to have two knobs, one for the maximum and another for the
current value, they will count. The thing is that users will count anyway
and won't use the current value, so it gives them nothing.

>
> How would a per VF limit be exposed?  Do we expect users to know the
> absolutely MSI-X vector limit or the device specific limit?  Thanks,

At this stage, yes. We can discuss it later when the need arises.

Thanks
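
The userspace accounting debated in this exchange can be sketched as follows. This is an illustration only: msix_surplus() is a hypothetical helper, and it assumes that every VF's vectors are drawn from the pool reported by the PF's sriov_vf_total_msix, which the thread does not confirm. The inputs stand for values a tool would already have read from sriov_vf_total_msix and from each VF's sriov_vf_msix_count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return how many vectors of the pool are still
 * unassigned. Returns 0 when the feature is unsupported (total == 0) or
 * when the VFs already use the whole pool or more.
 */
uint32_t msix_surplus(uint32_t total, const uint32_t *vf_counts, size_t num_vfs)
{
	uint64_t used = 0;	/* wider than the inputs to avoid wrap-around */
	size_t i;

	assert(vf_counts != NULL || num_vfs == 0);

	for (i = 0; i < num_vfs; i++)
		used += vf_counts[i];

	return used >= total ? 0 : (uint32_t)(total - used);
}
```

The result is only a snapshot: as Alex notes, nothing prevents a second manager from reading the same values and claiming the same surplus.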
Leon Romanovsky Jan. 17, 2021, 5:44 a.m. UTC | #5
On Thu, Jan 14, 2021 at 09:51:28AM -0800, Jakub Kicinski wrote:
> On Thu, 14 Jan 2021 12:31:35 +0200 Leon Romanovsky wrote:
> > The number of MSI-X vectors is PCI property visible through lspci, that
> > field is read-only and configured by the device.
> >
> > The static assignment of an amount of MSI-X vectors doesn't allow utilize
> > the newly created VF because it is not known to the device the future load
> > and configuration where that VF will be used.
> >
> > The VFs are created on the hypervisor and forwarded to the VMs that have
> > different properties (for example number of CPUs).
> >
> > To overcome the inefficiency in the spread of such MSI-X vectors, we
> > allow the kernel to instruct the device with the needed number of such
> > vectors, before VF is initialized and bounded to the driver.
>
>
> Hi Leon!
>
> Looks like you got some missing kdoc here, check out the test in
> patchwork so we don't need to worry about this later:
>
> https://patchwork.kernel.org/project/netdevbpf/list/?series=414497

Thanks Jakub,

I'll add kdocs to internal mlx5 functions.
IMHO, they are useless.

Thanks
Leon Romanovsky Jan. 17, 2021, 7:03 a.m. UTC | #6
On Sat, Jan 16, 2021 at 10:23:31AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 14, 2021 at 05:05:43PM -0700, Alex Williamson wrote:
> > On Thu, 14 Jan 2021 12:31:36 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > Extend PCI sysfs interface with a new callback that allows configure
> > > the number of MSI-X vectors for specific SR-IO VF. This is needed
> > > to optimize the performance of newly bound devices by allocating
> > > the number of vectors based on the administrator knowledge of targeted VM.
> > >
> > > This function is applicable for SR-IOV VF because such devices allocate
> > > their MSI-X table before they will run on the VMs and HW can't guess the
> > > right number of vectors, so the HW allocates them statically and equally.
> > >
> > > The newly added /sys/bus/pci/devices/.../sriov_vf_msix_count file will be seen
> > > for the VFs and it is writable as long as a driver is not bounded to the VF.
> > >
> > > The values accepted are:
> > >  * > 0 - this will be number reported by the VF's MSI-X capability
> > >  * < 0 - not valid
> > >  * = 0 - will reset to the device default value
> > >
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-pci | 20 +++++++++
> > >  drivers/pci/iov.c                       | 58 +++++++++++++++++++++++++
> > >  drivers/pci/msi.c                       | 47 ++++++++++++++++++++
> > >  drivers/pci/pci-sysfs.c                 |  1 +
> > >  drivers/pci/pci.h                       |  2 +
> > >  include/linux/pci.h                     |  3 ++
> > >  6 files changed, 131 insertions(+)

<...>

> > > +static umode_t sriov_vf_attrs_are_visible(struct kobject *kobj,
> > > +					  struct attribute *a, int n)
> > > +{
> > > +	struct device *dev = kobj_to_dev(kobj);
> > > +
> > > +	if (dev_is_pf(dev))
> > > +		return 0;
> >
> > Wouldn't it be cleaner to also hide this on VFs where
> > pci_msix_vec_count() returns an error or where the PF driver doesn't
> > implement .sriov_set_msix_vec_count()?  IOW, expose it only where it
> > could actually work.
>
> I wasn't sure about the policy in PCI/core, but sure will change.

I ended up adding checks of msix_cap, but I can't check
.sriov_set_msix_vec_count. The latter would require holding the
device_lock on the PF, which can disappear later; it is too racy.

Thanks
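
A userspace mock of the visibility rule described in this reply might look like the sketch below. It is a guess at the shape of the check, not code from the series: struct mock_pci_dev and mock_vf_attr_mode() are hypothetical stand-ins, and which msix_cap fields the next revision actually tests (VF, PF, or both) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the few struct pci_dev fields the check needs (assumption). */
struct mock_pci_dev {
	bool is_physfn;				/* device is an SR-IOV PF */
	unsigned char msix_cap;			/* MSI-X capability offset, 0 if absent */
	const struct mock_pci_dev *physfn;	/* parent PF, NULL for a PF */
};

/*
 * Return 0 to hide the attribute, or the requested mode to expose it.
 * The PF driver's callback is deliberately not consulted, because doing so
 * would need the PF device lock, as the reply above explains.
 */
unsigned int mock_vf_attr_mode(const struct mock_pci_dev *dev, unsigned int mode)
{
	assert(dev != NULL);

	if (dev->is_physfn)
		return 0;	/* attribute belongs to VFs only */

	if (!dev->msix_cap || !dev->physfn || !dev->physfn->msix_cap)
		return 0;	/* no MSI-X capability, so the knob cannot work */

	return mode;
}
```

With this rule, a VF without an MSI-X capability never shows the file, while a VF whose PF driver lacks the callback still shows it and fails at write time.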
Leon Romanovsky Jan. 17, 2021, 7:24 a.m. UTC | #7
On Sun, Jan 17, 2021 at 07:44:09AM +0200, Leon Romanovsky wrote:
> On Thu, Jan 14, 2021 at 09:51:28AM -0800, Jakub Kicinski wrote:
> > On Thu, 14 Jan 2021 12:31:35 +0200 Leon Romanovsky wrote:
> > > The number of MSI-X vectors is PCI property visible through lspci, that
> > > field is read-only and configured by the device.
> > >
> > > The static assignment of an amount of MSI-X vectors doesn't allow utilize
> > > the newly created VF because it is not known to the device the future load
> > > and configuration where that VF will be used.
> > >
> > > The VFs are created on the hypervisor and forwarded to the VMs that have
> > > different properties (for example number of CPUs).
> > >
> > > To overcome the inefficiency in the spread of such MSI-X vectors, we
> > > allow the kernel to instruct the device with the needed number of such
> > > vectors, before VF is initialized and bounded to the driver.
> >
> >
> > Hi Leon!
> >
> > Looks like you got some missing kdoc here, check out the test in
> > patchwork so we don't need to worry about this later:
> >
> > https://patchwork.kernel.org/project/netdevbpf/list/?series=414497
>
> Thanks Jakub,
>
> I'll add kdocs to internal mlx5 functions.
> IMHO, they are useless.

At the end, it looks like CI false alarm.

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'dev' not described in 'mlx5_set_msix_vec_count'
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'function_id' not described in 'mlx5_set_msix_vec_count'
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'msix_vec_count' not described in 'mlx5_set_msix_vec_count'
New warnings added

The function mlx5_set_msix_vec_count() is documented.
+/**
+ * mlx5_set_msix_vec_count() - Set dynamically allocated MSI-X to the VF
+ * @dev - PF to work on
+ * @function_id - internal PCI VF function id
+ * @msix_vec_count - Number of MSI-X to set
+ **/
+int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
+			    int msix_vec_count)
https://patchwork.kernel.org/project/netdevbpf/patch/20210114103140.866141-5-leon@kernel.org/

Thanks

>
> Thanks
Jakub Kicinski Jan. 18, 2021, 6:07 p.m. UTC | #8
On Sun, 17 Jan 2021 09:24:41 +0200 Leon Romanovsky wrote:
> On Sun, Jan 17, 2021 at 07:44:09AM +0200, Leon Romanovsky wrote:
> > On Thu, Jan 14, 2021 at 09:51:28AM -0800, Jakub Kicinski wrote:
> > > On Thu, 14 Jan 2021 12:31:35 +0200 Leon Romanovsky wrote:
> > > > The number of MSI-X vectors is PCI property visible through lspci, that
> > > > field is read-only and configured by the device.
> > > >
> > > > The static assignment of an amount of MSI-X vectors doesn't allow utilize
> > > > the newly created VF because it is not known to the device the future load
> > > > and configuration where that VF will be used.
> > > >
> > > > The VFs are created on the hypervisor and forwarded to the VMs that have
> > > > different properties (for example number of CPUs).
> > > >
> > > > To overcome the inefficiency in the spread of such MSI-X vectors, we
> > > > allow the kernel to instruct the device with the needed number of such
> > > > vectors, before VF is initialized and bounded to the driver.
> > >
> > >
> > > Hi Leon!
> > >
> > > Looks like you got some missing kdoc here, check out the test in
> > > patchwork so we don't need to worry about this later:
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/list/?series=414497
> >
> > Thanks Jakub,
> >
> > I'll add kdocs to internal mlx5 functions.
> > IMHO, they are useless.

It's just scripts/kernel-doc, and it's checking if the kdoc is _valid_,
your call if you want to add kdoc, just a comment, or nothing at all.

> At the end, it looks like CI false alarm.
>
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'dev' not described in 'mlx5_set_msix_vec_count'
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'function_id' not described in 'mlx5_set_msix_vec_count'
> drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'msix_vec_count' not described in 'mlx5_set_msix_vec_count'
> New warnings added
>
> The function mlx5_set_msix_vec_count() is documented.
> +/**
> + * mlx5_set_msix_vec_count() - Set dynamically allocated MSI-X to the VF
> + * @dev - PF to work on
> + * @function_id - internal PCI VF function id
> + * @msix_vec_count - Number of MSI-X to set
> + **/
> +int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
> +			    int msix_vec_count)
> https://patchwork.kernel.org/project/netdevbpf/patch/20210114103140.866141-5-leon@kernel.org/

AFAIU that's not valid kdoc, I _think_ you need to replace ' -' with ':'
for arguments (not my rules).
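
For reference, a minimal sketch of the same comment in the form scripts/kernel-doc accepts, assuming the only problem is the separator after each argument name ("@name:" instead of "@name -"). The stub struct, the "Return:" line and the function body are placeholders added so the snippet compiles on its own; they are not the code from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the real driver structure; present only so the sketch builds. */
struct mlx5_core_dev {
	int unused;
};

/**
 * mlx5_set_msix_vec_count() - Set dynamically allocated MSI-X to the VF
 * @dev: PF to work on
 * @function_id: internal PCI VF function id
 * @msix_vec_count: Number of MSI-X to set
 *
 * Return: 0 on success, negative errno otherwise (placeholder behaviour).
 */
int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
			    int msix_vec_count)
{
	/* Placeholder body: the real implementation lives in the patch. */
	if (!dev || function_id < 0 || msix_vec_count < 0)
		return -EINVAL;
	return 0;
}
```

With the colon form, each "@name:" line is matched against the parameter list, which is what the "not described" warnings quoted above are complaining about.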
Leon Romanovsky Jan. 19, 2021, 5:39 a.m. UTC | #9
On Mon, Jan 18, 2021 at 10:07:32AM -0800, Jakub Kicinski wrote:
> On Sun, 17 Jan 2021 09:24:41 +0200 Leon Romanovsky wrote:
> > On Sun, Jan 17, 2021 at 07:44:09AM +0200, Leon Romanovsky wrote:
> > > On Thu, Jan 14, 2021 at 09:51:28AM -0800, Jakub Kicinski wrote:
> > > > On Thu, 14 Jan 2021 12:31:35 +0200 Leon Romanovsky wrote:
> > > > > The number of MSI-X vectors is PCI property visible through lspci, that
> > > > > field is read-only and configured by the device.
> > > > >
> > > > > The static assignment of an amount of MSI-X vectors doesn't allow utilize
> > > > > the newly created VF because it is not known to the device the future load
> > > > > and configuration where that VF will be used.
> > > > >
> > > > > The VFs are created on the hypervisor and forwarded to the VMs that have
> > > > > different properties (for example number of CPUs).
> > > > >
> > > > > To overcome the inefficiency in the spread of such MSI-X vectors, we
> > > > > allow the kernel to instruct the device with the needed number of such
> > > > > vectors, before VF is initialized and bounded to the driver.
> > > >
> > > >
> > > > Hi Leon!
> > > >
> > > > Looks like you got some missing kdoc here, check out the test in
> > > > patchwork so we don't need to worry about this later:
> > > >
> > > > https://patchwork.kernel.org/project/netdevbpf/list/?series=414497
> > >
> > > Thanks Jakub,
> > >
> > > I'll add kdocs to internal mlx5 functions.
> > > IMHO, they are useless.
>
> It's just scripts/kernel-doc, and it's checking if the kdoc is _valid_,
> your call if you want to add kdoc, just a comment, or nothing at all.

I prefer a clean CI, so I will add it.

>
> > At the end, it looks like CI false alarm.
> >
> > drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'dev' not described in 'mlx5_set_msix_vec_count'
> > drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'function_id' not described in 'mlx5_set_msix_vec_count'
> > drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:81: warning: Function parameter or member 'msix_vec_count' not described in 'mlx5_set_msix_vec_count'
> > New warnings added
> >
> > The function mlx5_set_msix_vec_count() is documented.
> > +/**
> > + * mlx5_set_msix_vec_count() - Set dynamically allocated MSI-X to the VF
> > + * @dev - PF to work on
> > + * @function_id - internal PCI VF function id
> > + * @msix_vec_count - Number of MSI-X to set
> > + **/
> > +int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
> > +			    int msix_vec_count)
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210114103140.866141-5-leon@kernel.org/
>
> AFAIU that's not valid kdoc, I _think_ you need to replace ' -' with ':'
> for arguments (not my rules).

Right, I figured that out when I submitted v3.

Thanks