mbox series

[RFC,v4,0/8] vfio/hisilicon: add ACC live migration driver

Message ID 20220208133425.1096-1-shameerali.kolothum.thodi@huawei.com
Headers show
Series vfio/hisilicon: add ACC live migration driver | expand

Message

Shameerali Kolothum Thodi Feb. 8, 2022, 1:34 p.m. UTC
Hi,

This series attempts to add vfio live migration support for
HiSilicon ACC VF devices based on the new v2 migration protocol
definition and mlx5 v7 series discussed here[0]. Many thanks to
everyone involved in the v2 protocol discussions.

Since the v2 protocol is still under discussion, this is provided
as a RFC for now. This migration driver supports only the
VFIO_MIGRATION_STOP_COPY and doesn't have the support for optional
VFIO_MIGRATION_P2P or VFIO_MIGRATION_PRE_COPY features.

Since we are not making use of the PRE_COPY state as used in v1,
the compatibility check for source and destination devices is now
done once the migration data transfer is finished. As discussed
here[1], this is not ideal as it will delay reporting the error
in case of a mismatch.

This is sanity tested on a HiSilicon platform using the Qemu branch
provided here[2].

Please take a look and let know your feedback.

Thanks,
Shameer
[0] https://lore.kernel.org/netdev/20220207172216.206415-1-yishaih@nvidia.com/
[1] https://lore.kernel.org/kvm/20220202103041.2b404d13.alex.williamson@redhat.com/
[2] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2 


v3 --> RFCv4
-Based on migration v2 protocol and mlx5 v7 series.
-Added RFC tag again as migration v2 protocol is still under discussion.
-Added new patch #6 to retrieve the PF QM data.
-PRE_COPY compatibility check is now done after the migration data
 transfer. This is not ideal and needs discussion.

RFC v2 --> v3
 -Dropped RFC tag as the vfio_pci_core subsystem framework is now
  part of 5.15-rc1.
 -Added override methods for vfio_device_ops read/write/mmap calls
  to limit the access within the functional register space.
 -Patches 1 to 3 are code refactoring to move the common ACC QM
  definitions and header around.

RFCv1 --> RFCv2

 -Adds a new vendor-specific vfio_pci driver(hisi-acc-vfio-pci)
  for HiSilicon ACC VF devices based on the new vfio-pci-core
  framework proposal.

 -Since HiSilicon ACC VF device MMIO space contains both the
  functional register space and migration control register space,
  override the vfio_device_ops ioctl method to report only the
  functional space to VMs.

 -For a successful migration, we still need access to VF dev
  functional register space mainly to read the status registers.
  But accessing these while the Guest vCPUs are running may leave
  a security hole. To avoid any potential security issues, we
  map/unmap the MMIO regions on a need basis and is safe to do so.
  (Please see hisi_acc_vf_ioremap/unmap() fns in patch #4).
 
 -Dropped debugfs support for now.
 -Uses common QM functions for mailbox access(patch #3).

Longfang Liu (2):
  crypto: hisilicon/qm: Move few definitions to common header
  hisi_acc_vfio_pci: Add support for VFIO live migration

Shameer Kolothum (6):
  crypto: hisilicon/qm: Move the QM header to include/linux
  hisi_acc_qm: Move PCI device IDs to common header
  hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
  crypto: hisilicon/qm: Add helper to retrieve the PF qm data
  hisi_acc_vfio_pci: Use its own PCI reset_done error handler

 drivers/crypto/hisilicon/hpre/hpre.h          |    2 +-
 drivers/crypto/hisilicon/hpre/hpre_main.c     |   18 +-
 drivers/crypto/hisilicon/qm.c                 |   72 +-
 drivers/crypto/hisilicon/sec2/sec.h           |    2 +-
 drivers/crypto/hisilicon/sec2/sec_main.c      |   20 +-
 drivers/crypto/hisilicon/sgl.c                |    2 +-
 drivers/crypto/hisilicon/zip/zip.h            |    2 +-
 drivers/crypto/hisilicon/zip/zip_main.c       |   17 +-
 drivers/vfio/pci/Kconfig                      |    9 +
 drivers/vfio/pci/Makefile                     |    3 +
 drivers/vfio/pci/hisi_acc_vfio_pci.c          | 1232 +++++++++++++++++
 drivers/vfio/pci/hisi_acc_vfio_pci.h          |  121 ++
 .../qm.h => include/linux/hisi_acc_qm.h       |   44 +
 include/linux/pci_ids.h                       |    6 +
 14 files changed, 1496 insertions(+), 54 deletions(-)
 create mode 100644 drivers/vfio/pci/hisi_acc_vfio_pci.c
 create mode 100644 drivers/vfio/pci/hisi_acc_vfio_pci.h
 rename drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h (87%)

Comments

Alex Williamson Feb. 9, 2022, 9:41 p.m. UTC | #1
On Tue, 8 Feb 2022 13:34:22 +0000
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> HiSilicon ACC VF device BAR2 region consists of both functional
> register space and migration control register space. From a
> security point of view, it's not advisable to export the migration
> control region to Guest.
> 
> Hence, override the ioctl/read/write/mmap methods to hide the
> migration region and limit the access only to the functional register
> space.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  drivers/vfio/pci/hisi_acc_vfio_pci.c | 122 ++++++++++++++++++++++++++-
>  1 file changed, 118 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisi_acc_vfio_pci.c
> index 8b59e628110e..563ed2cc861f 100644
> --- a/drivers/vfio/pci/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisi_acc_vfio_pci.c
> @@ -13,6 +13,120 @@
>  #include <linux/vfio.h>
>  #include <linux/vfio_pci_core.h>
>  
> +static int hisi_acc_pci_rw_access_check(struct vfio_device *core_vdev,
> +					size_t count, loff_t *ppos,
> +					size_t *new_count)
> +{
> +	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> +	struct vfio_pci_core_device *vdev =
> +		container_of(core_vdev, struct vfio_pci_core_device, vdev);
> +
> +	if (index == VFIO_PCI_BAR2_REGION_INDEX) {
> +		loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +		resource_size_t end = pci_resource_len(vdev->pdev, index) / 2;

Be careful here, there are nested assignment use cases.  This can only
survive one level of assignment before we've restricted more than we
intended.  If migration support is dependent on PF access, can we use
that to determine when to when to expose only half the BAR and when to
expose the full BAR?

We should also follow the mlx5 lead to use a vendor sub-directory below
drivers/vfio/pci/  Thanks,

Alex

> +
> +		/* Check if access is for migration control region */
> +		if (pos >= end)
> +			return -EINVAL;
> +
> +		*new_count = min(count, (size_t)(end - pos));
> +	}
> +
> +	return 0;
> +}
> +
> +static int hisi_acc_vfio_pci_mmap(struct vfio_device *core_vdev,
> +				  struct vm_area_struct *vma)
> +{
> +	struct vfio_pci_core_device *vdev =
> +		container_of(core_vdev, struct vfio_pci_core_device, vdev);
> +	unsigned int index;
> +
> +	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
> +	if (index == VFIO_PCI_BAR2_REGION_INDEX) {
> +		u64 req_len, pgoff, req_start;
> +		resource_size_t end = pci_resource_len(vdev->pdev, index) / 2;
> +
> +		req_len = vma->vm_end - vma->vm_start;
> +		pgoff = vma->vm_pgoff &
> +			((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
> +		req_start = pgoff << PAGE_SHIFT;
> +
> +		if (req_start + req_len > end)
> +			return -EINVAL;
> +	}
> +
> +	return vfio_pci_core_mmap(core_vdev, vma);
> +}
> +
> +static ssize_t hisi_acc_vfio_pci_write(struct vfio_device *core_vdev,
> +				       const char __user *buf, size_t count,
> +				       loff_t *ppos)
> +{
> +	size_t new_count = count;
> +	int ret;
> +
> +	ret = hisi_acc_pci_rw_access_check(core_vdev, count, ppos, &new_count);
> +	if (ret)
> +		return ret;
> +
> +	return vfio_pci_core_write(core_vdev, buf, new_count, ppos);
> +}
> +
> +static ssize_t hisi_acc_vfio_pci_read(struct vfio_device *core_vdev,
> +				      char __user *buf, size_t count,
> +				      loff_t *ppos)
> +{
> +	size_t new_count = count;
> +	int ret;
> +
> +	ret = hisi_acc_pci_rw_access_check(core_vdev, count, ppos, &new_count);
> +	if (ret)
> +		return ret;
> +
> +	return vfio_pci_core_read(core_vdev, buf, new_count, ppos);
> +}
> +
> +static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> +				    unsigned long arg)
> +{
> +	struct vfio_pci_core_device *vdev =
> +		container_of(core_vdev, struct vfio_pci_core_device, vdev);
> +
> +	if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
> +		struct pci_dev *pdev = vdev->pdev;
> +		struct vfio_region_info info;
> +		unsigned long minsz;
> +
> +		minsz = offsetofend(struct vfio_region_info, offset);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz)
> +			return -EINVAL;
> +
> +		if (info.index == VFIO_PCI_BAR2_REGION_INDEX) {
> +			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
> +
> +			/*
> +			 * ACC VF dev BAR2 region consists of both functional
> +			 * register space and migration control register space.
> +			 * Report only the functional region to Guest.
> +			 */
> +			info.size = pci_resource_len(pdev, info.index) / 2;
> +
> +			info.flags = VFIO_REGION_INFO_FLAG_READ |
> +					VFIO_REGION_INFO_FLAG_WRITE |
> +					VFIO_REGION_INFO_FLAG_MMAP;
> +
> +			return copy_to_user((void __user *)arg, &info, minsz) ?
> +					    -EFAULT : 0;
> +		}
> +	}
> +	return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> +}
> +
>  static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
>  {
>  	struct vfio_pci_core_device *vdev =
> @@ -32,10 +146,10 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
>  	.name = "hisi-acc-vfio-pci",
>  	.open_device = hisi_acc_vfio_pci_open_device,
>  	.close_device = vfio_pci_core_close_device,
> -	.ioctl = vfio_pci_core_ioctl,
> -	.read = vfio_pci_core_read,
> -	.write = vfio_pci_core_write,
> -	.mmap = vfio_pci_core_mmap,
> +	.ioctl = hisi_acc_vfio_pci_ioctl,
> +	.read = hisi_acc_vfio_pci_read,
> +	.write = hisi_acc_vfio_pci_write,
> +	.mmap = hisi_acc_vfio_pci_mmap,
>  	.request = vfio_pci_core_request,
>  	.match = vfio_pci_core_match,
>  };