diff mbox series

[v6,5/5] vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP

Message ID 20220817051323.20091-6-abhsahu@nvidia.com
State New
Headers show
Series vfio/pci: power management changes | expand

Commit Message

Abhishek Sahu Aug. 17, 2022, 5:13 a.m. UTC
This patch implements VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP
device feature. In the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is
any access for the VFIO device on the host side, then the device will
be moved out of the low power state without the user's guest driver
involvement. Once the device access has been finished, then the host
can move the device again into low power state. With the low power
entry happened through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP,
the device will not be moved back into the low power state and
a notification will be sent to the user by triggering wakeup eventfd.

vfio_pci_core_pm_entry() will be called for both the variants of low
power feature entry so add an extra argument for wakeup eventfd context
and store locally in 'struct vfio_pci_core_device'.

For the entry happened without wakeup eventfd, all the exit related
handling will be done by the LOW_POWER_EXIT device feature only.
When the LOW_POWER_EXIT will be called, then the vfio core layer
vfio_device_pm_runtime_get() will increment the usage count and will
resume the device. In the driver runtime_resume callback, the
'pm_wake_eventfd_ctx' will be NULL. Then vfio_pci_core_pm_exit()
will call vfio_pci_runtime_pm_exit() and all the exit related handling
will be done.

For the entry happened with wakeup eventfd, in the driver resume
callback, eventfd will be triggered and all the exit related handling will
be done. When vfio_pci_runtime_pm_exit() will be called by
vfio_pci_core_pm_exit(), then it will return early.
But if the runtime suspend has not happened on the host side, then
all the exit related handling will be done in vfio_pci_core_pm_exit()
only.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 62 ++++++++++++++++++++++++++++++--
 include/linux/vfio_pci_core.h    |  1 +
 2 files changed, 60 insertions(+), 3 deletions(-)

Comments

Alex Williamson Aug. 25, 2022, 10:32 p.m. UTC | #1
On Thu, 18 Aug 2022 22:31:03 +0530
Abhishek Sahu <abhsahu@nvidia.com> wrote:

> On 8/17/2022 11:10 PM, Jason Gunthorpe wrote:
> > On Wed, Aug 17, 2022 at 09:34:30PM +0530, Abhishek Sahu wrote:  
> >> On 8/17/2022 7:23 PM, Jason Gunthorpe wrote:  
> >>> On Wed, Aug 17, 2022 at 10:43:23AM +0530, Abhishek Sahu wrote:
> >>>  
> >>>> +static int
> >>>> +vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
> >>>> +				   void __user *arg, size_t argsz)  
> >>>
> >>> This should be
> >>>   struct vfio_device_low_power_entry_with_wakeup __user *arg
> >>>  
> >>
> >>  Thanks Jason.
> >>
> >>  I can update this.
> >>
> >>  But if we look the existing code, for example
> >>  (vfio_ioctl_device_feature_mig_device_state()), then there it still uses
> >>  'void __user *arg' only. Is this a new guideline which we need to take
> >>  care ?  
> > 
> > I just sent a patch that fixes that too
> >   
> 
>  Thanks for the update.
>  I will change this. 
> 
> >>  Do we need to keep the IOCTL name alphabetically sorted in the case list.
> >>  Currently, I have added in the order of IOCTL numbers.  
> > 
> > It is generally a good practice to sort lists of things.
> > 
> > Jason  
> 
>  Sure. I will make the sorted list.

The series looks good to me, so I'd suggest to rebase on Jason's
patches[1][2] so you can easily sort out the above.  Thanks,

Alex

[1]https://lore.kernel.org/all/0-v1-da6fc51ee22e+562-vfio_pci_priv_jgg@nvidia.com/
[2]https://lore.kernel.org/all/0-v1-11d8272dc65a+4bd-vfio_ioctl_split_jgg@nvidia.com/
Abhishek Sahu Aug. 26, 2022, 2:30 p.m. UTC | #2
On 8/26/2022 4:02 AM, Alex Williamson wrote:
> On Thu, 18 Aug 2022 22:31:03 +0530
> Abhishek Sahu <abhsahu@nvidia.com> wrote:
> 
>> On 8/17/2022 11:10 PM, Jason Gunthorpe wrote:
>>> On Wed, Aug 17, 2022 at 09:34:30PM +0530, Abhishek Sahu wrote:  
>>>> On 8/17/2022 7:23 PM, Jason Gunthorpe wrote:  
>>>>> On Wed, Aug 17, 2022 at 10:43:23AM +0530, Abhishek Sahu wrote:
>>>>>  
>>>>>> +static int
>>>>>> +vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
>>>>>> +				   void __user *arg, size_t argsz)  
>>>>>
>>>>> This should be
>>>>>   struct vfio_device_low_power_entry_with_wakeup __user *arg
>>>>>  
>>>>
>>>>  Thanks Jason.
>>>>
>>>>  I can update this.
>>>>
>>>>  But if we look the existing code, for example
>>>>  (vfio_ioctl_device_feature_mig_device_state()), then there it still uses
>>>>  'void __user *arg' only. Is this a new guideline which we need to take
>>>>  care ?  
>>>
>>> I just sent a patch that fixes that too
>>>   
>>
>>  Thanks for the update.
>>  I will change this. 
>>
>>>>  Do we need to keep the IOCTL name alphabetically sorted in the case list.
>>>>  Currently, I have added in the order of IOCTL numbers.  
>>>
>>> It is generally a good practice to sort lists of things.
>>>
>>> Jason  
>>
>>  Sure. I will make the sorted list.
> 
> The series looks good to me, so I'd suggest to rebase on Jason's
> patches[1][2] so you can easily sort out the above.  Thanks,
> 
> Alex
> 
> [1]https://lore.kernel.org/all/0-v1-da6fc51ee22e+562-vfio_pci_priv_jgg@nvidia.com/
> [2]https://lore.kernel.org/all/0-v1-11d8272dc65a+4bd-vfio_ioctl_split_jgg@nvidia.com/
> 

 Thanks Alex.
 I will rebase my patches to the above mentioned patch series
 and will send the updated patches.

 Regards,
 Abhishek
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d7d3c4392f70..00d24243b89e 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -260,7 +260,8 @@  int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	return ret;
 }
 
-static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
+static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
+				     struct eventfd_ctx *efdctx)
 {
 	/*
 	 * The vdev power related flags are protected with 'memory_lock'
@@ -273,6 +274,7 @@  static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev)
 	}
 
 	vdev->pm_runtime_engaged = true;
+	vdev->pm_wake_eventfd_ctx = efdctx;
 	pm_runtime_put_noidle(&vdev->pdev->dev);
 	up_write(&vdev->memory_lock);
 
@@ -296,7 +298,39 @@  static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
 	 * while returning from the ioctl and then the device can go into
 	 * runtime suspended state.
 	 */
-	return vfio_pci_runtime_pm_entry(vdev);
+	return vfio_pci_runtime_pm_entry(vdev, NULL);
+}
+
+static int
+vfio_pci_core_pm_entry_with_wakeup(struct vfio_device *device, u32 flags,
+				   void __user *arg, size_t argsz)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	struct vfio_device_low_power_entry_with_wakeup entry;
+	struct eventfd_ctx *efdctx;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+				 sizeof(entry));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&entry, arg, sizeof(entry)))
+		return -EFAULT;
+
+	if (entry.wakeup_eventfd < 0)
+		return -EINVAL;
+
+	efdctx = eventfd_ctx_fdget(entry.wakeup_eventfd);
+	if (IS_ERR(efdctx))
+		return PTR_ERR(efdctx);
+
+	ret = vfio_pci_runtime_pm_entry(vdev, efdctx);
+	if (ret)
+		eventfd_ctx_put(efdctx);
+
+	return ret;
 }
 
 static void __vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
@@ -304,6 +338,11 @@  static void __vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
 	if (vdev->pm_runtime_engaged) {
 		vdev->pm_runtime_engaged = false;
 		pm_runtime_get_noresume(&vdev->pdev->dev);
+
+		if (vdev->pm_wake_eventfd_ctx) {
+			eventfd_ctx_put(vdev->pm_wake_eventfd_ctx);
+			vdev->pm_wake_eventfd_ctx = NULL;
+		}
 	}
 }
 
@@ -331,7 +370,10 @@  static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
 
 	/*
 	 * The device is always in the active state here due to pm wrappers
-	 * around ioctls.
+	 * around ioctls. If the device had entered a low power state and
+	 * pm_wake_eventfd_ctx is valid, vfio_pci_core_runtime_resume() has
+	 * already signaled the eventfd and exited low power mode itself.
+	 * pm_runtime_engaged protects the redundant call here.
 	 */
 	vfio_pci_runtime_pm_exit(vdev);
 	return 0;
@@ -370,6 +412,17 @@  static int vfio_pci_core_runtime_resume(struct device *dev)
 {
 	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
 
+	/*
+	 * Resume with a pm_wake_eventfd_ctx signals the eventfd and exit
+	 * low power mode.
+	 */
+	down_write(&vdev->memory_lock);
+	if (vdev->pm_wake_eventfd_ctx) {
+		eventfd_signal(vdev->pm_wake_eventfd_ctx, 1);
+		__vfio_pci_runtime_pm_exit(vdev);
+	}
+	up_write(&vdev->memory_lock);
+
 	if (vdev->pm_intx_masked)
 		vfio_pci_intx_unmask(vdev);
 
@@ -1336,6 +1389,9 @@  int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 		return vfio_pci_core_feature_token(device, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
 		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP:
+		return vfio_pci_core_pm_entry_with_wakeup(device, flags,
+							  arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
 		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
 	default:
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index d31cc9cc9c70..8bdf20cd94a9 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -131,6 +131,7 @@  struct vfio_pci_core_device {
 	int			ioeventfds_nr;
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
+	struct eventfd_ctx	*pm_wake_eventfd_ctx;
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;