[v3,11/23] iommufd/viommu: Add IOMMUFD_CMD_VQUEUE_ALLOC ioctl

Message ID 1ef2e242ee1d844f823581a5365823d78c67ec6a.1746139811.git.nicolinc@nvidia.com
State New
Series iommufd: Add vIOMMU infrastructure (Part-4 vQUEUE)

Commit Message

Nicolin Chen May 1, 2025, 11:01 p.m. UTC
Introduce a new IOMMUFD_CMD_VQUEUE_ALLOC ioctl for user space to allocate
a vQUEUE object for a vIOMMU-specific HW-accelerated virtual queue, e.g.:
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer

This is a vIOMMU-based ioctl. Simply increase the refcount of the vIOMMU.

Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  2 +
 include/uapi/linux/iommufd.h            | 44 +++++++++++
 drivers/iommu/iommufd/main.c            |  6 ++
 drivers/iommu/iommufd/viommu.c          | 99 +++++++++++++++++++++++++
 4 files changed, 151 insertions(+)

Comments

Vasant Hegde May 6, 2025, 9:15 a.m. UTC | #1
Hi Nicolin,


On 5/2/2025 4:31 AM, Nicolin Chen wrote:
> Introduce a new IOMMUFD_CMD_VQUEUE_ALLOC ioctl for user space to allocate
> a vQUEUE object for a vIOMMU-specific HW-accelerated virtual queue, e.g.:
>  - NVIDIA's Virtual Command Queue
>  - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
> 
> This is a vIOMMU-based ioctl. Simply increase the refcount of the vIOMMU.
> 
> Reviewed-by: Pranjal Shrivastava <praan@google.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h |  2 +
>  include/uapi/linux/iommufd.h            | 44 +++++++++++
>  drivers/iommu/iommufd/main.c            |  6 ++
>  drivers/iommu/iommufd/viommu.c          | 99 +++++++++++++++++++++++++
>  4 files changed, 151 insertions(+)
> 


.../...


> +
> +/**
> + * struct iommu_vqueue_alloc - ioctl(IOMMU_VQUEUE_ALLOC)
> + * @size: sizeof(struct iommu_vqueue_alloc)
> + * @flags: Must be 0
> + * @viommu_id: Virtual IOMMU ID to associate the virtual queue with
> + * @type: One of enum iommu_vqueue_type
> + * @index: The logical index to the virtual queue per virtual IOMMU, for a
> + *         multi-queue model
> + * @out_vqueue_id: The ID of the new virtual queue
> + * @addr: Base address of the queue memory in the guest physical address space
> + * @length: Length of the queue memory in the guest physical address space
> + *
> + * Allocate a virtual queue object for a vIOMMU-specific HW-acceleration feature
> + * that allows HW to access guest queue memory described by @addr and @length.
> + * The VMM is advised to back the queue memory with a single huge page of
> + * proper alignment so that it is contiguous in the host physical address
> + * space; the call will fail if the queue memory is not physically contiguous.
> + * Upon success, the underlying physical pages will be pinned to prevent the
> + * VMM from unmapping them in the IOAS, until the virtual queue gets destroyed.
> + *
> + * A vIOMMU can allocate multiple queues, but it must use a different @index to
> + * separate each allocation, e.g. VCMDQ0, VCMDQ1, ...

This will handle multiple queues. But the AMD vIOMMU needs to communicate
certain control bit settings that are not related to the buffers, like the
"Completion wait interrupt".

How do we handle that? Extend iommu_vqueue_alloc() or have a different
interface?

-Vasant
Jason Gunthorpe May 6, 2025, 12:01 p.m. UTC | #2
On Tue, May 06, 2025 at 02:45:00PM +0530, Vasant Hegde wrote:
> > +/**
> > + * struct iommu_vqueue_alloc - ioctl(IOMMU_VQUEUE_ALLOC)
> > + * @size: sizeof(struct iommu_vqueue_alloc)
> > + * @flags: Must be 0
> > + * @viommu_id: Virtual IOMMU ID to associate the virtual queue with
> > + * @type: One of enum iommu_vqueue_type
> > + * @index: The logical index to the virtual queue per virtual IOMMU, for a
> > + *         multi-queue model
> > + * @out_vqueue_id: The ID of the new virtual queue
> > + * @addr: Base address of the queue memory in the guest physical address space
> > + * @length: Length of the queue memory in the guest physical address space
> > + *
> > + * Allocate a virtual queue object for a vIOMMU-specific HW-acceleration feature
> > + * that allows HW to access guest queue memory described by @addr and @length.
> > + * The VMM is advised to back the queue memory with a single huge page of
> > + * proper alignment so that it is contiguous in the host physical address
> > + * space; the call will fail if the queue memory is not physically contiguous.
> > + * Upon success, the underlying physical pages will be pinned to prevent the
> > + * VMM from unmapping them in the IOAS, until the virtual queue gets destroyed.
> > + *
> > + * A vIOMMU can allocate multiple queues, but it must use a different @index to
> > + * separate each allocation, e.g. VCMDQ0, VCMDQ1, ...
> 
> This will handle multiple queues. But the AMD vIOMMU needs to communicate
> certain control bit settings that are not related to the buffers, like the
> "Completion wait interrupt".
> 
> How do we handle that? Extend iommu_vqueue_alloc() or have a different
> interface?

Do you need a modify queue operation?

Jason
Vasant Hegde May 7, 2025, 7:41 a.m. UTC | #3
Hi Jason,


On 5/6/2025 5:31 PM, Jason Gunthorpe wrote:
> On Tue, May 06, 2025 at 02:45:00PM +0530, Vasant Hegde wrote:
>>> +/**
>>> + * struct iommu_vqueue_alloc - ioctl(IOMMU_VQUEUE_ALLOC)
>>> + * @size: sizeof(struct iommu_vqueue_alloc)
>>> + * @flags: Must be 0
>>> + * @viommu_id: Virtual IOMMU ID to associate the virtual queue with
>>> + * @type: One of enum iommu_vqueue_type
>>> + * @index: The logical index to the virtual queue per virtual IOMMU, for a
>>> + *         multi-queue model
>>> + * @out_vqueue_id: The ID of the new virtual queue
>>> + * @addr: Base address of the queue memory in the guest physical address space
>>> + * @length: Length of the queue memory in the guest physical address space
>>> + *
>>> + * Allocate a virtual queue object for a vIOMMU-specific HW-acceleration feature
>>> + * that allows HW to access guest queue memory described by @addr and @length.
>>> + * The VMM is advised to back the queue memory with a single huge page of
>>> + * proper alignment so that it is contiguous in the host physical address
>>> + * space; the call will fail if the queue memory is not physically contiguous.
>>> + * Upon success, the underlying physical pages will be pinned to prevent the
>>> + * VMM from unmapping them in the IOAS, until the virtual queue gets destroyed.
>>> + *
>>> + * A vIOMMU can allocate multiple queues, but it must use a different @index to
>>> + * separate each allocation, e.g. VCMDQ0, VCMDQ1, ...
>>
>> This will handle multiple queues. But the AMD vIOMMU needs to communicate
>> certain control bit settings that are not related to the buffers, like the
>> "Completion wait interrupt".
>>
>> How do we handle that? Extend iommu_vqueue_alloc() or have a different
>> interface?
> 
> Do you need a modify queue operation?

We have two types of operations: one that impacts the queue, and another set
of bits which doesn't operate on the queue.

ex: Event Log Buffer
  - We configure "MMIO Offset 0010h Event Log Base Address Register" with the
base address and size

  -  MMIO Offset 0018h IOMMU Control Register
     EventLogEn: Event log enable
       * When the guest sets this bit, QEMU will trap and send a vqueue_alloc
       * When the guest clears this bit, QEMU will trap and send a vqueue_destroy

     This part is fine.

     EventIntEn: Event log interrupt enable
       * When the guest sets this bit, QEMU will trap
       * This needs to be communicated to the host so that we can program the
VF Control BAR and enable the interrupt

  - There is another bit, "Completion wait interrupt enable".
    This isn't related to any buffer; instead, if we configure it, a
completion wait command will generate an interrupt.

I am asking how we handle the above two cases. Should they be part of the
queue IOCTL, or maybe some other IOCTL which just passes this info to the HW
driver?
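
(For illustration, a rough sketch of the EventLogEn trap flow above -- the
VMM-side structure, helper and register-bit names here are all made up:)

#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Hypothetical VMM-side state for an emulated AMD vIOMMU */
struct amd_viommu {
	int iommufd;            /* /dev/iommu fd */
	__u32 viommu_id;        /* from IOMMU_VIOMMU_ALLOC */
	__u32 evtlog_vqueue_id; /* 0 while the Event Log is disabled */
	__u64 evtlog_base_gpa;  /* from MMIO Offset 0010h */
	__u64 evtlog_length;
};

#define CTRL_EVENT_LOG_EN (1ULL << 2) /* illustrative bit position */

static void amd_viommu_ctrl_write(struct amd_viommu *v, __u64 ctrl)
{
	if ((ctrl & CTRL_EVENT_LOG_EN) && !v->evtlog_vqueue_id) {
		/* Guest enabled the Event Log: allocate a vQUEUE */
		struct iommu_vqueue_alloc cmd = {
			.size = sizeof(cmd),
			.viommu_id = v->viommu_id,
			/* a vendor-specific type in practice; DEFAULT
			 * is reserved and rejected by this patch */
			.type = IOMMU_VQUEUE_TYPE_DEFAULT,
			.index = 0,
			.addr = v->evtlog_base_gpa,
			.length = v->evtlog_length,
		};

		if (!ioctl(v->iommufd, IOMMU_VQUEUE_ALLOC, &cmd))
			v->evtlog_vqueue_id = cmd.out_vqueue_id;
	} else if (!(ctrl & CTRL_EVENT_LOG_EN) && v->evtlog_vqueue_id) {
		/* Guest disabled the Event Log: destroy the vQUEUE */
		struct iommu_destroy destroy = {
			.size = sizeof(destroy),
			.id = v->evtlog_vqueue_id,
		};

		ioctl(v->iommufd, IOMMU_DESTROY, &destroy);
		v->evtlog_vqueue_id = 0;
	}
	/* EventIntEn/ComWaitIntEn have no backing buffer; those are the
	 * bits in question */
}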

-Vasant
Tian, Kevin May 7, 2025, 8 a.m. UTC | #4
> From: Vasant Hegde <vasant.hegde@amd.com>
> Sent: Wednesday, May 7, 2025 3:42 PM
> 
> Hi Jason,
> 
> 
> On 5/6/2025 5:31 PM, Jason Gunthorpe wrote:
> > Do you need a modify queue operation?
> 
> We have two types of operations: one that impacts the queue, and another
> set of bits which doesn't operate on the queue.
> 
> ex: Event Log Buffer
>   - We configure "MMIO Offset 0010h Event Log Base Address Register" with
> the base address and size
> 
>   -  MMIO Offset 0018h IOMMU Control Register
>      EventLogEn: Event log enable
>        * When the guest sets this bit, QEMU will trap and send a vqueue_alloc
>        * When the guest clears this bit, QEMU will trap and send a vqueue_destroy
> 
>      This part is fine.
> 
>      EventIntEn: Event log interrupt enable
>        * When the guest sets this bit, QEMU will trap
>        * This needs to be communicated to the host so that we can program
> the VF Control BAR and enable the interrupt
> 
>   - There is another bit, "Completion wait interrupt enable".
>     This isn't related to any buffer; instead, if we configure it, a
> completion wait command will generate an interrupt.
> 
> I am asking how we handle the above two cases. Should they be part of the
> queue IOCTL, or maybe some other IOCTL which just passes this info to the
> HW driver?
> 

Probably IOMMUFD_CMD_OPTION can serve the purpose?
Jason Gunthorpe May 7, 2025, 12:31 p.m. UTC | #5
On Wed, May 07, 2025 at 01:11:43PM +0530, Vasant Hegde wrote:
>   -  MMIO Offset 0018h IOMMU Control Register
>      EventLogEn: Event log enable
>        * When the guest sets this bit, QEMU will trap and send a vqueue_alloc
>        * When the guest clears this bit, QEMU will trap and send a vqueue_destroy
> 
>      This part is fine.

Ok

>      EventIntEn: Event log interrupt enable
>        * When the guest sets this bit, QEMU will trap
>        * This needs to be communicated to the host so that we can program
> the VF Control BAR and enable the interrupt

This sounds like modifying the vqueue? Or maybe on the viommu?

>   - There is another bit, "Completion wait interrupt enable".
>     This isn't related to any buffer; instead, if we configure it, a
> completion wait command will generate an interrupt.

This sounds like a modify on the VIOMMU object?

Jason
Vasant Hegde May 8, 2025, 4:46 a.m. UTC | #6
Jason,

On 5/7/2025 6:01 PM, Jason Gunthorpe wrote:
> On Wed, May 07, 2025 at 01:11:43PM +0530, Vasant Hegde wrote:
>>   -  MMIO Offset 0018h IOMMU Control Register
>>      EventLogEn: Event log enable
>>        * When the guest sets this bit, QEMU will trap and send a vqueue_alloc
>>        * When the guest clears this bit, QEMU will trap and send a vqueue_destroy
>>
>>      This part is fine.
> 
> Ok
> 
>>      EventIntEn: Event log interrupt enable
>>        * When the guest sets this bit, QEMU will trap
>>        * This needs to be communicated to the host so that we can program
>> the VF Control BAR and enable the interrupt
> 
> This sounds like modifying the vqueue? Or maybe on the viommu?

IMO it's the VIOMMU, as it informs the HW whether to trigger the interrupt or not.


> 
>>   - There is another bit, "Completion wait interrupt enable".
>>     This isn't related to any buffer; instead, if we configure it, a
>> completion wait command will generate an interrupt.
> 
> This sounds like a modify on the VIOMMU object?

Again, in my view it's the VIOMMU object, as it tells the HW what to do when
it finishes a completion wait command.

-Vasant
Nicolin Chen May 8, 2025, 5:56 a.m. UTC | #7
On Thu, May 08, 2025 at 10:16:51AM +0530, Vasant Hegde wrote:
> >>   - There is another bit, "Completion wait interrupt enable".
> >>     This isn't related to any buffer; instead, if we configure it, a
> >> completion wait command will generate an interrupt.
> > 
> > This sounds like a modify on the VIOMMU object?
> 
> Again, in my view it's the VIOMMU object, as it tells the HW what to do when
> it finishes a completion wait command.

According to the spec:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf

This is for an interrupt from a COMPLETION_WAIT command:
"The COMPLETION_WAIT command allows software to serialize itself
 with IOMMU command processing. The COMPLETION_WAIT command does
 not finish until all older commands issued since a prior
 COMPLETION_WAIT have completely executed."

So, basically it's like the IRQ for CMD_SYNC on ARM. IMHO, this is
very specific to the Command Buffer (i.e. a vQUEUE object, now an HW
QUEUE object), though the bit is located in a global IOMMU control
register.

Looking at this paragraph:
"
To restart the IOMMU command processing after the IOMMU halts it,
use the following procedure.
• Wait until CmdBufRun=0b in the IOMMU Status Register
   [MMIO Offset 2020h] so that all commands complete processing as
   the circumstances allow. CmdBufRun must be 0b to modify the
   command buffer registers properly.
• Set CmdBufEn=0b in the IOMMU Control Register [MMIO Offset 0018h].
• As necessary, change the following registers (e.g., to relocate
   the command buffer):
   • the Command Buffer Base Address Register [MMIO Offset 0008h],
   • the Command Buffer Head Pointer Register [MMIO Offset 2000h],
   • the Command Buffer Tail Pointer Register [MMIO Offset 2008h].
• Any or all command buffer entries may be copied from the old
   command buffer to the new and software must set the head and tail
   pointers appropriately.
• Write the IOMMU Control Register [MMIO Offset 0018h] with
   CmdBufEn=1b and ComWaitIntEn as desired
",
the ComWaitIntEn bit is suggested to be set along with the CmdBufEn
bit, i.e. it can be a part of the IOMMU_HW_QUEUE_ALLOC ioctl.

What I am not sure about is whether the HW allows setting the ComWaitIntEn
bit after CmdBufEn=1, which seems unlikely, but the spec does not highlight
it. If so, this would be a modification to the HW QUEUE, in which case we
could either do a relocation of the HW QUEUE (where we can set the flag in
the 2nd allocation) or add a new option via IOMMUFD_CMD_OPTION (as Kevin
suggested), and I think it should be a per-HW_QUEUE option since it doesn't
affect other types of queues like the Event/PPR Log Buffers.

Similarly, an Event Log Buffer can have an EventIntEn flag; and a
PPR Log Buffer can have a PprIntEn flag too, right?
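
To sketch that (flag names invented here, not defined by this series), such
per-queue bits could ride in the currently-must-be-zero @flags of the alloc
ioctl:

/* Hypothetical per-queue flags for iommu_vqueue_alloc::flags -- for
 * discussion only, not part of this series.
 */
#define IOMMU_VQUEUE_FLAG_COMWAIT_INT_EN	(1 << 0) /* Command Buffer */
#define IOMMU_VQUEUE_FLAG_LOG_INT_EN		(1 << 1) /* Event/PPR Logs */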

Thanks
Nicolin
Jason Gunthorpe May 8, 2025, 12:14 p.m. UTC | #8
On Wed, May 07, 2025 at 10:56:17PM -0700, Nicolin Chen wrote:

> What I am not sure about is whether the HW allows setting the ComWaitIntEn
> bit after CmdBufEn=1, which seems unlikely, but the spec does not highlight
> it. If so, this would be a modification to the HW QUEUE, in which case we
> could either do a relocation of the HW QUEUE (where we can set the flag in
> the 2nd allocation) or add a new option via IOMMUFD_CMD_OPTION (as Kevin
> suggested), and I think it should be a per-HW_QUEUE option since it doesn't
> affect other types of queues like the Event/PPR Log Buffers.

The main question is whether the control is global to the entire VIOMMU and
all its HW QUEUEs, or local to a single HW QUEUE.

If it is global then some "modify viommu" operation should be used to
change it.

If it is local then some "modify hw queue" operation.

IOMMUFD_CMD_OPTION could be used with an object_id == VIOMMU as a kind
of modify..

Jason
Nicolin Chen May 8, 2025, 5:12 p.m. UTC | #9
On Thu, May 08, 2025 at 09:14:56AM -0300, Jason Gunthorpe wrote:
> On Wed, May 07, 2025 at 10:56:17PM -0700, Nicolin Chen wrote:
> 
> > What I am not sure about is whether the HW allows setting the ComWaitIntEn
> > bit after CmdBufEn=1, which seems unlikely, but the spec does not highlight
> > it. If so, this would be a modification to the HW QUEUE, in which case we
> > could either do a relocation of the HW QUEUE (where we can set the flag in
> > the 2nd allocation) or add a new option via IOMMUFD_CMD_OPTION (as Kevin
> > suggested), and I think it should be a per-HW_QUEUE option since it doesn't
> > affect other types of queues like the Event/PPR Log Buffers.
> 
> The main question is whether the control is global to the entire VIOMMU and
> all its HW QUEUEs, or local to a single HW QUEUE.

Oh, that's right.. I recall AMD only has one Command Buffer,
but can have dual Event Log Buffers and dual PPR Log Buffers.

And the EventIntEn and PprIntEn bits seem to be global for the
dual buffers..

> If it is global then some "modify viommu" operation should be used to
> change it.
>
>> If it is local then some "modify hw queue" operation.
>
> IOMMUFD_CMD_OPTION could be used with an object_id == VIOMMU as a kind
> of modify..

Vasant can confirm. But it looks like it should be a vIOMMU
option.

Thanks
Nicolin
Vasant Hegde May 9, 2025, 11:52 a.m. UTC | #10
Hi Nicolin, Jason,


On 5/8/2025 10:42 PM, Nicolin Chen wrote:
> On Thu, May 08, 2025 at 09:14:56AM -0300, Jason Gunthorpe wrote:
>> On Wed, May 07, 2025 at 10:56:17PM -0700, Nicolin Chen wrote:
>>
>>> What I am not sure about is whether the HW allows setting the ComWaitIntEn
>>> bit after CmdBufEn=1, which seems unlikely, but the spec does not highlight
>>> it. If so, this would be a modification to the HW QUEUE, in which case we
>>> could either do a relocation of the HW QUEUE (where we can set the flag in
>>> the 2nd allocation) or add a new option via IOMMUFD_CMD_OPTION (as Kevin
>>> suggested), and I think it should be a per-HW_QUEUE option since it doesn't
>>> affect other types of queues like the Event/PPR Log Buffers.
>>
>> The main question is whether the control is global to the entire VIOMMU and
>> all its HW QUEUEs, or local to a single HW QUEUE.
> 
> Oh, that's right.. I recall AMD only has one Command Buffer,
> but can have dual Event Log Buffers and dual PPR Log Buffers.

Right.

> 
> And the EventIntEn and PprIntEn bits seem to be global for the
> dual buffers..

Yes. But there are other bits to configure the dual buffers, etc.
(like DualEventLogEn).

> 
>> If it is global then some "modify viommu" operation should be used to
>> change it.
>>
>> If it is local then some "modify hw queue" operation.
>>
>> IOMMUFD_CMD_OPTION could be used with an object_id == VIOMMU as a kind
>> of modify..
> 
> Vasant can confirm. But it looks like it should be a vIOMMU
> option.

I think CMD_OPTION will work. So something like below?

if (cmd_option_id == IOMMU_OPTION_VIOMMU && cmd->object_id == viommu_id)
	iommufd_viommu->ops->viommu_options() ?
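
i.e., roughly (the viommu set_option op is invented here, just to sketch the
wiring -- iommufd_get_viommu()/iommufd_put_object() are the helpers already
used by this patch):

static int iommufd_option_viommu(struct iommufd_ucmd *ucmd,
				 struct iommu_option *cmd)
{
	struct iommufd_viommu *viommu;
	int rc;

	viommu = iommufd_get_viommu(ucmd, cmd->object_id);
	if (IS_ERR(viommu))
		return PTR_ERR(viommu);

	if (!viommu->ops || !viommu->ops->set_option) /* invented op */
		rc = -EOPNOTSUPP;
	else
		rc = viommu->ops->set_option(viommu, cmd->option_id,
					     cmd->val64);

	iommufd_put_object(ucmd->ictx, &viommu->obj);
	return rc;
}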


-Vasant

Patch

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 79160b039bc7..595d5e7021c4 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -611,6 +611,8 @@  int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_vdevice_destroy(struct iommufd_object *obj);
+int iommufd_vqueue_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_vqueue_destroy(struct iommufd_object *obj);
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index cc90299a08d9..c6742bb00a41 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -56,6 +56,7 @@  enum {
 	IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
+	IOMMUFD_CMD_VQUEUE_ALLOC = 0x94,
 };
 
 /**
@@ -1147,4 +1148,47 @@  struct iommu_veventq_alloc {
 	__u32 __reserved;
 };
 #define IOMMU_VEVENTQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VEVENTQ_ALLOC)
+
+/**
+ * enum iommu_vqueue_type - Virtual Queue Type
+ * @IOMMU_VQUEUE_TYPE_DEFAULT: Reserved for future use
+ */
+enum iommu_vqueue_type {
+	IOMMU_VQUEUE_TYPE_DEFAULT = 0,
+};
+
+/**
+ * struct iommu_vqueue_alloc - ioctl(IOMMU_VQUEUE_ALLOC)
+ * @size: sizeof(struct iommu_vqueue_alloc)
+ * @flags: Must be 0
+ * @viommu_id: Virtual IOMMU ID to associate the virtual queue with
+ * @type: One of enum iommu_vqueue_type
+ * @index: The logical index to the virtual queue per virtual IOMMU, for a
+ *         multi-queue model
+ * @out_vqueue_id: The ID of the new virtual queue
+ * @addr: Base address of the queue memory in the guest physical address space
+ * @length: Length of the queue memory in the guest physical address space
+ *
+ * Allocate a virtual queue object for a vIOMMU-specific HW-acceleration feature
+ * that allows HW to access guest queue memory described by @addr and @length.
+ * The VMM is advised to back the queue memory with a single huge page of
+ * proper alignment so that it is contiguous in the host physical address
+ * space; the call will fail if the queue memory is not physically contiguous.
+ * Upon success, the underlying physical pages will be pinned to prevent the
+ * VMM from unmapping them in the IOAS, until the virtual queue gets destroyed.
+ *
+ * A vIOMMU can allocate multiple queues, but it must use a different @index to
+ * separate each allocation, e.g. VCMDQ0, VCMDQ1, ...
+ */
+struct iommu_vqueue_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 viommu_id;
+	__u32 type;
+	__u32 index;
+	__u32 out_vqueue_id;
+	__aligned_u64 addr;
+	__aligned_u64 length;
+};
+#define IOMMU_VQUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VQUEUE_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 2b9ee9b4a424..23ed58f1f7b1 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -307,6 +307,7 @@  union ucmd_buffer {
 	struct iommu_veventq_alloc veventq;
 	struct iommu_vfio_ioas vfio_ioas;
 	struct iommu_viommu_alloc viommu;
+	struct iommu_vqueue_alloc vqueue;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -366,6 +367,8 @@  static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
 		 struct iommu_viommu_alloc, out_viommu_id),
+	IOCTL_OP(IOMMU_VQUEUE_ALLOC, iommufd_vqueue_alloc_ioctl,
+		 struct iommu_vqueue_alloc, length),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -511,6 +514,9 @@  static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_VIOMMU] = {
 		.destroy = iommufd_viommu_destroy,
 	},
+	[IOMMUFD_OBJ_VQUEUE] = {
+		.destroy = iommufd_vqueue_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index a65153458a26..10d985aae9a8 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -170,3 +170,102 @@  int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(ucmd->ictx, &viommu->obj);
 	return rc;
 }
+
+void iommufd_vqueue_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vqueue *vqueue =
+		container_of(obj, struct iommufd_vqueue, obj);
+	struct iommufd_viommu *viommu = vqueue->viommu;
+
+	if (viommu->ops->vqueue_destroy)
+		viommu->ops->vqueue_destroy(vqueue);
+	iopt_unpin_pages(&viommu->hwpt->ioas->iopt, vqueue->addr,
+			 vqueue->length);
+	refcount_dec(&viommu->obj.users);
+}
+
+int iommufd_vqueue_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_vqueue_alloc *cmd = ucmd->cmd;
+	struct iommufd_viommu *viommu;
+	struct iommufd_vqueue *vqueue;
+	struct page **pages;
+	int max_npages, i;
+	dma_addr_t end;
+	int rc;
+
+	if (cmd->flags || cmd->type == IOMMU_VQUEUE_TYPE_DEFAULT)
+		return -EOPNOTSUPP;
+	if (!cmd->addr || !cmd->length)
+		return -EINVAL;
+	if (check_add_overflow(cmd->addr, cmd->length - 1, &end))
+		return -EOVERFLOW;
+
+	max_npages = DIV_ROUND_UP(cmd->length, PAGE_SIZE);
+	pages = kcalloc(max_npages, sizeof(*pages), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_free;
+	}
+
+	if (!viommu->ops || !viommu->ops->vqueue_alloc) {
+		rc = -EOPNOTSUPP;
+		goto out_put_viommu;
+	}
+
+	/* Quick test on the base address */
+	if (!iommu_iova_to_phys(viommu->hwpt->common.domain, cmd->addr)) {
+		rc = -ENXIO;
+		goto out_put_viommu;
+	}
+
+	/*
+	 * The underlying physical pages must be pinned to prevent them from
+	 * being unmapped (via IOMMUFD_CMD_IOAS_UNMAP) during the life cycle
+	 * of the vQUEUE object.
+	 */
+	rc = iopt_pin_pages(&viommu->hwpt->ioas->iopt, cmd->addr, cmd->length,
+			    pages, 0);
+	if (rc)
+		goto out_put_viommu;
+
+	/* Validate if the underlying physical pages are contiguous */
+	for (i = 1; i < max_npages && pages[i]; i++) {
+		if (page_to_pfn(pages[i]) == page_to_pfn(pages[i - 1]) + 1)
+			continue;
+		rc = -EFAULT;
+		goto out_unpin;
+	}
+
+	vqueue = viommu->ops->vqueue_alloc(viommu, cmd->type, cmd->index,
+					   cmd->addr, cmd->length);
+	if (IS_ERR(vqueue)) {
+		rc = PTR_ERR(vqueue);
+		goto out_unpin;
+	}
+
+	vqueue->viommu = viommu;
+	refcount_inc(&viommu->obj.users);
+	vqueue->addr = cmd->addr;
+	vqueue->ictx = ucmd->ictx;
+	vqueue->length = cmd->length;
+	cmd->out_vqueue_id = vqueue->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		iommufd_object_abort_and_destroy(ucmd->ictx, &vqueue->obj);
+	else
+		iommufd_object_finalize(ucmd->ictx, &vqueue->obj);
+	goto out_put_viommu;
+
+out_unpin:
+	iopt_unpin_pages(&viommu->hwpt->ioas->iopt, cmd->addr, cmd->length);
+out_put_viommu:
+	iommufd_put_object(ucmd->ictx, &viommu->obj);
+out_free:
+	kfree(pages);
+	return rc;
+}
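
As a usage sketch (not part of the patch): a VMM would map huge-page-backed
guest queue memory into the IOAS first, then allocate the vQUEUE. Names and
error handling are simplified; @fd is the /dev/iommu fd and the IOAS/vIOMMU
objects are assumed to exist already:

#include <stdint.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static __u32 vqueue_setup(int fd, __u32 ioas_id, __u32 viommu_id,
			  __u64 queue_gpa, __u64 queue_len)
{
	struct iommu_ioas_map map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_READABLE |
			 IOMMU_IOAS_MAP_WRITEABLE,
		.ioas_id = ioas_id,
		.length = queue_len,
		.iova = queue_gpa,
	};
	struct iommu_vqueue_alloc alloc = {
		.size = sizeof(alloc),
		.viommu_id = viommu_id,
		/* a vendor-specific type in practice; TYPE_DEFAULT is
		 * reserved and rejected by this patch */
		.type = IOMMU_VQUEUE_TYPE_DEFAULT,
		.index = 0,
		.addr = queue_gpa,
		.length = queue_len,
	};
	void *buf;

	/* One aligned huge page keeps the backing physically contiguous,
	 * as the kdoc above suggests */
	buf = mmap(NULL, queue_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	map.user_va = (uintptr_t)buf;

	ioctl(fd, IOMMU_IOAS_MAP, &map);
	ioctl(fd, IOMMU_VQUEUE_ALLOC, &alloc);
	return alloc.out_vqueue_id;
}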