mbox series

[v14,00/25] Drivers for Gunyah hypervisor

Message ID 20230613172054.3959700-1-quic_eberman@quicinc.com
Headers show
Series Drivers for Gunyah hypervisor | expand

Message

Elliot Berman June 13, 2023, 5:20 p.m. UTC
Gunyah is a Type-1 hypervisor independent of any
high-level OS kernel, and runs in a higher CPU privilege level. It does
not depend on any lower-privileged OS kernel/code for its core
functionality. This increases its security and can support a much smaller
trusted computing base than a Type-2 hypervisor.

Gunyah is an open source hypervisor. The source repo is available at
https://github.com/quic/gunyah-hypervisor.

The diagram below shows the architecture.

::

         VM A                    VM B
     +-----+ +-----+  | +-----+ +-----+ +-----+
     |     | |     |  | |     | |     | |     |
 EL0 | APP | | APP |  | | APP | | APP | | APP |
     |     | |     |  | |     | |     | |     |
     +-----+ +-----+  | +-----+ +-----+ +-----+
 ---------------------|-------------------------
     +--------------+ | +----------------------+
     |              | | |                      |
 EL1 | Linux Kernel | | |Linux kernel/Other OS |   ...
     |              | | |                      |
     +--------------+ | +----------------------+
 --------hvc/smc------|------hvc/smc------------
     +----------------------------------------+
     |                                        |
 EL2 |            Gunyah Hypervisor           |
     |                                        |
     +----------------------------------------+

Gunyah provides these following features.

- Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on
physical CPUs and enables time-sharing of the CPUs.
- Memory Management: Gunyah tracks memory ownership and use of all memory
under its control. Memory partitioning between VMs is a fundamental
security feature.
- Interrupt Virtualization: All interrupts are handled in the hypervisor
and routed to the assigned VM.
- Inter-VM Communication: There are several different mechanisms provided
for communicating between VMs.
- Device Virtualization: Para-virtualization of devices is supported using
inter-VM communication. Low level system features and devices such as
interrupt controllers are supported with emulation where required.

This series adds the basic framework for detecting that Linux is running
under Gunyah as a virtual machine, communication with the Gunyah Resource
Manager, and a sample virtual machine manager capable of launching virtual machines.

The series relies on two other patches posted separately:
 - https://lore.kernel.org/all/20230213181832.3489174-1-quic_eberman@quicinc.com/
 - https://lore.kernel.org/all/20230213232537.2040976-2-quic_eberman@quicinc.com/


Changes in v14:
 - Coding/cosmetic tweaks suggested by Alex
 - Mark IRQs as wake-up capable

Changes in v13: https://lore.kernel.org/all/20230509204801.2824351-1-quic_eberman@quicinc.com/
 - Tweaks to message queue driver to address race condition between IRQ and mailbox registration
 - Allow removal of VM functions by function-specific comparison -- specifically to allow
   removing irqfd by label only and not requiring original FD to be provided.

Changes in v12: https://lore.kernel.org/all/20230424231558.70911-1-quic_eberman@quicinc.com/
 - Stylistic/cosmetic tweaks suggested by Alex
 - Remove patch "virt: gunyah: Identify hypervisor version" and squash the
   check that we're running under a reasonable Gunyah hypervisor into RM driver
 - Refactor platform hooks into a separate module per suggestion from Srini
 - GFP_KERNEL_ACCOUNT and account_locked_vm() for page pinning
 - enum-ify related constants

Changes in v11: https://lore.kernel.org/all/20230304010632.2127470-1-quic_eberman@quicinc.com/
 - Rename struct gh_vm_dtb_config:gpa -> guest_phys_addr & overflow checks for this
 - More docstrings throughout
 - Make resp_buf and resp_buf_size optional
 - Replace deprecated idr with xarray
 - Refconting on misc device instead of RM's platform device
 - Renaming variables, structs, etc. from gunyah_ -> gh_
 - Drop removal of user mem regions
 - Drop mem_lend functionality; to converge with restricted_memfd later

Changes in v10: https://lore.kernel.org/all/20230214211229.3239350-1-quic_eberman@quicinc.com/
 - Fix bisectability (end result of series is same, --fixups applied to wrong commits)
 - Convert GH_ERROR_* and GH_RM_ERROR_* to enums
 - Correct race condition between allocating/freeing user memory
 - Replace offsetof with struct_size
 - Series-wide renaming of functions to be more consistent
 - VM shutdown & restart support added in vCPU and VM Manager patches
 - Convert VM function name (string) to type (number)
 - Convert VM function argument to value (which could be a pointer) to remove memory wastage for arguments
 - Remove defensive checks of hypervisor correctness
 - Clean ups to ioeventfd as suggested by Srivatsa

Changes in v9: https://lore.kernel.org/all/20230120224627.4053418-1-quic_eberman@quicinc.com/
 - Refactor Gunyah API flags to be exposed as feature flags at kernel level
 - Move mbox client cleanup into gunyah_msgq_remove()
 - Simplify gh_rm_call return value and response payload
 - Missing clean-up/error handling/little endian fixes as suggested by Srivatsa and Alex in v8 series

Changes in v8: https://lore.kernel.org/all/20221219225850.2397345-1-quic_eberman@quicinc.com/
 - Treat VM manager as a library of RM
 - Add patches 21-28 as RFC to support proxy-scheduled vCPUs and necessary bits to support virtio
   from Gunyah userspace

Changes in v7: https://lore.kernel.org/all/20221121140009.2353512-1-quic_eberman@quicinc.com/
 - Refactor to remove gunyah RM bus
 - Refactor allow multiple RM device instances
 - Bump UAPI to start at 0x0
 - Refactor QCOM SCM's platform hooks to allow CONFIG_QCOM_SCM=Y/CONFIG_GUNYAH=M combinations

Changes in v6: https://lore.kernel.org/all/20221026185846.3983888-1-quic_eberman@quicinc.com/
 - *Replace gunyah-console with gunyah VM Manager*
 - Move include/asm-generic/gunyah.h into include/linux/gunyah.h
 - s/gunyah_msgq/gh_msgq/
 - Minor tweaks and documentation tidying based on comments from Jiri, Greg, Arnd, Dmitry, and Bagas.

Changes in v5: https://lore.kernel.org/all/20221011000840.289033-1-quic_eberman@quicinc.com/
 - Dropped sysfs nodes
 - Switch from aux bus to Gunyah RM bus for the subdevices
 - Cleaning up RM console

Changes in v4: https://lore.kernel.org/all/20220928195633.2348848-1-quic_eberman@quicinc.com/
 - Tidied up documentation throughout based on questions/feedback received
 - Switched message queue implementation to use mailboxes
 - Renamed "gunyah_device" as "gunyah_resource"

Changes in v3: https://lore.kernel.org/all/20220811214107.1074343-1-quic_eberman@quicinc.com/
 - /Maintained/Supported/ in MAINTAINERS
 - Tidied up documentation throughout based on questions/feedback received
 - Moved hypercalls into arch/arm64/gunyah/; following hyper-v's implementation
 - Drop opaque typedefs
 - Move sysfs nodes under /sys/hypervisor/gunyah/
 - Moved Gunyah console driver to drivers/tty/
 - Reworked gh_device design to drop the Gunyah bus.

Changes in v2: https://lore.kernel.org/all/20220801211240.597859-1-quic_eberman@quicinc.com/
 - DT bindings clean up
 - Switch hypercalls to follow SMCCC 

v1: https://lore.kernel.org/all/20220223233729.1571114-1-quic_eberman@quicinc.com/

Elliot Berman (25):
  docs: gunyah: Introduce Gunyah Hypervisor
  dt-bindings: Add binding for gunyah hypervisor
  gunyah: Common types and error codes for Gunyah hypercalls
  virt: gunyah: Add hypercalls to identify Gunyah
  virt: gunyah: msgq: Add hypercalls to send and receive messages
  mailbox: Add Gunyah message queue mailbox
  gunyah: rsc_mgr: Add resource manager RPC core
  gunyah: rsc_mgr: Add VM lifecycle RPC
  gunyah: vm_mgr: Introduce basic VM Manager
  gunyah: rsc_mgr: Add RPC for sharing memory
  gunyah: vm_mgr: Add/remove user memory regions
  gunyah: vm_mgr: Add ioctls to support basic non-proxy VM boot
  samples: Add sample userspace Gunyah VM Manager
  gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
  virt: gunyah: Add Qualcomm Gunyah platform ops
  docs: gunyah: Document Gunyah VM Manager
  virt: gunyah: Translate gh_rm_hyp_resource into gunyah_resource
  gunyah: vm_mgr: Add framework for VM Functions
  virt: gunyah: Add resource tickets
  virt: gunyah: Add IO handlers
  virt: gunyah: Add proxy-scheduled vCPUs
  virt: gunyah: Add hypercalls for sending doorbell
  virt: gunyah: Add irqfd interface
  virt: gunyah: Add ioeventfd
  MAINTAINERS: Add Gunyah hypervisor drivers section

 .../bindings/firmware/gunyah-hypervisor.yaml  |  82 ++
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 Documentation/virt/gunyah/index.rst           | 114 +++
 Documentation/virt/gunyah/message-queue.rst   |  71 ++
 Documentation/virt/gunyah/vm-manager.rst      | 143 +++
 Documentation/virt/index.rst                  |   1 +
 MAINTAINERS                                   |  13 +
 arch/arm64/Kbuild                             |   1 +
 arch/arm64/gunyah/Makefile                    |   3 +
 arch/arm64/gunyah/gunyah_hypercall.c          | 142 +++
 arch/arm64/include/asm/gunyah.h               |  24 +
 drivers/mailbox/Makefile                      |   2 +
 drivers/mailbox/gunyah-msgq.c                 | 219 +++++
 drivers/virt/Kconfig                          |   2 +
 drivers/virt/Makefile                         |   1 +
 drivers/virt/gunyah/Kconfig                   |  59 ++
 drivers/virt/gunyah/Makefile                  |  11 +
 drivers/virt/gunyah/gunyah_ioeventfd.c        | 130 +++
 drivers/virt/gunyah/gunyah_irqfd.c            | 180 ++++
 drivers/virt/gunyah/gunyah_platform_hooks.c   |  80 ++
 drivers/virt/gunyah/gunyah_qcom.c             | 153 +++
 drivers/virt/gunyah/gunyah_vcpu.c             | 470 +++++++++
 drivers/virt/gunyah/rsc_mgr.c                 | 909 ++++++++++++++++++
 drivers/virt/gunyah/rsc_mgr.h                 |  19 +
 drivers/virt/gunyah/rsc_mgr_rpc.c             | 500 ++++++++++
 drivers/virt/gunyah/vm_mgr.c                  | 791 +++++++++++++++
 drivers/virt/gunyah/vm_mgr.h                  |  70 ++
 drivers/virt/gunyah/vm_mgr_mm.c               | 252 +++++
 include/linux/gunyah.h                        | 207 ++++
 include/linux/gunyah_rsc_mgr.h                | 162 ++++
 include/linux/gunyah_vm_mgr.h                 | 153 +++
 include/uapi/linux/gunyah.h                   | 293 ++++++
 samples/Kconfig                               |  10 +
 samples/Makefile                              |   1 +
 samples/gunyah/.gitignore                     |   2 +
 samples/gunyah/Makefile                       |   6 +
 samples/gunyah/gunyah_vmm.c                   | 266 +++++
 samples/gunyah/sample_vm.dts                  |  68 ++
 38 files changed, 5611 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
 create mode 100644 Documentation/virt/gunyah/index.rst
 create mode 100644 Documentation/virt/gunyah/message-queue.rst
 create mode 100644 Documentation/virt/gunyah/vm-manager.rst
 create mode 100644 arch/arm64/gunyah/Makefile
 create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
 create mode 100644 arch/arm64/include/asm/gunyah.h
 create mode 100644 drivers/mailbox/gunyah-msgq.c
 create mode 100644 drivers/virt/gunyah/Kconfig
 create mode 100644 drivers/virt/gunyah/Makefile
 create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
 create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
 create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
 create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
 create mode 100644 drivers/virt/gunyah/gunyah_vcpu.c
 create mode 100644 drivers/virt/gunyah/rsc_mgr.c
 create mode 100644 drivers/virt/gunyah/rsc_mgr.h
 create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
 create mode 100644 drivers/virt/gunyah/vm_mgr.c
 create mode 100644 drivers/virt/gunyah/vm_mgr.h
 create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
 create mode 100644 include/linux/gunyah.h
 create mode 100644 include/linux/gunyah_rsc_mgr.h
 create mode 100644 include/linux/gunyah_vm_mgr.h
 create mode 100644 include/uapi/linux/gunyah.h
 create mode 100644 samples/gunyah/.gitignore
 create mode 100644 samples/gunyah/Makefile
 create mode 100644 samples/gunyah/gunyah_vmm.c
 create mode 100644 samples/gunyah/sample_vm.dts


base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375

Comments

Alex Elder June 16, 2023, 4:33 p.m. UTC | #1
On 6/13/23 12:20 PM, Elliot Berman wrote:
> Gunyah message queues are a unidirectional inter-VM pipe for messages up
> to 1024 bytes. This driver supports pairing a receiver message queue and
> a transmitter message queue to expose a single mailbox channel.
> 
> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>

Despite a small gripe this looks good.

Reviewed-by: Alex Elder <elder@linaro.org>

> ---
>   Documentation/virt/gunyah/message-queue.rst |   8 +
>   drivers/mailbox/Makefile                    |   2 +
>   drivers/mailbox/gunyah-msgq.c               | 219 ++++++++++++++++++++
>   include/linux/gunyah.h                      |  57 +++++
>   4 files changed, 286 insertions(+)
>   create mode 100644 drivers/mailbox/gunyah-msgq.c
> 
> diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
> index b352918ae54b4..70d82a4ef32d7 100644
> --- a/Documentation/virt/gunyah/message-queue.rst
> +++ b/Documentation/virt/gunyah/message-queue.rst
> @@ -61,3 +61,11 @@ vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
>         |               |         |                 |         |               |
>         |               |         |                 |         |               |
>         +---------------+         +-----------------+         +---------------+
> +
> +Gunyah message queues are exposed as mailboxes. To create the mailbox, create
> +a mbox_client and call `gh_msgq_init()`. On receipt of the RX_READY interrupt,
> +all messages in the RX message queue are read and pushed via the `rx_callback`
> +of the registered mbox_client.
> +
> +.. kernel-doc:: drivers/mailbox/gunyah-msgq.c
> +   :identifiers: gh_msgq_init
> diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
> index fc93761171113..5f929bb55e9a5 100644
> --- a/drivers/mailbox/Makefile
> +++ b/drivers/mailbox/Makefile
> @@ -55,6 +55,8 @@ obj-$(CONFIG_MTK_CMDQ_MBOX)	+= mtk-cmdq-mailbox.o
>   
>   obj-$(CONFIG_ZYNQMP_IPI_MBOX)	+= zynqmp-ipi-mailbox.o
>   
> +obj-$(CONFIG_GUNYAH)		+= gunyah-msgq.o
> +
>   obj-$(CONFIG_SUN6I_MSGBOX)	+= sun6i-msgbox.o
>   
>   obj-$(CONFIG_SPRD_MBOX)		+= sprd-mailbox.o
> diff --git a/drivers/mailbox/gunyah-msgq.c b/drivers/mailbox/gunyah-msgq.c
> new file mode 100644
> index 0000000000000..7f777339278eb
> --- /dev/null
> +++ b/drivers/mailbox/gunyah-msgq.c
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/mailbox_controller.h>
> +#include <linux/module.h>
> +#include <linux/interrupt.h>
> +#include <linux/gunyah.h>
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +
> +#define mbox_chan_to_msgq(chan) (container_of(chan->mbox, struct gh_msgq, mbox))
> +
> +static irqreturn_t gh_msgq_rx_irq_handler(int irq, void *data)
> +{
> +	struct gh_msgq *msgq = data;
> +	struct gh_msgq_rx_data rx_data;
> +	enum gh_error gh_error;
> +	bool ready = true;

Since ready is true initially, you could use do...while ()
and not bother pre-initializing it.

> +
> +	while (ready) {
> +		gh_error = gh_hypercall_msgq_recv(msgq->rx_ghrsc->capid,
> +				&rx_data.data, sizeof(rx_data.data),
> +				&rx_data.length, &ready);
> +		if (gh_error != GH_ERROR_OK) {
> +			if (gh_error != GH_ERROR_MSGQUEUE_EMPTY)
> +				dev_warn(msgq->mbox.dev, "Failed to receive data: %d\n", gh_error);
> +			break;
> +		}
> +		if (likely(gh_msgq_chan(msgq)->cl))
> +			mbox_chan_received_data(gh_msgq_chan(msgq), &rx_data);
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Fired when message queue transitions from "full" to "space available" to send messages */
> +static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
> +{
> +	struct gh_msgq *msgq = data;
> +
> +	mbox_chan_txdone(gh_msgq_chan(msgq), 0);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Fired after sending message and hypercall told us there was more space available. */
> +static void gh_msgq_txdone_tasklet(struct tasklet_struct *tasklet)
> +{
> +	struct gh_msgq *msgq = container_of(tasklet, struct gh_msgq, txdone_tasklet);
> +
> +	mbox_chan_txdone(gh_msgq_chan(msgq), msgq->last_ret);
> +}
> +
> +static int gh_msgq_send_data(struct mbox_chan *chan, void *data)
> +{
> +	struct gh_msgq *msgq = mbox_chan_to_msgq(chan);
> +	struct gh_msgq_tx_data *msgq_data = data;
> +	u64 tx_flags = 0;
> +	enum gh_error gh_error;
> +	bool ready;
> +
> +	if (!msgq->tx_ghrsc)
> +		return -EOPNOTSUPP;
> +
> +	if (msgq_data->push)
> +		tx_flags |= GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH;
> +
> +	gh_error = gh_hypercall_msgq_send(msgq->tx_ghrsc->capid, msgq_data->length, msgq_data->data,
> +						tx_flags, &ready);
> +
> +	/**
> +	 * unlikely because Linux tracks state of msgq and should not try to
> +	 * send message when msgq is full.
> +	 */
> +	if (unlikely(gh_error == GH_ERROR_MSGQUEUE_FULL))
> +		return -EAGAIN;
> +
> +	/**
> +	 * Propagate all other errors to client. If we return error to mailbox
> +	 * framework, then no other messages can be sent and nobody will know
> +	 * to retry this message.
> +	 */
> +	msgq->last_ret = gh_error_remap(gh_error);
> +
> +	/**
> +	 * This message was successfully sent, but message queue isn't ready to
> +	 * accept more messages because it's now full. Mailbox framework
> +	 * requires that we only report that message was transmitted when
> +	 * we're ready to transmit another message. We'll get that in the form
> +	 * of tx IRQ once the other side starts to drain the msgq.
> +	 */

I really dislike this mismatch between Gunyah and mailbox
semantics.  Oh well.

> +	if (gh_error == GH_ERROR_OK) {
> +		if (!ready)
> +			return 0;
> +	} else {
> +		dev_err(msgq->mbox.dev, "Failed to send data: %d (%d)\n", gh_error, msgq->last_ret);
> +	}
> +
> +	/**
> +	 * We can send more messages. Mailbox framework requires that tx done
> +	 * happens asynchronously to sending the message. Gunyah message queues
> +	 * tell us right away on the hypercall return whether we can send more
> +	 * messages. To work around this, defer the txdone to a tasklet.
> +	 */

This mismatch too...

> +	tasklet_schedule(&msgq->txdone_tasklet);
> +
> +	return 0;
> +}
> +
> +static struct mbox_chan_ops gh_msgq_ops = {
> +	.send_data = gh_msgq_send_data,
> +};
> +
> +/**
> + * gh_msgq_init() - Initialize a Gunyah message queue with an mbox_client
> + * @parent: device parent used for the mailbox controller
> + * @msgq: Pointer to the gh_msgq to initialize
> + * @cl: A mailbox client to bind to the mailbox channel that the message queue creates
> + * @tx_ghrsc: optional, the transmission side of the message queue
> + * @rx_ghrsc: optional, the receiving side of the message queue
> + *
> + * At least one of tx_ghrsc and rx_ghrsc must be not NULL. Most message queue use cases come with
> + * a pair of message queues to facilitate bidirectional communication. When tx_ghrsc is set,
> + * the client can send messages with mbox_send_message(gh_msgq_chan(msgq), msg). When rx_ghrsc
> + * is set, the mbox_client must register an .rx_callback() and the message queue driver will
> + * deliver all available messages upon receiving the RX ready interrupt. The messages should be
> + * consumed or copied by the client right away as the gh_msgq_rx_data will be replaced/destroyed
> + * after the callback.
> + *
> + * Returns - 0 on success, negative otherwise
> + */
> +int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
> +		 struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc)
> +{
> +	int ret;
> +
> +	/* Must have at least a tx_ghrsc or rx_ghrsc and that they are the right device types */
> +	if ((!tx_ghrsc && !rx_ghrsc) ||
> +	    (tx_ghrsc && tx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_TX) ||
> +	    (rx_ghrsc && rx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_RX))
> +		return -EINVAL;
> +
> +	msgq->mbox.dev = parent;
> +	msgq->mbox.ops = &gh_msgq_ops;
> +	msgq->mbox.num_chans = 1;
> +	msgq->mbox.txdone_irq = true;
> +	msgq->mbox.chans = &msgq->mbox_chan;
> +
> +	ret = mbox_controller_register(&msgq->mbox);
> +	if (ret)
> +		return ret;
> +
> +	ret = mbox_bind_client(gh_msgq_chan(msgq), cl);
> +	if (ret)
> +		goto err_mbox;
> +
> +	if (tx_ghrsc) {
> +		msgq->tx_ghrsc = tx_ghrsc;
> +
> +		ret = request_irq(msgq->tx_ghrsc->irq, gh_msgq_tx_irq_handler, 0, "gh_msgq_tx",
> +				msgq);
> +		if (ret)
> +			goto err_tx_ghrsc;
> +
> +		enable_irq_wake(msgq->tx_ghrsc->irq);
> +
> +		tasklet_setup(&msgq->txdone_tasklet, gh_msgq_txdone_tasklet);
> +	}
> +
> +	if (rx_ghrsc) {
> +		msgq->rx_ghrsc = rx_ghrsc;
> +
> +		ret = request_threaded_irq(msgq->rx_ghrsc->irq, NULL, gh_msgq_rx_irq_handler,
> +						IRQF_ONESHOT, "gh_msgq_rx", msgq);
> +		if (ret)
> +			goto err_tx_irq;
> +
> +		enable_irq_wake(msgq->rx_ghrsc->irq);
> +	}
> +
> +	return 0;
> +err_tx_irq:
> +	if (msgq->tx_ghrsc)
> +		free_irq(msgq->tx_ghrsc->irq, msgq);
> +
> +	msgq->rx_ghrsc = NULL;
> +err_tx_ghrsc:
> +	msgq->tx_ghrsc = NULL;
> +err_mbox:
> +	mbox_controller_unregister(&msgq->mbox);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_msgq_init);
> +
> +void gh_msgq_remove(struct gh_msgq *msgq)
> +{
> +	mbox_free_channel(gh_msgq_chan(msgq));
> +
> +	if (msgq->rx_ghrsc)
> +		free_irq(msgq->rx_ghrsc->irq, msgq);
> +
> +	if (msgq->tx_ghrsc) {
> +		tasklet_kill(&msgq->txdone_tasklet);
> +		free_irq(msgq->tx_ghrsc->irq, msgq);
> +	}
> +
> +	mbox_controller_unregister(&msgq->mbox);
> +
> +	msgq->rx_ghrsc = NULL;
> +	msgq->tx_ghrsc = NULL;
> +}
> +EXPORT_SYMBOL_GPL(gh_msgq_remove);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Gunyah Message Queue Driver");
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 01a6f202d037e..982e27d10d57f 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -8,11 +8,68 @@
>   
>   #include <linux/bitfield.h>
>   #include <linux/errno.h>
> +#include <linux/interrupt.h>
>   #include <linux/limits.h>
> +#include <linux/mailbox_controller.h>
> +#include <linux/mailbox_client.h>
>   #include <linux/types.h>
>   
> +/* Matches resource manager's resource types for VM_GET_HYP_RESOURCES RPC */
> +enum gh_resource_type {
> +	GH_RESOURCE_TYPE_BELL_TX	= 0,
> +	GH_RESOURCE_TYPE_BELL_RX	= 1,
> +	GH_RESOURCE_TYPE_MSGQ_TX	= 2,
> +	GH_RESOURCE_TYPE_MSGQ_RX	= 3,
> +	GH_RESOURCE_TYPE_VCPU		= 4,
> +};
> +
> +struct gh_resource {
> +	enum gh_resource_type type;
> +	u64 capid;
> +	unsigned int irq;
> +};
> +
> +/**
> + * Gunyah Message Queues
> + */
> +
> +#define GH_MSGQ_MAX_MSG_SIZE		240
> +
> +struct gh_msgq_tx_data {
> +	size_t length;
> +	bool push;
> +	char data[];
> +};
> +
> +struct gh_msgq_rx_data {
> +	size_t length;
> +	char data[GH_MSGQ_MAX_MSG_SIZE];
> +};
> +
> +struct gh_msgq {
> +	struct gh_resource *tx_ghrsc;
> +	struct gh_resource *rx_ghrsc;
> +
> +	/* msgq private */
> +	int last_ret; /* Linux error, not GH_STATUS_* */
> +	struct mbox_chan mbox_chan;
> +	struct mbox_controller mbox;
> +	struct tasklet_struct txdone_tasklet;
> +};
> +
> +
> +int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
> +		     struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc);
> +void gh_msgq_remove(struct gh_msgq *msgq);
> +
> +static inline struct mbox_chan *gh_msgq_chan(struct gh_msgq *msgq)
> +{
> +	return &msgq->mbox.chans[0];
> +}
> +
>   /******************************************************************************/
>   /* Common arch-independent definitions for Gunyah hypercalls                  */
> +
>   #define GH_CAPID_INVAL	U64_MAX
>   #define GH_VMID_ROOT_VM	0xff
>
Alex Elder June 16, 2023, 4:34 p.m. UTC | #2
On 6/13/23 12:20 PM, Elliot Berman wrote:
> Enable support for creating irqfds which can raise an interrupt on a
> Gunyah virtual machine. irqfds are exposed to userspace as a Gunyah VM
> function with the name "irqfd". If the VM devicetree is not configured
> to create a doorbell with the corresponding label, userspace will still
> be able to assert the eventfd but no interrupt will be raised on the
> guest.
> 
> Acked-by: Alex Elder <elder@linaro.org>

I guess I don't mind upgrading this.  It looks good to me.

Reviewed-by: Alex Elder <elder@linaro.org>

> Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>   Documentation/virt/gunyah/vm-manager.rst |   2 +-
>   drivers/virt/gunyah/Kconfig              |   9 ++
>   drivers/virt/gunyah/Makefile             |   1 +
>   drivers/virt/gunyah/gunyah_irqfd.c       | 180 +++++++++++++++++++++++
>   include/uapi/linux/gunyah.h              |  35 +++++
>   5 files changed, 226 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
> 
> diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
> index b0c3bde105ae9..57a254827be26 100644
> --- a/Documentation/virt/gunyah/vm-manager.rst
> +++ b/Documentation/virt/gunyah/vm-manager.rst
> @@ -116,7 +116,7 @@ the VM *before* the VM starts.
>   The argument types are documented below:
>   
>   .. kernel-doc:: include/uapi/linux/gunyah.h
> -   :identifiers: gh_fn_vcpu_arg
> +   :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags
>   
>   Gunyah VCPU API Descriptions
>   ----------------------------
> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
> index 0a58395f7d2c5..bc2c46d9df946 100644
> --- a/drivers/virt/gunyah/Kconfig
> +++ b/drivers/virt/gunyah/Kconfig
> @@ -39,3 +39,12 @@ config GUNYAH_VCPU
>   	  VMMs can also handle stage 2 faults of the vCPUs.
>   
>   	  Say Y/M here if unsure and you want to support Gunyah VMMs.
> +
> +config GUNYAH_IRQFD
> +	tristate "Gunyah irqfd interface"
> +	depends on GUNYAH
> +	help
> +	  Enable kernel support for creating irqfds which can raise an interrupt
> +	  on Gunyah virtual machine.
> +
> +	  Say Y/M here if unsure and you want to support Gunyah VMMs.
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index cc16b6c19db92..ad212a1cf9671 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -7,3 +7,4 @@ gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
>   obj-$(CONFIG_GUNYAH) += gunyah.o
>   
>   obj-$(CONFIG_GUNYAH_VCPU) += gunyah_vcpu.o
> +obj-$(CONFIG_GUNYAH_IRQFD) += gunyah_irqfd.o
> diff --git a/drivers/virt/gunyah/gunyah_irqfd.c b/drivers/virt/gunyah/gunyah_irqfd.c
> new file mode 100644
> index 0000000000000..3e954ebd20297
> --- /dev/null
> +++ b/drivers/virt/gunyah/gunyah_irqfd.c
> @@ -0,0 +1,180 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/eventfd.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/gunyah.h>
> +#include <linux/gunyah_vm_mgr.h>
> +#include <linux/module.h>
> +#include <linux/poll.h>
> +#include <linux/printk.h>
> +
> +#include <uapi/linux/gunyah.h>
> +
> +struct gh_irqfd {
> +	struct gh_resource *ghrsc;
> +	struct gh_vm_resource_ticket ticket;
> +	struct gh_vm_function_instance *f;
> +
> +	bool level;
> +
> +	struct eventfd_ctx *ctx;
> +	wait_queue_entry_t wait;
> +	poll_table pt;
> +};
> +
> +static int irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key)
> +{
> +	struct gh_irqfd *irqfd = container_of(wait, struct gh_irqfd, wait);
> +	__poll_t flags = key_to_poll(key);
> +	int ret = 0;
> +
> +	if (flags & EPOLLIN) {
> +		if (irqfd->ghrsc) {
> +			ret = gh_hypercall_bell_send(irqfd->ghrsc->capid, 1, NULL);
> +			if (ret)
> +				pr_err_ratelimited("Failed to inject interrupt %d: %d\n",
> +						irqfd->ticket.label, ret);
> +		} else
> +			pr_err_ratelimited("Premature injection of interrupt\n");
> +	}
> +
> +	return 0;
> +}
> +
> +static void irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
> +{
> +	struct gh_irqfd *irq_ctx = container_of(pt, struct gh_irqfd, pt);
> +
> +	add_wait_queue(wqh, &irq_ctx->wait);
> +}
> +
> +static bool gh_irqfd_populate(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc)
> +{
> +	struct gh_irqfd *irqfd = container_of(ticket, struct gh_irqfd, ticket);
> +	int ret;
> +
> +	if (irqfd->ghrsc) {
> +		pr_warn("irqfd%d already got a Gunyah resource. Check if multiple resources with same label were configured.\n",
> +			irqfd->ticket.label);
> +		return false;
> +	}
> +
> +	irqfd->ghrsc = ghrsc;
> +	if (irqfd->level) {
> +		/* Configure the bell to trigger when bit 0 is asserted (see
> +		 * irq_wakeup) and for bell to automatically clear bit 0 once
> +		 * received by the VM (ack_mask).  need to make sure bit 0 is cleared right away,
> +		 * otherwise the line will never be deasserted. Emulating edge
> +		 * trigger interrupt does not need to set either mask
> +		 * because irq is listed only once per gh_hypercall_bell_send
> +		 */
> +		ret = gh_hypercall_bell_set_mask(irqfd->ghrsc->capid, 1, 1);
> +		if (ret)
> +			pr_warn("irq %d couldn't be set as level triggered. Might cause IRQ storm if asserted\n",
> +				irqfd->ticket.label);
> +	}
> +
> +	return true;
> +}
> +
> +static void gh_irqfd_unpopulate(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc)
> +{
> +	struct gh_irqfd *irqfd = container_of(ticket, struct gh_irqfd, ticket);
> +	u64 cnt;
> +
> +	eventfd_ctx_remove_wait_queue(irqfd->ctx, &irqfd->wait, &cnt);
> +}
> +
> +static long gh_irqfd_bind(struct gh_vm_function_instance *f)
> +{
> +	struct gh_fn_irqfd_arg *args = f->argp;
> +	struct gh_irqfd *irqfd;
> +	__poll_t events;
> +	struct fd fd;
> +	long r;
> +
> +	if (f->arg_size != sizeof(*args))
> +		return -EINVAL;
> +
> +	/* All other flag bits are reserved for future use */
> +	if (args->flags & ~GH_IRQFD_FLAGS_LEVEL)
> +		return -EINVAL;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->f = f;
> +	f->data = irqfd;
> +
> +	fd = fdget(args->fd);
> +	if (!fd.file) {
> +		kfree(irqfd);
> +		return -EBADF;
> +	}
> +
> +	irqfd->ctx = eventfd_ctx_fileget(fd.file);
> +	if (IS_ERR(irqfd->ctx)) {
> +		r = PTR_ERR(irqfd->ctx);
> +		goto err_fdput;
> +	}
> +
> +	if (args->flags & GH_IRQFD_FLAGS_LEVEL)
> +		irqfd->level = true;
> +
> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +
> +	irqfd->ticket.resource_type = GH_RESOURCE_TYPE_BELL_TX;
> +	irqfd->ticket.label = args->label;
> +	irqfd->ticket.owner = THIS_MODULE;
> +	irqfd->ticket.populate = gh_irqfd_populate;
> +	irqfd->ticket.unpopulate = gh_irqfd_unpopulate;
> +
> +	r = gh_vm_add_resource_ticket(f->ghvm, &irqfd->ticket);
> +	if (r)
> +		goto err_ctx;
> +
> +	events = vfs_poll(fd.file, &irqfd->pt);
> +	if (events & EPOLLIN)
> +		pr_warn("Premature injection of interrupt\n");
> +	fdput(fd);
> +
> +	return 0;
> +err_ctx:
> +	eventfd_ctx_put(irqfd->ctx);
> +err_fdput:
> +	fdput(fd);
> +	kfree(irqfd);
> +	return r;
> +}
> +
> +static void gh_irqfd_unbind(struct gh_vm_function_instance *f)
> +{
> +	struct gh_irqfd *irqfd = f->data;
> +
> +	gh_vm_remove_resource_ticket(irqfd->f->ghvm, &irqfd->ticket);
> +	eventfd_ctx_put(irqfd->ctx);
> +	kfree(irqfd);
> +}
> +
> +static bool gh_irqfd_compare(const struct gh_vm_function_instance *f,
> +				const void *arg, size_t size)
> +{
> +	const struct gh_fn_irqfd_arg *instance = f->argp,
> +					 *other = arg;
> +
> +	if (sizeof(*other) != size)
> +		return false;
> +
> +	return instance->label == other->label;
> +}
> +
> +DECLARE_GH_VM_FUNCTION_INIT(irqfd, GH_FN_IRQFD, 2, gh_irqfd_bind, gh_irqfd_unbind,
> +				gh_irqfd_compare);
> +MODULE_DESCRIPTION("Gunyah irqfd VM Function");
> +MODULE_LICENSE("GPL");
> diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
> index 434ffa8ffc783..0c480c622686a 100644
> --- a/include/uapi/linux/gunyah.h
> +++ b/include/uapi/linux/gunyah.h
> @@ -77,9 +77,12 @@ struct gh_vm_dtb_config {
>    * @GH_FN_VCPU: create a vCPU instance to control a vCPU
>    *              &struct gh_fn_desc.arg is a pointer to &struct gh_fn_vcpu_arg
>    *              Return: file descriptor to manipulate the vcpu.
> + * @GH_FN_IRQFD: register eventfd to assert a Gunyah doorbell
> + *               &struct gh_fn_desc.arg is a pointer to &struct gh_fn_irqfd_arg
>    */
>   enum gh_fn_type {
>   	GH_FN_VCPU = 1,
> +	GH_FN_IRQFD,
>   };
>   
>   #define GH_FN_MAX_ARG_SIZE		256
> @@ -99,6 +102,38 @@ struct gh_fn_vcpu_arg {
>   	__u32 id;
>   };
>   
> +/**
> + * enum gh_irqfd_flags - flags for use in gh_fn_irqfd_arg
> + * @GH_IRQFD_FLAGS_LEVEL: make the interrupt operate like a level triggered
> + *                        interrupt on guest side. Triggering IRQFD before
> + *                        guest handles the interrupt causes interrupt to
> + *                        stay asserted.
> + */
> +enum gh_irqfd_flags {
> +	GH_IRQFD_FLAGS_LEVEL		= 1UL << 0,
> +};
> +
> +/**
> + * struct gh_fn_irqfd_arg - Arguments to create an irqfd function.
> + *
> + * Create this function with &GH_VM_ADD_FUNCTION using type &GH_FN_IRQFD.
> + *
> + * Allows setting an eventfd to directly trigger a guest interrupt.
> + * irqfd.fd specifies the file descriptor to use as the eventfd.
> + * irqfd.label corresponds to the doorbell label used in the guest VM's devicetree.
> + *
> + * @fd: an eventfd which when written to will raise a doorbell
> + * @label: Label of the doorbell created on the guest VM
> + * @flags: see &enum gh_irqfd_flags
> + * @padding: padding bytes
> + */
> +struct gh_fn_irqfd_arg {
> +	__u32 fd;
> +	__u32 label;
> +	__u32 flags;
> +	__u32 padding;
> +};
> +
>   /**
>    * struct gh_fn_desc - Arguments to create a VM function
>    * @type: Type of the function. See &enum gh_fn_type.
Elliot Berman July 3, 2023, 10:41 p.m. UTC | #3
On 6/16/2023 9:32 AM, Alex Elder wrote:
> On 6/13/23 12:20 PM, Elliot Berman wrote:
>> Gunyah is an open-source Type-1 hypervisor developed by Qualcomm. It
>> does not depend on any lower-privileged OS/kernel code for its core
>> functionality. This increases its security and can support a smaller
>> trusted computing based when compared to Type-2 hypervisors.
> 
> s/based/base/
> 
>>
>> Add documentation describing the Gunyah hypervisor and the main
>> components of the Gunyah hypervisor which are of interest to Linux
>> virtualization development.
>>
>> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
>> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> 
> I have some questions and comments.  But I trust that you
> can answer them and update your patch in a reasonable way
> to address what I say.  So... please consider these things,
> and update as you see fit.
> 
> Reviewed-by: Alex Elder <elder@linaro.org>
> 
>> ---
>>   Documentation/virt/gunyah/index.rst         | 113 ++++++++++++++++++++
>>   Documentation/virt/gunyah/message-queue.rst |  63 +++++++++++
>>   Documentation/virt/index.rst                |   1 +
>>   3 files changed, 177 insertions(+)
>>   create mode 100644 Documentation/virt/gunyah/index.rst
>>   create mode 100644 Documentation/virt/gunyah/message-queue.rst
>>
>> diff --git a/Documentation/virt/gunyah/index.rst 
>> b/Documentation/virt/gunyah/index.rst
>> new file mode 100644
>> index 0000000000000..74aa345e0a144
>> --- /dev/null
>> +++ b/Documentation/virt/gunyah/index.rst
>> @@ -0,0 +1,113 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +=================
>> +Gunyah Hypervisor
>> +=================
>> +
>> +.. toctree::
>> +   :maxdepth: 1
>> +
>> +   message-queue
>> +
>> +Gunyah is a Type-1 hypervisor which is independent of any OS kernel, 
>> and runs in
>> +a higher CPU privilege level. It does not depend on any 
>> lower-privileged operating system
>> +for its core functionality. This increases its security and can 
>> support a much smaller
>> +trusted computing base than a Type-2 hypervisor.
>> +
>> +Gunyah is an open source hypervisor. The source repo is available at
>> +https://github.com/quic/gunyah-hypervisor.
>> +
>> +Gunyah provides these following features.
>> +
>> +- Scheduling:
>> +
>> +  A scheduler for virtual CPUs (vCPUs) on physical CPUs enables 
>> time-sharing
>> +  of the CPUs. Gunyah supports two models of scheduling:
>> +
>> +    1. "Behind the back" scheduling in which Gunyah hypervisor 
>> schedules vCPUS on its own.
> 
> s/VCPUS/VCPUs/
> 
>> +    2. "Proxy" scheduling in which a delegated VM can donate part of 
>> one of its vCPU slice
>> +       to another VM's vCPU via a hypercall.
> 
> This might sound dumb, but can there be more vCPUs than there
> are physical CPUs?  Is a vCPU *tied* to a particular physical
> CPU, or does it just indicate that a VM has one abstracted CPU
> available to use--and any available physical CPU core can
> implement it (possibly changing between time slices)?
> 

There can be more vCPUs than physical CPUs. If someone wanted to 
hard-code their VM to use 16 vCPUs, they could (I picked 16 arbitrarily).

The latter -- the physical CPU that makes the "vcpu_run" hypercall will 
be the one to run the vCPU. The userspace thread triggers the hypercall 
via GH_VCPU_RUN ioctl and is dependent on the host's task placement for 
which physical cpu that userspace thread runs on.

>> +
>> +- Memory Management:
>> +
>> +  APIs handling memory, abstracted as objects, limiting direct use of 
>> physical
>> +  addresses. Memory ownership and usage tracking of all memory under 
>> its control.
>> +  Memory partitioning between VMs is a fundamental security feature.
>> +
>> +- Interrupt Virtualization:
>> +
>> +  Uses CPU hardware interrupt virtualization capabilities. Interrupts 
>> are handled
>> +  in the hypervisor and routed to the assigned VM.
>> +
>> +- Inter-VM Communication:
>> +
>> +  There are several different mechanisms provided for communicating 
>> between VMs.
>> +
>> +- Virtual platform:
>> +
>> +  Architectural devices such as interrupt controllers and CPU timers 
>> are directly provided
>> +  by the hypervisor as well as core virtual platform devices and 
>> system APIs such as ARM PSCI.
>> +
>> +- Device Virtualization:
>> +
>> +  Para-virtualization of devices is supported using inter-VM 
>> communication.
>> +
>> +Architectures supported
>> +=======================
>> +AArch64 with a GIC
>> +
>> +Resources and Capabilities
>> +==========================
>> +
>> +Some services or resources provided by the Gunyah hypervisor are 
>> described to a virtual machine by
>> +capability IDs. For instance, inter-VM communication is performed 
>> with doorbells and message queues.
>> +Gunyah allows access to manipulate that doorbell via the capability 
>> ID. These resources are
>> +described in Linux as a struct gh_resource.
>> +
>> +High level management of these resources is performed by the resource 
>> manager VM. RM informs a
>> +guest VM about resources it can access through either the device tree 
>> or via guest-initiated RPC.
>> +
>> +For each virtual machine, Gunyah maintains a table of resources which 
>> can be accessed by that VM.
>> +An entry in this table is called a "capability" and VMs can only 
>> access resources via this
>> +capability table. Hence, virtual Gunyah resources are referenced by a 
>> "capability IDs" and not
>> +"resource IDs". If 2 VMs have access to the same resource, they might 
>> not be using the same
>> +capability ID to access that resource since the capability tables are 
>> independent per VM.
>> +
>> +Resource Manager
>> +================
>> +
>> +The resource manager (RM) is a privileged application VM supporting 
>> the Gunyah Hypervisor.
>> +It provides policy enforcement aspects of the virtualization system. 
>> The resource manager can
>> +be treated as an extension of the Hypervisor but is separated to its 
>> own partition to ensure
>> +that the hypervisor layer itself remains small and secure and to 
>> maintain a separation of policy
>> +and mechanism in the platform. RM runs at arm64 NS-EL1 similar to 
>> other virtual machines.
>> +
>> +Communication with the resource manager from each guest VM happens 
>> with message-queue.rst. Details
>> +about the specific messages can be found in 
>> drivers/virt/gunyah/rsc_mgr.c
>> +
>> +::
>> +
>> +  +-------+   +--------+   +--------+
>> +  |  RM   |   |  VM_A  |   |  VM_B  |
>> +  +-.-.-.-+   +---.----+   +---.----+
>> +    | |           |            |
>> +  +-.-.-----------.------------.----+
>> +  | | \==========/             |    |
>> +  |  \========================/     |
>> +  |            Gunyah               |
>> +  +---------------------------------+
>> +
>> +The source for the resource manager is available at 
>> https://github.com/quic/gunyah-resource-manager.
>> +
>> +The resource manager provides the following features:
>> +
>> +- VM lifecycle management: allocating a VM, starting VMs, destruction 
>> of VMs
>> +- VM access control policy, including memory sharing and lending
>> +- Interrupt routing configuration
>> +- Forwarding of system-level events (e.g. VM shutdown) to owner VM
>> +
>> +When booting a virtual machine which uses a devicetree such as Linux, 
>> resource manager overlays a
>> +/hypervisor node. This node can let Linux know it is running as a 
>> Gunyah guest VM,
>> +how to communicate with resource manager, and basic description and 
>> capabilities of
> 
> Maybe:
> 
> This node lets Linux know it is running as a Gunyah guest VM.
> It provides a basic description and capabilities of the VM,
> as well as information required to communicate with the resource
> manager.
> 
>> +this VM. See 
>> Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml for 
>> a description
>> +of this node.
>> diff --git a/Documentation/virt/gunyah/message-queue.rst 
>> b/Documentation/virt/gunyah/message-queue.rst
>> new file mode 100644
>> index 0000000000000..b352918ae54b4
>> --- /dev/null
>> +++ b/Documentation/virt/gunyah/message-queue.rst
>> @@ -0,0 +1,63 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +Message Queues
>> +==============
>> +Message queue is a simple low-capacity IPC channel between two VMs. 
>> It is
> 
> I don't know what the "capacity" of an IPC channel is.  But
> that's OK I guess; it's sort of descriptive.
> 
>> +intended for sending small control and configuration messages. Each 
>> message
>> +queue is unidirectional, so a full-duplex IPC channel requires a pair 
>> of queues.
>> +
>> +Messages can be up to 240 bytes in length. Longer messages require a 
>> further
>> +protocol on top of the message queue messages themselves. For 
>> instance, communication
>> +with the resource manager adds a header field for sending longer 
>> messages via multiple
>> +message fragments.
>> +
>> +The diagram below shows how message queue works. A typical 
>> configuration involves
>> +2 message queues. Message queue 1 allows VM_A to send messages to 
>> VM_B. Message
>> +queue 2 allows VM_B to send messages to VM_A.
>> +
>> +1. VM_A sends a message of up to 240 bytes in length. It raises a 
>> hypercall
>> +   with the message to inform the hypervisor to add the message to
>> +   message queue 1's queue. The hypervisor copies memory into the 
>> internal
>> +   message queue representation; the memory doesn't need to be shared 
>> between
>> +   VM_A and VM_B.
>> +
>> +2. Gunyah raises the corresponding interrupt for VM_B (Rx vIRQ) when 
>> any of
>> +   these happens:
>> +
>> +   a. gh_msgq_send() has PUSH flag. Queue is immediately flushed. 
>> This is the typical case.
>> +   b. Explicility with gh_msgq_push command from VM_A.
> 
> s/Explicility/Explicitly/
> 
> Is gh_msgq_send() a function and gh_msgq_push a "command" or
> something?  Why the difference in parentheses?  (Pick a
> convention and follow it.)

Will fix.

> 
> Does "Queue is flushed" mean "VM_B is interrupted"?

Yes, I'll clarify that's what it means. VM_B could get the interrupt and 
still decide not to read from the queue.

> 
> VM_A calls gh_msgq_push, and that causes the VM_B interrupt to
> be signaled?
> 

Yes.

> I'm being a little picky but I think these descriptions could be
> improved a bit.
> 
>> +   c. Message queue has reached a threshold depth.
>> +
>> +3. VM_B calls gh_msgq_recv() and Gunyah copies message to requested 
>> buffer.
> 
> It sure would be nice if all this didn't have to be copied
> twice.  But I recognize the copies ensure isolation.
> 
>> +
>> +4. Gunyah buffers messages in the queue. If the queue became full 
>> when VM_A added a message,
>> +   the return values for gh_msgq_send() include a flag that indicates 
>> the queue is full.
>> +   Once VM_B receives the message and, thus, there is space in the 
>> queue, Gunyah
>> +   will raise the Tx vIRQ on VM_A to indicate it can continue sending 
>> messages.
> 
> Does the Tx vIRQ on VM_A fire after *every* message is sent,
> or only when the state of the queue goes from "full" to "not"?
> (Looking at patch 6 it looks like the latter.)

Tx vIRQ only fires when state of queue goes from "full" to "not".

This may not be very relevant, but Gunyah allows the "not full" 
threshold to be less than the queue depth. For instance, the Tx vIRQ 
could be configured to only fire once there are no pending messages in 
the queue. Linux doesn't presently configure this threshold.

> 
> If it's signaled after every message is sent, does it
> indicate that the message has been *received* by VM_B
> (versus just received and copied by Gunyah)?
> 

To connect some dots: the Tx vIRQ is fired when the reader reads a 
message and the number of messages still in the queue decrements to the 
"not full" threshold.

https://github.com/quic/gunyah-hypervisor/blob/3d4014404993939f898018cfb1935c2d9bfc2830/hyp/ipc/msgqueue/src/msgqueue_common.c#L142-L148

>> +
>> +For VM_B to send a message to VM_A, the process is identical, except 
>> that hypercalls
>> +reference message queue 2's capability ID. Each message queue has its 
>> own independent
>> +vIRQ: two TX message queues will have two vIRQs (and two capability 
>> IDs).
>> +
>> +::
>> +
>> +      +---------------+         +-----------------+         
>> +---------------+
>> +      |      VM_A     |         |Gunyah hypervisor|         |      
>> VM_B     |
>> +      |               |         |                 |         
>> |               |
>> +      |               |         |                 |         
>> |               |
>> +      |               |   Tx    |                 |         
>> |               |
>> +      |               |-------->|                 | Rx vIRQ 
>> |               |
>> +      |gh_msgq_send() | Tx vIRQ |Message queue 1  
>> |-------->|gh_msgq_recv() |
>> +      |               |<------- |                 |         
>> |               |
>> +      |               |         |                 |         
>> |               |
>> +      | Message Queue |         |                 |         | Message 
>> Queue |
>> +      | driver        |         |                 |         | 
>> driver        |
>> +      |               |         |                 |         
>> |               |
>> +      |               |         |                 |         
>> |               |
>> +      |               |         |                 |   Tx    
>> |               |
>> +      |               | Rx vIRQ |                 
>> |<--------|               |
>> +      |gh_msgq_recv() |<--------|Message queue 2  | Tx vIRQ 
>> |gh_msgq_send() |
>> +      |               |         |                 
>> |-------->|               |
>> +      |               |         |                 |         
>> |               |
>> +      |               |         |                 |         
>> |               |
>> +      +---------------+         +-----------------+         
>> +---------------+
>> diff --git a/Documentation/virt/index.rst b/Documentation/virt/index.rst
>> index 7fb55ae08598d..15869ee059b35 100644
>> --- a/Documentation/virt/index.rst
>> +++ b/Documentation/virt/index.rst
>> @@ -16,6 +16,7 @@ Virtualization Support
>>      coco/sev-guest
>>      coco/tdx-guest
>>      hyperv/index
>> +   gunyah/index
>>   .. only:: html and subproject
>
Alex Elder July 14, 2023, 1:35 p.m. UTC | #4
On 7/3/23 5:41 PM, Elliot Berman wrote:
>> If it's signaled after every message is sent, does it
>> indicate that the message has been *received* by VM_B
>> (versus just received and copied by Gunyah)?
>>
> 
> To connect some dots: the Tx vIRQ is fired when the reader reads a 
> message and the number of messages still in the queue decrements to the 
> "not full" threshold.
> 
> https://github.com/quic/gunyah-hypervisor/blob/3d4014404993939f898018cfb1935c2d9bfc2830/hyp/ipc/msgqueue/src/msgqueue_common.c#L142-L148

So the Tx vIRQ on the sender is only fired when the state of the
receiver's Rx queue goes from "full" to "not full".

Normally there is no signal sent, and a sender sends messages
until it gets a "queue full" flag back from a gh_msgq_send()
call.  At that point it should stop sending, until the Tx vIRQ
fires to indicate the receiver queue has "room" (fewer than
the "full threshold" messages are consumed).

There is no way (at this layer of the protocol) to tell whether
a given message has been *received*, only that it has been *sent*
(meaning the hypervisor has accepted it).  And Gunyah provides
reliable delivery (each message received in send order, exactly
once).

Now that I re-read what you said it makes sense and I guess I
just misunderstood.  There *might* be a way to reword slightly
to prevent any misinterpretation.

Thanks.

					-Alex
Bjorn Andersson Aug. 5, 2023, 3:51 a.m. UTC | #5
On Tue, Jun 13, 2023 at 10:20:33AM -0700, Elliot Berman wrote:
> Add hypercalls to send and receive messages on a Gunyah message queue.
> 
> Reviewed-by: Alex Elder <elder@linaro.org>
> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>  arch/arm64/gunyah/gunyah_hypercall.c | 31 ++++++++++++++++++++++++++++
>  include/linux/gunyah.h               |  6 ++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
> index ef48e0b8b5448..c5bf0dd468ec9 100644
> --- a/arch/arm64/gunyah/gunyah_hypercall.c
> +++ b/arch/arm64/gunyah/gunyah_hypercall.c
> @@ -35,6 +35,8 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
>  						   fn)
>  
>  #define GH_HYPERCALL_HYP_IDENTIFY		GH_HYPERCALL(0x8000)
> +#define GH_HYPERCALL_MSGQ_SEND			GH_HYPERCALL(0x801B)
> +#define GH_HYPERCALL_MSGQ_RECV			GH_HYPERCALL(0x801C)
>  
>  /**
>   * gh_hypercall_hyp_identify() - Returns build information and feature flags
> @@ -54,5 +56,34 @@ void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identi
>  }
>  EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
>  
> +enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready)

I find the name "ready" unclear, I believe it will be true to indicate
that the message queue is not full (i.e. Gunyah is ready to receive
another message?)

Naming it "full" and reversing the values would make this immediately
obvious.

> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_SEND, capid, size, (uintptr_t)buff, tx_flags, 0, &res);
> +
> +	if (res.a0 == GH_ERROR_OK)
> +		*ready = !!res.a1;
> +
> +	return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_msgq_send);
> +
> +enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
> +					bool *ready)

And this "ready" seems to be "more data available". Naming it "empty",
and reversing the values, would make this obvious.


Perhaps when working with the hypercalls on a regular basis it makes
sense to not invert the value space...renaming the two "ready" would be
beneficial regardless.

> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_RECV, capid, (uintptr_t)buff, size, 0, &res);
> +
> +	if (res.a0 == GH_ERROR_OK) {
> +		*recv_size = res.a1;
> +		*ready = !!res.a2;
> +	}
> +
> +	return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_msgq_recv);
> +
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("Gunyah Hypervisor Hypercalls");
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 6b36cf4787efb..01a6f202d037e 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -111,4 +111,10 @@ static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh
>  
>  void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
>  
> +#define GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH		BIT(0)

Unless the flags are growing, passing this as a bool push instead of a
flag field seems reasonable.

> +
> +enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready);
> +enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
> +					bool *ready);

Please wrap things to 80 characters, unless going a little bit over
makes things more readable.

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 4:16 a.m. UTC | #6
On Tue, Jun 13, 2023 at 10:20:34AM -0700, Elliot Berman wrote:
> Gunyah message queues are a unidirectional inter-VM pipe for messages up
> to 1024 bytes. This driver supports pairing a receiver message queue and
> a transmitter message queue to expose a single mailbox channel.
> 

I don't find value in using the mailbox framework to abstract the
message queues. You've split the resource-manager code in two separate
subsystems, with the complexities introduced by the abstraction, while
still being one implementation. Every message sent requires a tasklet
to be scheduled before continuing and you're unnecessarily waiting for
the tx fifo when you're filling the message queue.

You can effectively implement the same functionality, with 10% of the
code if not involving the mailbox framework.

> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>  Documentation/virt/gunyah/message-queue.rst |   8 +
>  drivers/mailbox/Makefile                    |   2 +
>  drivers/mailbox/gunyah-msgq.c               | 219 ++++++++++++++++++++
>  include/linux/gunyah.h                      |  57 +++++
>  4 files changed, 286 insertions(+)
>  create mode 100644 drivers/mailbox/gunyah-msgq.c
> 
> diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
> index b352918ae54b4..70d82a4ef32d7 100644
> --- a/Documentation/virt/gunyah/message-queue.rst
> +++ b/Documentation/virt/gunyah/message-queue.rst
> @@ -61,3 +61,11 @@ vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
>        |               |         |                 |         |               |
>        |               |         |                 |         |               |
>        +---------------+         +-----------------+         +---------------+
> +
> +Gunyah message queues are exposed as mailboxes. To create the mailbox, create
> +a mbox_client and call `gh_msgq_init()`. On receipt of the RX_READY interrupt,
> +all messages in the RX message queue are read and pushed via the `rx_callback`
> +of the registered mbox_client.
> +
> +.. kernel-doc:: drivers/mailbox/gunyah-msgq.c
> +   :identifiers: gh_msgq_init
> diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
> index fc93761171113..5f929bb55e9a5 100644
> --- a/drivers/mailbox/Makefile
> +++ b/drivers/mailbox/Makefile
> @@ -55,6 +55,8 @@ obj-$(CONFIG_MTK_CMDQ_MBOX)	+= mtk-cmdq-mailbox.o
>  
>  obj-$(CONFIG_ZYNQMP_IPI_MBOX)	+= zynqmp-ipi-mailbox.o
>  
> +obj-$(CONFIG_GUNYAH)		+= gunyah-msgq.o
> +
>  obj-$(CONFIG_SUN6I_MSGBOX)	+= sun6i-msgbox.o
>  
>  obj-$(CONFIG_SPRD_MBOX)		+= sprd-mailbox.o
> diff --git a/drivers/mailbox/gunyah-msgq.c b/drivers/mailbox/gunyah-msgq.c
> new file mode 100644
> index 0000000000000..7f777339278eb
> --- /dev/null
> +++ b/drivers/mailbox/gunyah-msgq.c
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/mailbox_controller.h>
> +#include <linux/module.h>
> +#include <linux/interrupt.h>
> +#include <linux/gunyah.h>
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +
> +#define mbox_chan_to_msgq(chan) (container_of(chan->mbox, struct gh_msgq, mbox))
> +
> +static irqreturn_t gh_msgq_rx_irq_handler(int irq, void *data)
> +{
> +	struct gh_msgq *msgq = data;
> +	struct gh_msgq_rx_data rx_data;
> +	enum gh_error gh_error;

err would be sufficient

> +	bool ready = true;

s/read/more_data/ perhaps?

> +
> +	while (ready) {

Doesn't really matter logically, but what you really mean is:

do {
	...
} while (more_data);

So perhaps reasonable to type that.

> +		gh_error = gh_hypercall_msgq_recv(msgq->rx_ghrsc->capid,
> +				&rx_data.data, sizeof(rx_data.data),
> +				&rx_data.length, &ready);
> +		if (gh_error != GH_ERROR_OK) {
> +			if (gh_error != GH_ERROR_MSGQUEUE_EMPTY)
> +				dev_warn(msgq->mbox.dev, "Failed to receive data: %d\n", gh_error);
> +			break;
> +		}

		if (err == GH_ERROR_MSGQUEUE_EMPTY) {
			break;
		} else if (err != GH_ERROR_OK) {
			dev_warn(...)
			break;
		}

		handle_message();

> +		if (likely(gh_msgq_chan(msgq)->cl))
> +			mbox_chan_received_data(gh_msgq_chan(msgq), &rx_data);
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Fired when message queue transitions from "full" to "space available" to send messages */
> +static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
> +{
> +	struct gh_msgq *msgq = data;
> +
> +	mbox_chan_txdone(gh_msgq_chan(msgq), 0);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Fired after sending message and hypercall told us there was more space available. */
> +static void gh_msgq_txdone_tasklet(struct tasklet_struct *tasklet)
> +{
> +	struct gh_msgq *msgq = container_of(tasklet, struct gh_msgq, txdone_tasklet);
> +
> +	mbox_chan_txdone(gh_msgq_chan(msgq), msgq->last_ret);
> +}
> +
> +static int gh_msgq_send_data(struct mbox_chan *chan, void *data)
> +{
> +	struct gh_msgq *msgq = mbox_chan_to_msgq(chan);
> +	struct gh_msgq_tx_data *msgq_data = data;
> +	u64 tx_flags = 0;
> +	enum gh_error gh_error;
> +	bool ready;

s/ready/not_full/ or invert the values in gh_hypercall_msgq_send()

> +
> +	if (!msgq->tx_ghrsc)
> +		return -EOPNOTSUPP;
> +
> +	if (msgq_data->push)
> +		tx_flags |= GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH;
> +
> +	gh_error = gh_hypercall_msgq_send(msgq->tx_ghrsc->capid, msgq_data->length, msgq_data->data,
> +						tx_flags, &ready);
> +
> +	/**
> +	 * unlikely because Linux tracks state of msgq and should not try to
> +	 * send message when msgq is full.
> +	 */
> +	if (unlikely(gh_error == GH_ERROR_MSGQUEUE_FULL))
> +		return -EAGAIN;

You're propagating this all the way up to the application.

> +
> +	/**
> +	 * Propagate all other errors to client. If we return error to mailbox
> +	 * framework, then no other messages can be sent and nobody will know
> +	 * to retry this message.
> +	 */
> +	msgq->last_ret = gh_error_remap(gh_error);
> +
> +	/**
> +	 * This message was successfully sent, but message queue isn't ready to
> +	 * accept more messages because it's now full. Mailbox framework
> +	 * requires that we only report that message was transmitted when
> +	 * we're ready to transmit another message. We'll get that in the form
> +	 * of tx IRQ once the other side starts to drain the msgq.
> +	 */
> +	if (gh_error == GH_ERROR_OK) {
> +		if (!ready)
> +			return 0;

The message was committed successfully to the queue, there's no reason
for this VM to wait for the receiving VM to start consuming the queue
before returning from mbox_send_message().

As mentioned above, if you move away from the mailbox framework you can
instead keep track of when gh_hypercall_msgq_send() returned "the
message queue was filled up" and wait for the tx interrupt before making
the next call.

This will allow this instance to continue executing to some natural
point of waiting for resources.

> +	} else {
> +		dev_err(msgq->mbox.dev, "Failed to send data: %d (%d)\n", gh_error, msgq->last_ret);
> +	}
> +
> +	/**
> +	 * We can send more messages. Mailbox framework requires that tx done
> +	 * happens asynchronously to sending the message. Gunyah message queues
> +	 * tell us right away on the hypercall return whether we can send more
> +	 * messages. To work around this, defer the txdone to a tasklet.
> +	 */
> +	tasklet_schedule(&msgq->txdone_tasklet);

Ick.

> +
> +	return 0;
> +}
> +
> +static struct mbox_chan_ops gh_msgq_ops = {
> +	.send_data = gh_msgq_send_data,
> +};
> +
> +/**
> + * gh_msgq_init() - Initialize a Gunyah message queue with an mbox_client
> + * @parent: device parent used for the mailbox controller
> + * @msgq: Pointer to the gh_msgq to initialize
> + * @cl: A mailbox client to bind to the mailbox channel that the message queue creates
> + * @tx_ghrsc: optional, the transmission side of the message queue
> + * @rx_ghrsc: optional, the receiving side of the message queue
> + *
> + * At least one of tx_ghrsc and rx_ghrsc must be not NULL. Most message queue use cases come with
> + * a pair of message queues to facilitate bidirectional communication. When tx_ghrsc is set,
> + * the client can send messages with mbox_send_message(gh_msgq_chan(msgq), msg). When rx_ghrsc
> + * is set, the mbox_client must register an .rx_callback() and the message queue driver will
> + * deliver all available messages upon receiving the RX ready interrupt. The messages should be
> + * consumed or copied by the client right away as the gh_msgq_rx_data will be replaced/destroyed
> + * after the callback.
> + *
> + * Returns - 0 on success, negative otherwise
> + */
> +int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
> +		 struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc)
> +{
> +	int ret;
> +
> +	/* Must have at least a tx_ghrsc or rx_ghrsc and that they are the right device types */
> +	if ((!tx_ghrsc && !rx_ghrsc) ||
> +	    (tx_ghrsc && tx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_TX) ||
> +	    (rx_ghrsc && rx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_RX))
> +		return -EINVAL;
> +
> +	msgq->mbox.dev = parent;
> +	msgq->mbox.ops = &gh_msgq_ops;
> +	msgq->mbox.num_chans = 1;
> +	msgq->mbox.txdone_irq = true;
> +	msgq->mbox.chans = &msgq->mbox_chan;
> +
> +	ret = mbox_controller_register(&msgq->mbox);
> +	if (ret)
> +		return ret;
> +
> +	ret = mbox_bind_client(gh_msgq_chan(msgq), cl);
> +	if (ret)
> +		goto err_mbox;
> +
> +	if (tx_ghrsc) {
> +		msgq->tx_ghrsc = tx_ghrsc;
> +
> +		ret = request_irq(msgq->tx_ghrsc->irq, gh_msgq_tx_irq_handler, 0, "gh_msgq_tx",

Wrap your lines to 80 characters, unless there's a good reason. You
don't have a good reason here.

> +				msgq);
> +		if (ret)
> +			goto err_tx_ghrsc;
> +
> +		enable_irq_wake(msgq->tx_ghrsc->irq);
> +
> +		tasklet_setup(&msgq->txdone_tasklet, gh_msgq_txdone_tasklet);
> +	}
> +
> +	if (rx_ghrsc) {
> +		msgq->rx_ghrsc = rx_ghrsc;
> +
> +		ret = request_threaded_irq(msgq->rx_ghrsc->irq, NULL, gh_msgq_rx_irq_handler,
> +						IRQF_ONESHOT, "gh_msgq_rx", msgq);
> +		if (ret)
> +			goto err_tx_irq;
> +
> +		enable_irq_wake(msgq->rx_ghrsc->irq);
> +	}
> +
> +	return 0;
> +err_tx_irq:
> +	if (msgq->tx_ghrsc)
> +		free_irq(msgq->tx_ghrsc->irq, msgq);
> +
> +	msgq->rx_ghrsc = NULL;
> +err_tx_ghrsc:
> +	msgq->tx_ghrsc = NULL;
> +err_mbox:
> +	mbox_controller_unregister(&msgq->mbox);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_msgq_init);
> +
> +void gh_msgq_remove(struct gh_msgq *msgq)
> +{
> +	mbox_free_channel(gh_msgq_chan(msgq));
> +
> +	if (msgq->rx_ghrsc)
> +		free_irq(msgq->rx_ghrsc->irq, msgq);
> +
> +	if (msgq->tx_ghrsc) {
> +		tasklet_kill(&msgq->txdone_tasklet);
> +		free_irq(msgq->tx_ghrsc->irq, msgq);
> +	}
> +
> +	mbox_controller_unregister(&msgq->mbox);
> +
> +	msgq->rx_ghrsc = NULL;
> +	msgq->tx_ghrsc = NULL;
> +}
> +EXPORT_SYMBOL_GPL(gh_msgq_remove);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Gunyah Message Queue Driver");
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 01a6f202d037e..982e27d10d57f 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -8,11 +8,68 @@
>  
>  #include <linux/bitfield.h>
>  #include <linux/errno.h>
> +#include <linux/interrupt.h>
>  #include <linux/limits.h>
> +#include <linux/mailbox_controller.h>
> +#include <linux/mailbox_client.h>
>  #include <linux/types.h>
>  
> +/* Matches resource manager's resource types for VM_GET_HYP_RESOURCES RPC */
> +enum gh_resource_type {
> +	GH_RESOURCE_TYPE_BELL_TX	= 0,
> +	GH_RESOURCE_TYPE_BELL_RX	= 1,
> +	GH_RESOURCE_TYPE_MSGQ_TX	= 2,
> +	GH_RESOURCE_TYPE_MSGQ_RX	= 3,
> +	GH_RESOURCE_TYPE_VCPU		= 4,
> +};
> +
> +struct gh_resource {
> +	enum gh_resource_type type;
> +	u64 capid;
> +	unsigned int irq;
> +};
> +
> +/**
> + * Gunyah Message Queues
> + */
> +
> +#define GH_MSGQ_MAX_MSG_SIZE		240
> +
> +struct gh_msgq_tx_data {
> +	size_t length;
> +	bool push;
> +	char data[];
> +};
> +
> +struct gh_msgq_rx_data {
> +	size_t length;
> +	char data[GH_MSGQ_MAX_MSG_SIZE];
> +};
> +
> +struct gh_msgq {

This is just private data, exposed so that it can be intertwined with
the resource manager implementation.

> +	struct gh_resource *tx_ghrsc;
> +	struct gh_resource *rx_ghrsc;
> +
> +	/* msgq private */
> +	int last_ret; /* Linux error, not GH_STATUS_* */
> +	struct mbox_chan mbox_chan;
> +	struct mbox_controller mbox;
> +	struct tasklet_struct txdone_tasklet;
> +};
> +

Double newlines.

> +
> +int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
> +		     struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc);
> +void gh_msgq_remove(struct gh_msgq *msgq);
> +
> +static inline struct mbox_chan *gh_msgq_chan(struct gh_msgq *msgq)
> +{
> +	return &msgq->mbox.chans[0];
> +}
> +
>  /******************************************************************************/
>  /* Common arch-independent definitions for Gunyah hypercalls                  */
> +

Unrelated change.

Regards,
Bjorn

>  #define GH_CAPID_INVAL	U64_MAX
>  #define GH_VMID_ROOT_VM	0xff
>  
> -- 
> 2.40.0
>
Bjorn Andersson Aug. 5, 2023, 5:02 a.m. UTC | #7
On Tue, Jun 13, 2023 at 10:20:35AM -0700, Elliot Berman wrote:
> The resource manager is a special virtual machine which is always
> running on a Gunyah system. It provides APIs for creating and destroying
> VMs, secure memory management, sharing/lending of memory between VMs,
> and setup of inter-VM communication. Calls to the resource manager are
> made via message queues.
> 
> This patch implements the basic probing and RPC mechanism to make those
> API calls. Request/response calls can be made with gh_rm_call.
> Drivers can also register to notifications pushed by RM via
> gh_rm_register_notifier
> 
> Specific API calls that resource manager supports will be implemented in
> subsequent patches.
> 
> Reviewed-by: Alex Elder <elder@linaro.org>
> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>  drivers/virt/Makefile          |   1 +
>  drivers/virt/gunyah/Makefile   |   4 +
>  drivers/virt/gunyah/rsc_mgr.c  | 700 +++++++++++++++++++++++++++++++++
>  drivers/virt/gunyah/rsc_mgr.h  |  16 +
>  include/linux/gunyah_rsc_mgr.h |  21 +
>  5 files changed, 742 insertions(+)
>  create mode 100644 drivers/virt/gunyah/Makefile
>  create mode 100644 drivers/virt/gunyah/rsc_mgr.c
>  create mode 100644 drivers/virt/gunyah/rsc_mgr.h
>  create mode 100644 include/linux/gunyah_rsc_mgr.h
> 
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index e9aa6fc96fab7..a5817e2d7d718 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_ACRN_HSM)		+= acrn/
>  obj-$(CONFIG_EFI_SECRET)	+= coco/efi_secret/
>  obj-$(CONFIG_SEV_GUEST)		+= coco/sev-guest/
>  obj-$(CONFIG_INTEL_TDX_GUEST)	+= coco/tdx-guest/
> +obj-y				+= gunyah/
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> new file mode 100644
> index 0000000000000..0f5aec8346988
> --- /dev/null
> +++ b/drivers/virt/gunyah/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +gunyah-y += rsc_mgr.o
> +obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> new file mode 100644
> index 0000000000000..04c8e131d259f
> --- /dev/null
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -0,0 +1,700 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/of.h>
> +#include <linux/slab.h>
> +#include <linux/mutex.h>
> +#include <linux/sched.h>
> +#include <linux/gunyah.h>
> +#include <linux/module.h>
> +#include <linux/of_irq.h>
> +#include <linux/notifier.h>
> +#include <linux/workqueue.h>
> +#include <linux/completion.h>
> +#include <linux/gunyah_rsc_mgr.h>
> +#include <linux/platform_device.h>
> +
> +#include "rsc_mgr.h"
> +
> +#define RM_RPC_API_VERSION_MASK		GENMASK(3, 0)
> +#define RM_RPC_HEADER_WORDS_MASK	GENMASK(7, 4)
> +#define RM_RPC_API_VERSION		FIELD_PREP(RM_RPC_API_VERSION_MASK, 1)
> +#define RM_RPC_HEADER_WORDS		FIELD_PREP(RM_RPC_HEADER_WORDS_MASK, \
> +						(sizeof(struct gh_rm_rpc_hdr) / sizeof(u32)))
> +#define RM_RPC_API			(RM_RPC_API_VERSION | RM_RPC_HEADER_WORDS)
> +
> +#define RM_RPC_TYPE_CONTINUATION	0x0
> +#define RM_RPC_TYPE_REQUEST		0x1
> +#define RM_RPC_TYPE_REPLY		0x2
> +#define RM_RPC_TYPE_NOTIF		0x3
> +#define RM_RPC_TYPE_MASK		GENMASK(1, 0)
> +
> +#define GH_RM_MAX_NUM_FRAGMENTS		62
> +#define RM_RPC_FRAGMENTS_MASK		GENMASK(7, 2)
> +
> +struct gh_rm_rpc_hdr {
> +	u8 api;
> +	u8 type;
> +	__le16 seq;
> +	__le32 msg_id;
> +} __packed;
> +
> +struct gh_rm_rpc_reply_hdr {
> +	struct gh_rm_rpc_hdr hdr;
> +	__le32 err_code; /* GH_RM_ERROR_* */
> +} __packed;
> +
> +#define GH_RM_MAX_MSG_SIZE	(GH_MSGQ_MAX_MSG_SIZE - sizeof(struct gh_rm_rpc_hdr))
> +
> +/* RM Error codes */
> +enum gh_rm_error {
> +	GH_RM_ERROR_OK			= 0x0,
> +	GH_RM_ERROR_UNIMPLEMENTED	= 0xFFFFFFFF,
> +	GH_RM_ERROR_NOMEM		= 0x1,
> +	GH_RM_ERROR_NORESOURCE		= 0x2,
> +	GH_RM_ERROR_DENIED		= 0x3,
> +	GH_RM_ERROR_INVALID		= 0x4,
> +	GH_RM_ERROR_BUSY		= 0x5,
> +	GH_RM_ERROR_ARGUMENT_INVALID	= 0x6,
> +	GH_RM_ERROR_HANDLE_INVALID	= 0x7,
> +	GH_RM_ERROR_VALIDATE_FAILED	= 0x8,
> +	GH_RM_ERROR_MAP_FAILED		= 0x9,
> +	GH_RM_ERROR_MEM_INVALID		= 0xA,
> +	GH_RM_ERROR_MEM_INUSE		= 0xB,
> +	GH_RM_ERROR_MEM_RELEASED	= 0xC,
> +	GH_RM_ERROR_VMID_INVALID	= 0xD,
> +	GH_RM_ERROR_LOOKUP_FAILED	= 0xE,
> +	GH_RM_ERROR_IRQ_INVALID		= 0xF,
> +	GH_RM_ERROR_IRQ_INUSE		= 0x10,
> +	GH_RM_ERROR_IRQ_RELEASED	= 0x11,
> +};
> +
> +/**
> + * struct gh_rm_connection - Represents a complete message from resource manager
> + * @payload: Combined payload of all the fragments (msg headers stripped off).
> + * @size: Size of the payload received so far.
> + * @msg_id: Message ID from the header.
> + * @type: RM_RPC_TYPE_REPLY or RM_RPC_TYPE_NOTIF.
> + * @num_fragments: total number of fragments expected to be received.
> + * @fragments_received: fragments received so far.
> + * @reply: Fields used for request/reply sequences
> + * @notification: Fields used for notifiations
> + */
> +struct gh_rm_connection {

Is there a good reason why this is called a connection, when it's
actually a message (specifically a response or notification)?

> +	void *payload;
> +	size_t size;
> +	__le32 msg_id;

Everything else in this struct is native endian, I don't think you loose
much from keeping this native and moving your
cpu_to_le32()/le32_to_cpu() calls around the implementation.

> +	u8 type;
> +
> +	u8 num_fragments;
> +	u8 fragments_received;
> +
> +	union {
> +		/**
> +		 * @ret: Linux return code, there was an error processing connection
> +		 * @seq: Sequence ID for the main message.
> +		 * @rm_error: For request/reply sequences with standard replies
> +		 * @seq_done: Signals caller that the RM reply has been received
> +		 */
> +		struct {
> +			int ret;
> +			u16 seq;
> +			enum gh_rm_error rm_error;
> +			struct completion seq_done;
> +		} reply;
> +
> +		/**
> +		 * @rm: Pointer to the RM that launched the connection
> +		 * @work: Triggered when all fragments of a notification received
> +		 */
> +		struct {
> +			struct gh_rm *rm;
> +			struct work_struct work;
> +		} notification;
> +	};
> +};
> +
> +/**
> + * struct gh_rm - private data for communicating w/Gunyah resource manager
> + * @dev: pointer to RM platform device
> + * @tx_ghrsc: message queue resource to TX to RM
> + * @rx_ghrsc: message queue resource to RX from RM
> + * @msgq: mailbox instance of TX/RX resources above
> + * @msgq_client: mailbox client of above msgq
> + * @active_rx_connection: ongoing gh_rm_connection for which we're receiving fragments
> + * @last_tx_ret: return value of last mailbox tx
> + * @call_xarray: xarray to allocate & lookup sequence IDs for Request/Response flows
> + * @next_seq: next ID to allocate (for xa_alloc_cyclic)
> + * @cache: cache for allocating Tx messages
> + * @send_lock: synchronization to allow only one request to be sent at a time
> + * @nh: notifier chain for clients interested in RM notification messages
> + */
> +struct gh_rm {
> +	struct device *dev;
> +	struct gh_resource tx_ghrsc;
> +	struct gh_resource rx_ghrsc;
> +	struct gh_msgq msgq;
> +	struct mbox_client msgq_client;
> +	struct gh_rm_connection *active_rx_connection;
> +	int last_tx_ret;
> +
> +	struct xarray call_xarray;
> +	u32 next_seq;
> +
> +	struct kmem_cache *cache;
> +	struct mutex send_lock;
> +	struct blocking_notifier_head nh;
> +};
> +
> +/**
> + * gh_rm_error_remap() - Remap Gunyah resource manager errors into a Linux error code
> + * @rm_error: "Standard" return value from Gunyah resource manager
> + */
> +static inline int gh_rm_error_remap(enum gh_rm_error rm_error)
> +{
> +	switch (rm_error) {
> +	case GH_RM_ERROR_OK:
> +		return 0;
> +	case GH_RM_ERROR_UNIMPLEMENTED:
> +		return -EOPNOTSUPP;
> +	case GH_RM_ERROR_NOMEM:
> +		return -ENOMEM;
> +	case GH_RM_ERROR_NORESOURCE:
> +		return -ENODEV;
> +	case GH_RM_ERROR_DENIED:
> +		return -EPERM;
> +	case GH_RM_ERROR_BUSY:
> +		return -EBUSY;
> +	case GH_RM_ERROR_INVALID:
> +	case GH_RM_ERROR_ARGUMENT_INVALID:
> +	case GH_RM_ERROR_HANDLE_INVALID:
> +	case GH_RM_ERROR_VALIDATE_FAILED:
> +	case GH_RM_ERROR_MAP_FAILED:
> +	case GH_RM_ERROR_MEM_INVALID:
> +	case GH_RM_ERROR_MEM_INUSE:
> +	case GH_RM_ERROR_MEM_RELEASED:
> +	case GH_RM_ERROR_VMID_INVALID:
> +	case GH_RM_ERROR_LOOKUP_FAILED:
> +	case GH_RM_ERROR_IRQ_INVALID:
> +	case GH_RM_ERROR_IRQ_INUSE:
> +	case GH_RM_ERROR_IRQ_RELEASED:
> +		return -EINVAL;
> +	default:
> +		return -EBADMSG;
> +	}
> +}
> +
> +static int gh_rm_init_connection_payload(struct gh_rm_connection *connection, void *msg,
> +					size_t hdr_size, size_t msg_size)
> +{
> +	size_t max_buf_size, payload_size;
> +	struct gh_rm_rpc_hdr *hdr = msg;
> +
> +	if (msg_size < hdr_size)
> +		return -EINVAL;
> +
> +	payload_size = msg_size - hdr_size;
> +
> +	connection->num_fragments = FIELD_GET(RM_RPC_FRAGMENTS_MASK, hdr->type);
> +	connection->fragments_received = 0;
> +
> +	/* There's not going to be any payload, no need to allocate buffer. */
> +	if (!payload_size && !connection->num_fragments)
> +		return 0;
> +
> +	if (connection->num_fragments > GH_RM_MAX_NUM_FRAGMENTS)
> +		return -EINVAL;
> +
> +	max_buf_size = payload_size + (connection->num_fragments * GH_RM_MAX_MSG_SIZE);
> +
> +	connection->payload = kzalloc(max_buf_size, GFP_KERNEL);
> +	if (!connection->payload)
> +		return -ENOMEM;
> +
> +	memcpy(connection->payload, msg + hdr_size, payload_size);
> +	connection->size = payload_size;
> +	return 0;
> +}
> +
> +static void gh_rm_abort_connection(struct gh_rm *rm)
> +{
> +	switch (rm->active_rx_connection->type) {
> +	case RM_RPC_TYPE_REPLY:
> +		rm->active_rx_connection->reply.ret = -EIO;
> +		complete(&rm->active_rx_connection->reply.seq_done);
> +		break;
> +	case RM_RPC_TYPE_NOTIF:
> +		fallthrough;

Don't you need to cancel the work here as well?

> +	default:
> +		kfree(rm->active_rx_connection->payload);
> +		kfree(rm->active_rx_connection);
> +	}
> +
> +	rm->active_rx_connection = NULL;
> +}
> +
> +static void gh_rm_notif_work(struct work_struct *work)
> +{
> +	struct gh_rm_connection *connection = container_of(work, struct gh_rm_connection,
> +								notification.work);
> +	struct gh_rm *rm = connection->notification.rm;
> +
> +	blocking_notifier_call_chain(&rm->nh, le32_to_cpu(connection->msg_id), connection->payload);
> +
> +	put_device(rm->dev);
> +	kfree(connection->payload);
> +	kfree(connection);
> +}
> +
> +static void gh_rm_process_notif(struct gh_rm *rm, void *msg, size_t msg_size)
> +{
> +	struct gh_rm_connection *connection;
> +	struct gh_rm_rpc_hdr *hdr = msg;
> +	int ret;
> +
> +	if (rm->active_rx_connection)
> +		gh_rm_abort_connection(rm);

So every notification or reply is always expected to be sent in full, or
it should be thrown away?

I do find this logic a little bit odd, can you confirm that this is
defined by the hypervisor?

> +
> +	connection = kzalloc(sizeof(*connection), GFP_KERNEL);
> +	if (!connection)
> +		return;
> +
> +	connection->type = RM_RPC_TYPE_NOTIF;
> +	connection->msg_id = hdr->msg_id;
> +
> +	get_device(rm->dev);
> +	connection->notification.rm = rm;
> +	INIT_WORK(&connection->notification.work, gh_rm_notif_work);

I tend to dislike work in allocated objects, in particularly when they
are scheduled on the global work queues...

I'd prefer a list of outstanding notifications associated with the rm
and a worker that traverses that list. That saves you from carrying the
rm back-pointer in the "connection" and you don't need to refcount the
rm->dev.

> +
> +	ret = gh_rm_init_connection_payload(connection, msg, sizeof(*hdr), msg_size);
> +	if (ret) {
> +		dev_err(rm->dev, "Failed to initialize connection for notification: %d\n", ret);
> +		put_device(rm->dev);
> +		kfree(connection);
> +		return;
> +	}
> +
> +	rm->active_rx_connection = connection;
> +}
> +
> +static void gh_rm_process_reply(struct gh_rm *rm, void *msg, size_t msg_size)
> +{
> +	struct gh_rm_rpc_reply_hdr *reply_hdr = msg;
> +	struct gh_rm_connection *connection;
> +	u16 seq_id;
> +
> +	seq_id = le16_to_cpu(reply_hdr->hdr.seq);
> +	connection = xa_load(&rm->call_xarray, seq_id);
> +
> +	if (!connection || connection->msg_id != reply_hdr->hdr.msg_id)

Wouldn't it be bad if a response is received with the wrong msg_id?
Please confirm that this should be silently ignored.

> +		return;
> +
> +	if (rm->active_rx_connection)
> +		gh_rm_abort_connection(rm);
> +
> +	if (gh_rm_init_connection_payload(connection, msg, sizeof(*reply_hdr), msg_size)) {
> +		dev_err(rm->dev, "Failed to alloc connection buffer for sequence %d\n", seq_id);
> +		/* Send connection complete and error the client. */
> +		connection->reply.ret = -ENOMEM;
> +		complete(&connection->reply.seq_done);
> +		return;
> +	}
> +
> +	connection->reply.rm_error = le32_to_cpu(reply_hdr->err_code);
> +	rm->active_rx_connection = connection;
> +}
> +
> +static void gh_rm_process_cont(struct gh_rm *rm, struct gh_rm_connection *connection,
> +				void *msg, size_t msg_size)
> +{
> +	struct gh_rm_rpc_hdr *hdr = msg;
> +	size_t payload_size = msg_size - sizeof(*hdr);
> +
> +	if (!rm->active_rx_connection)

Perhaps reasonable to be consistent and complain if the hypervisor gives
you a continuation when you have no active response? Seems like this
would result in a silent hang...

> +		return;
> +
> +	/*
> +	 * hdr->fragments and hdr->msg_id preserves the value from first reply
> +	 * or notif message. To detect mishandling, check it's still intact.
> +	 */
> +	if (connection->msg_id != hdr->msg_id ||
> +		connection->num_fragments != FIELD_GET(RM_RPC_FRAGMENTS_MASK, hdr->type)) {

When breaking an expression, align all lines under each other. Here it
looks like connection->num_fragments.. is part of the conditional block,
rather than the expression.

> +		gh_rm_abort_connection(rm);
> +		return;
> +	}
> +
> +	memcpy(connection->payload + connection->size, msg + sizeof(*hdr), payload_size);
> +	connection->size += payload_size;
> +	connection->fragments_received++;
> +}
> +
> +static void gh_rm_try_complete_connection(struct gh_rm *rm)
> +{
> +	struct gh_rm_connection *connection = rm->active_rx_connection;
> +
> +	if (!connection || connection->fragments_received != connection->num_fragments)
> +		return;
> +
> +	switch (connection->type) {
> +	case RM_RPC_TYPE_REPLY:
> +		complete(&connection->reply.seq_done);
> +		break;
> +	case RM_RPC_TYPE_NOTIF:
> +		schedule_work(&connection->notification.work);
> +		break;
> +	default:
> +		dev_err_ratelimited(rm->dev, "Invalid message type (%u) received\n",
> +					connection->type);
> +		gh_rm_abort_connection(rm);
> +		break;
> +	}
> +
> +	rm->active_rx_connection = NULL;
> +}
> +
> +static void gh_rm_msgq_rx_data(struct mbox_client *cl, void *mssg)
> +{
> +	struct gh_rm *rm = container_of(cl, struct gh_rm, msgq_client);
> +	struct gh_msgq_rx_data *rx_data = mssg;
> +	size_t msg_size = rx_data->length;
> +	void *msg = rx_data->data;
> +	struct gh_rm_rpc_hdr *hdr;
> +
> +	if (msg_size < sizeof(*hdr) || msg_size > GH_MSGQ_MAX_MSG_SIZE)
> +		return;
> +
> +	hdr = msg;
> +	if (hdr->api != RM_RPC_API) {
> +		dev_err(rm->dev, "Unknown RM RPC API version: %x\n", hdr->api);
> +		return;
> +	}
> +
> +	switch (FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)) {
> +	case RM_RPC_TYPE_NOTIF:
> +		gh_rm_process_notif(rm, msg, msg_size);
> +		break;
> +	case RM_RPC_TYPE_REPLY:
> +		gh_rm_process_reply(rm, msg, msg_size);
> +		break;
> +	case RM_RPC_TYPE_CONTINUATION:
> +		gh_rm_process_cont(rm, rm->active_rx_connection, msg, msg_size);
> +		break;
> +	default:
> +		dev_err(rm->dev, "Invalid message type (%lu) received\n",
> +			FIELD_GET(RM_RPC_TYPE_MASK, hdr->type));
> +		return;
> +	}
> +
> +	gh_rm_try_complete_connection(rm);

I stared at this logic for quite a while, I think it would be clearer if
you moved the "now check if we did get all the data and if so handle it"
to each f the process functions.

> +}
> +
> +static void gh_rm_msgq_tx_done(struct mbox_client *cl, void *mssg, int r)
> +{
> +	struct gh_rm *rm = container_of(cl, struct gh_rm, msgq_client);
> +
> +	kmem_cache_free(rm->cache, mssg);
> +	rm->last_tx_ret = r;
> +}
> +
> +static int gh_rm_send_request(struct gh_rm *rm, u32 message_id,
> +			      const void *req_buf, size_t req_buf_size,
> +			      struct gh_rm_connection *connection)
> +{
> +	size_t buf_size_remaining = req_buf_size;
> +	const void *req_buf_curr = req_buf;
> +	struct gh_msgq_tx_data *msg;
> +	struct gh_rm_rpc_hdr *hdr, hdr_template;
> +	u32 cont_fragments = 0;
> +	size_t payload_size;
> +	void *payload;
> +	int ret;
> +
> +	if (req_buf_size > GH_RM_MAX_NUM_FRAGMENTS * GH_RM_MAX_MSG_SIZE) {
> +		dev_warn(rm->dev, "Limit (%lu bytes) exceeded for the maximum message size: %lu\n",
> +			GH_RM_MAX_NUM_FRAGMENTS * GH_RM_MAX_MSG_SIZE, req_buf_size);
> +		dump_stack();
> +		return -E2BIG;
> +	}
> +
> +	if (req_buf_size)
> +		cont_fragments = (req_buf_size - 1) / GH_RM_MAX_MSG_SIZE;
> +
> +	hdr_template.api = RM_RPC_API;
> +	hdr_template.type = FIELD_PREP(RM_RPC_TYPE_MASK, RM_RPC_TYPE_REQUEST) |
> +				FIELD_PREP(RM_RPC_FRAGMENTS_MASK, cont_fragments);
> +	hdr_template.seq = cpu_to_le16(connection->reply.seq);
> +	hdr_template.msg_id = cpu_to_le32(message_id);
> +
> +	ret = mutex_lock_interruptible(&rm->send_lock);
> +	if (ret)
> +		return ret;
> +
> +	do {
> +		msg = kmem_cache_zalloc(rm->cache, GFP_KERNEL);

Per the rm->send_lock() we can only have one msg outstanding at any
point in time. So you should be able to allocate one of those at probe
time and be done with it.

> +		if (!msg) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		/* Fill header */
> +		hdr = (struct gh_rm_rpc_hdr *)&msg->data[0];
> +		*hdr = hdr_template;
> +
> +		/* Copy payload */
> +		payload = &msg->data[0] + sizeof(*hdr);
> +		payload_size = min(buf_size_remaining, GH_RM_MAX_MSG_SIZE);
> +		memcpy(payload, req_buf_curr, payload_size);
> +		req_buf_curr += payload_size;
> +		buf_size_remaining -= payload_size;
> +
> +		/* Force the last fragment to immediately alert the receiver */
> +		msg->push = !buf_size_remaining;
> +		msg->length = sizeof(*hdr) + payload_size;
> +
> +		ret = mbox_send_message(gh_msgq_chan(&rm->msgq), msg);

This would be much cleaner if it was just an optional wait for pending
tx irq and a synchronous call to gh_hypercall_msgq_send().

> +		if (ret < 0) {
> +			kmem_cache_free(rm->cache, msg);
> +			break;
> +		}
> +
> +		if (rm->last_tx_ret) {
> +			ret = rm->last_tx_ret;
> +			break;
> +		}
> +
> +		hdr_template.type = FIELD_PREP(RM_RPC_TYPE_MASK, RM_RPC_TYPE_CONTINUATION) |
> +					FIELD_PREP(RM_RPC_FRAGMENTS_MASK, cont_fragments);
> +	} while (buf_size_remaining);
> +
> +out:
> +	mutex_unlock(&rm->send_lock);
> +	return ret < 0 ? ret : 0;
> +}
> +
> +/**
> + * gh_rm_call: Achieve request-response type communication with RPC
> + * @rm: Pointer to Gunyah resource manager internal data
> + * @message_id: The RM RPC message-id
> + * @req_buf: Request buffer that contains the payload
> + * @req_buf_size: Total size of the payload
> + * @resp_buf: Pointer to a response buffer
> + * @resp_buf_size: Size of the response buffer
> + *
> + * Make a request to the Resource Manager and wait for reply back. For a successful
> + * response, the function returns the payload. The size of the payload is set in
> + * resp_buf_size. The resp_buf must be freed by the caller when 0 is returned
> + * and resp_buf_size != 0.
> + *
> + * req_buf should be not NULL for req_buf_size >0. If req_buf_size == 0,
> + * req_buf *can* be NULL and no additional payload is sent.
> + *
> + * Context: Process context. Will sleep waiting for reply.
> + * Return: 0 on success. <0 if error.
> + */
> +int gh_rm_call(struct gh_rm *rm, u32 message_id, const void *req_buf, size_t req_buf_size,
> +		void **resp_buf, size_t *resp_buf_size)
> +{
> +	struct gh_rm_connection *connection;
> +	u32 seq_id;
> +	int ret;
> +
> +	/* message_id 0 is reserved. req_buf_size implies req_buf is not NULL */
> +	if (!rm || !message_id || (!req_buf && req_buf_size))
> +		return -EINVAL;
> +
> +
> +	connection = kzalloc(sizeof(*connection), GFP_KERNEL);

No matter how many times I read this code, this isn't "a connection" it
is the response message.

sizeof(*connection) doesn't seem too bad either, so you should be able
to just store it on the stack.

> +	if (!connection)
> +		return -ENOMEM;
> +
> +	connection->type = RM_RPC_TYPE_REPLY;
> +	connection->msg_id = cpu_to_le32(message_id);
> +
> +	init_completion(&connection->reply.seq_done);
> +
> +	/* Allocate a new seq number for this connection */
> +	ret = xa_alloc_cyclic(&rm->call_xarray, &seq_id, connection, xa_limit_16b, &rm->next_seq,
> +				GFP_KERNEL);
> +	if (ret < 0)
> +		goto free;
> +	connection->reply.seq = lower_16_bits(seq_id);

You use connection->reply.seq for two things; 1) as a local variable, 2)
to pass the sequence id as an argument to gh_rm_send_request().

Further more, reply.seq is the only thing extracted from "connection" in
gh_rm_send_request(), so if you passed that directly instead of the
"connection" #2 is gone and you're down to duplicating the variable
already on the stack...

So you can remove "seq" from struct gh_rm_connection...

> +
> +	/* Send the request to the Resource Manager */
> +	ret = gh_rm_send_request(rm, message_id, req_buf, req_buf_size, connection);
> +	if (ret < 0)
> +		goto out;
> +
> +	/* Wait for response. Uninterruptible because rollback based on what RM did to VM

/*
 * Multi-line comments should generally be started on the second line.
 */

> +	 * requires us to know how RM handled the call.
> +	 */
> +	wait_for_completion(&connection->reply.seq_done);
> +
> +	/* Check for internal (kernel) error waiting for the response */
> +	if (connection->reply.ret) {
> +		ret = connection->reply.ret;
> +		if (ret != -ENOMEM)
> +			kfree(connection->payload);
> +		goto out;
> +	}
> +
> +	/* Got a response, did resource manager give us an error? */
> +	if (connection->reply.rm_error != GH_RM_ERROR_OK) {
> +		dev_warn(rm->dev, "RM rejected message %08x. Error: %d\n", message_id,
> +			connection->reply.rm_error);
> +		ret = gh_rm_error_remap(connection->reply.rm_error);
> +		kfree(connection->payload);
> +		goto out;
> +	}
> +
> +	/* Everything looks good, return the payload */
> +	if (resp_buf_size)
> +		*resp_buf_size = connection->size;
> +	if (connection->size && resp_buf)

Keep {} around all blocks if anyone of them need it.

> +		*resp_buf = connection->payload;
> +	else {
> +		/* kfree in case RM sent us multiple fragments but never any data in
> +		 * those fragments. We would've allocated memory for it, but connection->size == 0
> +		 */
> +		kfree(connection->payload);
> +	}
> +
> +out:
> +	xa_erase(&rm->call_xarray, connection->reply.seq);
> +free:
> +	kfree(connection);
> +	return ret;
> +}
> +
> +
> +int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&rm->nh, nb);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_notifier_register);
> +
> +int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_unregister(&rm->nh, nb);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
> +
> +static int gh_msgq_platform_probe_direction(struct platform_device *pdev, bool tx,
> +					    struct gh_resource *ghrsc)
> +{
> +	struct device_node *node = pdev->dev.of_node;
> +	int ret;
> +	int idx = tx ? 0 : 1;
> +
> +	ghrsc->type = tx ? GH_RESOURCE_TYPE_MSGQ_TX : GH_RESOURCE_TYPE_MSGQ_RX;
> +
> +	ghrsc->irq = platform_get_irq(pdev, idx);
> +	if (ghrsc->irq < 0) {
> +		dev_err(&pdev->dev, "Failed to get irq%d: %d\n", idx, ghrsc->irq);

"irq0"/"irq1" won't help the reader much.

... "Failed to get %s irq: %d\n", tx ? "tx" : "rx", ghrsc->irq);

> +		return ghrsc->irq;
> +	}
> +
> +	ret = of_property_read_u64_index(node, "reg", idx, &ghrsc->capid);
> +	if (ret) {
> +		dev_err(&pdev->dev, "Failed to get capid%d: %d\n", idx, ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int gh_identify(void)
> +{
> +	struct gh_hypercall_hyp_identify_resp gh_api;
> +
> +	if (!arch_is_gh_guest())
> +		return -ENODEV;
> +
> +	gh_hypercall_hyp_identify(&gh_api);
> +
> +	pr_info("Running under Gunyah hypervisor %llx/v%u\n",
> +		FIELD_GET(GH_API_INFO_VARIANT_MASK, gh_api.api_info),
> +		gh_api_version(&gh_api));
> +
> +	/* We might move this out to individual drivers if there's ever an API version bump */
> +	if (gh_api_version(&gh_api) != GH_API_V1) {
> +		pr_info("Unsupported Gunyah version: %u\n", gh_api_version(&gh_api));

Is this just an informational statement?

> +		return -ENODEV;
> +	}
> +
> +	return 0;
> +}
> +
> +static int gh_rm_drv_probe(struct platform_device *pdev)

Does "_drv_" add value to this function name?

> +{
> +	struct gh_msgq_tx_data *msg;
> +	struct gh_rm *rm;
> +	int ret;
> +
> +	ret = gh_identify();
> +	if (ret)
> +		return ret;
> +
> +	rm = devm_kzalloc(&pdev->dev, sizeof(*rm), GFP_KERNEL);
> +	if (!rm)
> +		return -ENOMEM;
> +
> +	platform_set_drvdata(pdev, rm);
> +	rm->dev = &pdev->dev;
> +
> +	mutex_init(&rm->send_lock);
> +	BLOCKING_INIT_NOTIFIER_HEAD(&rm->nh);
> +	xa_init_flags(&rm->call_xarray, XA_FLAGS_ALLOC);
> +	rm->cache = kmem_cache_create("gh_rm", struct_size(msg, data, GH_MSGQ_MAX_MSG_SIZE), 0,
> +		SLAB_HWCACHE_ALIGN, NULL);
> +	if (!rm->cache)
> +		return -ENOMEM;
> +
> +	ret = gh_msgq_platform_probe_direction(pdev, true, &rm->tx_ghrsc);
> +	if (ret)
> +		goto err_cache;
> +
> +	ret = gh_msgq_platform_probe_direction(pdev, false, &rm->rx_ghrsc);
> +	if (ret)
> +		goto err_cache;
> +
> +	rm->msgq_client.dev = &pdev->dev;
> +	rm->msgq_client.tx_block = true;
> +	rm->msgq_client.rx_callback = gh_rm_msgq_rx_data;
> +	rm->msgq_client.tx_done = gh_rm_msgq_tx_done;
> +
> +	return gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
> +err_cache:
> +	kmem_cache_destroy(rm->cache);
> +	return ret;
> +}
> +
> +static int gh_rm_drv_remove(struct platform_device *pdev)
> +{
> +	struct gh_rm *rm = platform_get_drvdata(pdev);
> +
> +	gh_msgq_remove(&rm->msgq);
> +	kmem_cache_destroy(rm->cache);
> +

As soon as you return here, "rm" will be freed by devres (it's not tied
to the refcount of the struct device). If you have any outstanding
notification work, that will result in a use-after-free.

> +	return 0;
> +}
> +
> +static const struct of_device_id gh_rm_of_match[] = {
> +	{ .compatible = "gunyah-resource-manager" },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(of, gh_rm_of_match);
> +
> +static struct platform_driver gh_rm_driver = {
> +	.probe = gh_rm_drv_probe,
> +	.remove = gh_rm_drv_remove,
> +	.driver = {
> +		.name = "gh_rsc_mgr",
> +		.of_match_table = gh_rm_of_match,
> +	},
> +};
> +module_platform_driver(gh_rm_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Gunyah Resource Manager Driver");
> diff --git a/drivers/virt/gunyah/rsc_mgr.h b/drivers/virt/gunyah/rsc_mgr.h
> new file mode 100644
> index 0000000000000..8309b7bf46683
> --- /dev/null
> +++ b/drivers/virt/gunyah/rsc_mgr.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +#ifndef __GH_RSC_MGR_PRIV_H
> +#define __GH_RSC_MGR_PRIV_H
> +
> +#include <linux/gunyah.h>
> +#include <linux/gunyah_rsc_mgr.h>

You declare struct gh_rm below, so do you really need these two
includes?

> +#include <linux/types.h>
> +
> +struct gh_rm;

Would be nice with an empty line here.

> +int gh_rm_call(struct gh_rm *rsc_mgr, u32 message_id, const void *req_buf, size_t req_buf_size,
> +		void **resp_buf, size_t *resp_buf_size);
> +
> +#endif
> diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
> new file mode 100644
> index 0000000000000..f2a312e80af52
> --- /dev/null
> +++ b/include/linux/gunyah_rsc_mgr.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _GUNYAH_RSC_MGR_H
> +#define _GUNYAH_RSC_MGR_H
> +
> +#include <linux/list.h>

Unused?

> +#include <linux/notifier.h>
> +#include <linux/gunyah.h>
> +
> +#define GH_VMID_INVAL		U16_MAX

Unused in this patch.

> +
> +struct gh_rm;

Would be nice with an empty line here.

> +int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb);
> +int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb);
> +struct device *gh_rm_get(struct gh_rm *rm);
> +void gh_rm_put(struct gh_rm *rm);
> +

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 5:07 p.m. UTC | #8
On Tue, Jun 13, 2023 at 10:20:29AM -0700, Elliot Berman wrote:
> Gunyah is an open-source Type-1 hypervisor developed by Qualcomm. It
> does not depend on any lower-privileged OS/kernel code for its core
> functionality. This increases its security and can support a smaller
> trusted computing based when compared to Type-2 hypervisors.
> 
> Add documentation describing the Gunyah hypervisor and the main
> components of the Gunyah hypervisor which are of interest to Linux
> virtualization development.
> 
> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>  Documentation/virt/gunyah/index.rst         | 113 ++++++++++++++++++++
>  Documentation/virt/gunyah/message-queue.rst |  63 +++++++++++
>  Documentation/virt/index.rst                |   1 +
>  3 files changed, 177 insertions(+)
>  create mode 100644 Documentation/virt/gunyah/index.rst
>  create mode 100644 Documentation/virt/gunyah/message-queue.rst
> 
> diff --git a/Documentation/virt/gunyah/index.rst b/Documentation/virt/gunyah/index.rst
> new file mode 100644
> index 0000000000000..74aa345e0a144
> --- /dev/null
> +++ b/Documentation/virt/gunyah/index.rst
> @@ -0,0 +1,113 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=================
> +Gunyah Hypervisor
> +=================
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   message-queue
> +
> +Gunyah is a Type-1 hypervisor which is independent of any OS kernel, and runs in
> +a higher CPU privilege level. It does not depend on any lower-privileged operating system
> +for its core functionality. This increases its security and can support a much smaller
> +trusted computing base than a Type-2 hypervisor.

Wrap your lines at 80 characters please.

> +
> +Gunyah is an open source hypervisor. The source repo is available at
> +https://github.com/quic/gunyah-hypervisor.
> +
> +Gunyah provides these following features.
> +
> +- Scheduling:
> +
> +  A scheduler for virtual CPUs (vCPUs) on physical CPUs enables time-sharing
> +  of the CPUs. Gunyah supports two models of scheduling:
> +
> +    1. "Behind the back" scheduling in which Gunyah hypervisor schedules vCPUS on its own.
> +    2. "Proxy" scheduling in which a delegated VM can donate part of one of its vCPU slice
> +       to another VM's vCPU via a hypercall.
> +
> +- Memory Management:
> +
> +  APIs handling memory, abstracted as objects, limiting direct use of physical
> +  addresses. Memory ownership and usage tracking of all memory under its control.
> +  Memory partitioning between VMs is a fundamental security feature.
> +
> +- Interrupt Virtualization:
> +
> +  Uses CPU hardware interrupt virtualization capabilities. Interrupts are handled
> +  in the hypervisor and routed to the assigned VM.
> +
> +- Inter-VM Communication:
> +
> +  There are several different mechanisms provided for communicating between VMs.
> +
> +- Virtual platform:
> +
> +  Architectural devices such as interrupt controllers and CPU timers are directly provided
> +  by the hypervisor as well as core virtual platform devices and system APIs such as ARM PSCI.
> +
> +- Device Virtualization:
> +
> +  Para-virtualization of devices is supported using inter-VM communication.
> +
> +Architectures supported
> +=======================
> +AArch64 with a GIC
> +
> +Resources and Capabilities
> +==========================
> +
> +Some services or resources provided by the Gunyah hypervisor are described to a virtual machine by

To my understanding neither resources, nor services are "described", but
rather "exposed through capability IDs"

Is it really "some services or resource", isn't everything in Gunyah
exposed to the VMs as a capability?

> +capability IDs. For instance, inter-VM communication is performed with doorbells and message queues.
> +Gunyah allows access to manipulate that doorbell via the capability ID. These resources are

s/manipulate/interact with/ ?

> +described in Linux as a struct gh_resource.
> +
> +High level management of these resources is performed by the resource manager VM. RM informs a
> +guest VM about resources it can access through either the device tree or via guest-initiated RPC.
> +
> +For each virtual machine, Gunyah maintains a table of resources which can be accessed by that VM.
> +An entry in this table is called a "capability" and VMs can only access resources via this
> +capability table. Hence, virtual Gunyah resources are referenced by a "capability IDs" and not
> +"resource IDs". If 2 VMs have access to the same resource, they might not be using the same
> +capability ID to access that resource since the capability tables are independent per VM.

I think you can rewrite this section more succinctly by saying that Gunyah
handles resources, which are selectively exposed to each VM through
VM-specific capability ids.

> +
> +Resource Manager
> +================
> +
> +The resource manager (RM) is a privileged application VM supporting the Gunyah Hypervisor.
> +It provides policy enforcement aspects of the virtualization system. The resource manager can
> +be treated as an extension of the Hypervisor but is separated to its own partition to ensure
> +that the hypervisor layer itself remains small and secure and to maintain a separation of policy
> +and mechanism in the platform. RM runs at arm64 NS-EL1 similar to other virtual machines.

s/RM/The resource manager/ and a ',' after EL1, please.

> +
> +Communication with the resource manager from each guest VM happens with message-queue.rst. Details

s/each guest VM/other VMs/ or perhaps even spell out virtual machines.

> +about the specific messages can be found in drivers/virt/gunyah/rsc_mgr.c
> +
> +::
> +
> +  +-------+   +--------+   +--------+
> +  |  RM   |   |  VM_A  |   |  VM_B  |
> +  +-.-.-.-+   +---.----+   +---.----+
> +    | |           |            |
> +  +-.-.-----------.------------.----+
> +  | | \==========/             |    |
> +  |  \========================/     |
> +  |            Gunyah               |
> +  +---------------------------------+
> +
> +The source for the resource manager is available at https://github.com/quic/gunyah-resource-manager.
> +
> +The resource manager provides the following features:
> +
> +- VM lifecycle management: allocating a VM, starting VMs, destruction of VMs
> +- VM access control policy, including memory sharing and lending
> +- Interrupt routing configuration
> +- Forwarding of system-level events (e.g. VM shutdown) to owner VM
> +
> +When booting a virtual machine which uses a devicetree such as Linux, resource manager overlays a

You can omit "such as Linux" without loosing information. Also, "the
resource manager".

> +/hypervisor node. This node can let Linux know it is running as a Gunyah guest VM,

"can"? Looking at the implementation that doesn't seem to be how you
detect that you're running under Gunyah.

> +how to communicate with resource manager, and basic description and capabilities of

When it comes to RM, it doesn't seem to be "can" anymore. Here it _is_
the way you inform the OS about how to communicate with the resource
manager.

> +this VM. See Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml for a description
> +of this node.
> diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
> new file mode 100644
> index 0000000000000..b352918ae54b4
> --- /dev/null
> +++ b/Documentation/virt/gunyah/message-queue.rst
> @@ -0,0 +1,63 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Message Queues
> +==============
> +Message queue is a simple low-capacity IPC channel between two VMs. It is

s/VMs/virtual machines/

> +intended for sending small control and configuration messages. Each message
> +queue is unidirectional, so a full-duplex IPC channel requires a pair of queues.
> +
> +Messages can be up to 240 bytes in length. Longer messages require a further
> +protocol on top of the message queue messages themselves. For instance, communication
> +with the resource manager adds a header field for sending longer messages via multiple
> +message fragments.
> +
> +The diagram below shows how message queue works. A typical configuration involves
> +2 message queues. Message queue 1 allows VM_A to send messages to VM_B. Message
> +queue 2 allows VM_B to send messages to VM_A.

What is described below all relates to one message queue, I think it
would be clearer if you remove the second message queue from the example
below and simply state that each direction need its own queue (as you
do here).

> +
> +1. VM_A sends a message of up to 240 bytes in length. It raises a hypercall
> +   with the message to inform the hypervisor to add the message to
> +   message queue 1's queue. The hypervisor copies memory into the internal
> +   message queue representation; the memory doesn't need to be shared between
> +   VM_A and VM_B.
> +
> +2. Gunyah raises the corresponding interrupt for VM_B (Rx vIRQ) when any of
> +   these happens:
> +
> +   a. gh_msgq_send() has PUSH flag. Queue is immediately flushed. This is the typical case.

Funny when the description of the unusual word "push" directly grabs the
word "flush" which everyone understands the meaning of. But I guess this
is Gunyah nomenclature.

On the other hand, if you consider the PUSH flag to just mean "push a RX
interrupt", then the naming isn't so strange. But it wouldn't
necessarily imply that "the queue is flushed", it's simply raising the
rx irq for the receiving side to drain the queue - at it's own pace.

I wouldn't be surprised to see that pushed messages denotes the "typical
case", but I don't think it's relevant details for the rx irq
description.


> +   b. Explicility with gh_msgq_push command from VM_A.

There's no function named gh_msgq_send(), and the gh_msgq_push is
lacking parenthesis. Please refer to these consistently with references
to functions, or perhaps capitalize them to refer to the hypercall?

> +   c. Message queue has reached a threshold depth.
> +
> +3. VM_B calls gh_msgq_recv() and Gunyah copies message to requested buffer.

s/requested/a provided/

> +
> +4. Gunyah buffers messages in the queue. If the queue became full when VM_A added a message,

s/the/a/

> +   the return values for gh_msgq_send() include a flag that indicates the queue is full.
> +   Once VM_B receives the message and, thus, there is space in the queue, Gunyah

"Once messages are drained from the queue, Gunyah will raise the..."

> +   will raise the Tx vIRQ on VM_A to indicate it can continue sending messages.
> +
> +For VM_B to send a message to VM_A, the process is identical, except that hypercalls
> +reference message queue 2's capability ID. Each message queue has its own independent
> +vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
> +
> +::
> +
> +      +---------------+         +-----------------+         +---------------+
> +      |      VM_A     |         |Gunyah hypervisor|         |      VM_B     |
> +      |               |         |                 |         |               |
> +      |               |         |                 |         |               |
> +      |               |   Tx    |                 |         |               |
> +      |               |-------->|                 | Rx vIRQ |               |
> +      |gh_msgq_send() | Tx vIRQ |Message queue 1  |-------->|gh_msgq_recv() |
> +      |               |<------- |                 |         |               |
> +      |               |         |                 |         |               |
> +      | Message Queue |         |                 |         | Message Queue |
> +      | driver        |         |                 |         | driver        |
> +      |               |         |                 |         |               |
> +      |               |         |                 |         |               |
> +      |               |         |                 |   Tx    |               |
> +      |               | Rx vIRQ |                 |<--------|               |
> +      |gh_msgq_recv() |<--------|Message queue 2  | Tx vIRQ |gh_msgq_send() |
> +      |               |         |                 |-------->|               |
> +      |               |         |                 |         |               |
> +      |               |         |                 |         |               |
> +      +---------------+         +-----------------+         +---------------+
> diff --git a/Documentation/virt/index.rst b/Documentation/virt/index.rst
> index 7fb55ae08598d..15869ee059b35 100644
> --- a/Documentation/virt/index.rst
> +++ b/Documentation/virt/index.rst
> @@ -16,6 +16,7 @@ Virtualization Support
>     coco/sev-guest
>     coco/tdx-guest
>     hyperv/index
> +   gunyah/index

'g' < 'h'

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 5:26 p.m. UTC | #9
On Tue, Jun 13, 2023 at 10:20:37AM -0700, Elliot Berman wrote:
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> index 04c8e131d259f..a0faf126ee56e 100644
> --- a/drivers/virt/gunyah/rsc_mgr.c
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -15,8 +15,10 @@
>  #include <linux/completion.h>
>  #include <linux/gunyah_rsc_mgr.h>
>  #include <linux/platform_device.h>
> +#include <linux/miscdevice.h>

'm' < 'p'

>  
>  #include "rsc_mgr.h"
> +#include "vm_mgr.h"
>  
>  #define RM_RPC_API_VERSION_MASK		GENMASK(3, 0)
>  #define RM_RPC_HEADER_WORDS_MASK	GENMASK(7, 4)
> @@ -130,6 +132,7 @@ struct gh_rm_connection {
>   * @cache: cache for allocating Tx messages
>   * @send_lock: synchronization to allow only one request to be sent at a time
>   * @nh: notifier chain for clients interested in RM notification messages
> + * @miscdev: /dev/gunyah
>   */
>  struct gh_rm {
>  	struct device *dev;
> @@ -146,6 +149,8 @@ struct gh_rm {
>  	struct kmem_cache *cache;
>  	struct mutex send_lock;
>  	struct blocking_notifier_head nh;
> +
> +	struct miscdevice miscdev;
>  };
>  
>  /**
> @@ -580,6 +585,33 @@ int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
>  }
>  EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
>  
> +struct device *gh_rm_get(struct gh_rm *rm)
> +{
> +	return get_device(rm->miscdev.this_device);

Please add a comment to why the VM is parented off the miscdevice and
not the RM itself, when the function name indicates that you're
acquiring a reference to the rm...

> +}
> +EXPORT_SYMBOL_GPL(gh_rm_get);
> +
[..]
> diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
> new file mode 100644
> index 0000000000000..a43401cb34f7d
> --- /dev/null
> +++ b/drivers/virt/gunyah/vm_mgr.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#define pr_fmt(fmt) "gh_vm_mgr: " fmt

Unused?

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 5:55 p.m. UTC | #10
On Tue, Jun 13, 2023 at 10:20:38AM -0700, Elliot Berman wrote:
> Gunyah resource manager provides API to manipulate stage 2 page tables.
> Manipulations are represented as a memory parcel. Memory parcels
> describe a list of memory regions (intermediate physical address--IPA--and
> size), a list of new permissions for VMs, and the memory type (DDR or
> MMIO). Each memory parcel is uniquely identified by a handle allocated by
> Gunyah. There are a few types of memory parcel sharing which Gunyah
> supports:
> 
>  - Sharing: the guest and host VM both have access
>  - Lending: only the guest has access; host VM loses access
>  - Donating: Permanently lent (not reclaimed even if guest shuts down)
> 
> Memory parcels that have been shared or lent can be reclaimed by the
> host via an additional call. The reclaim operation restores the original
> access the host VM had to the memory parcel and removes the access to
> other VM.
> 
> One point to note that memory parcels don't describe where in the guest
> VM the memory parcel should reside. The guest VM must accept the memory
> parcel either explicitly via a "gh_rm_mem_accept" call (not introduced
> here) or be configured to accept it automatically at boot. When the guest
> VM accepts the memory parcel, it also mentions the IPA it wants to place
> memory parcel. Although the region might be discontiguous on the host,
> the memory parcel is place contiguously in the guest memory at the
> specified IPA.
> 
> Reviewed-by: Alex Elder <elder@linaro.org>
> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>  drivers/virt/gunyah/rsc_mgr_rpc.c | 227 ++++++++++++++++++++++++++++++
>  include/linux/gunyah_rsc_mgr.h    |  48 +++++++
>  2 files changed, 275 insertions(+)
> 
> diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
> index a4a9f0ba4e1fc..25064123a31c3 100644
> --- a/drivers/virt/gunyah/rsc_mgr_rpc.c
> +++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
> @@ -6,6 +6,12 @@
>  #include <linux/gunyah_rsc_mgr.h>
>  #include "rsc_mgr.h"
>  
> +/* Message IDs: Memory Management */
> +#define GH_RM_RPC_MEM_LEND			0x51000012
> +#define GH_RM_RPC_MEM_SHARE			0x51000013
> +#define GH_RM_RPC_MEM_RECLAIM			0x51000015
> +#define GH_RM_RPC_MEM_APPEND			0x51000018
> +
>  /* Message IDs: VM Management */
>  #define GH_RM_RPC_VM_ALLOC_VMID			0x56000001
>  #define GH_RM_RPC_VM_DEALLOC_VMID		0x56000002
> @@ -22,6 +28,46 @@ struct gh_rm_vm_common_vmid_req {
>  	__le16 _padding;
>  } __packed;
>  
> +/* Call: MEM_LEND, MEM_SHARE */
> +#define GH_MEM_SHARE_REQ_FLAGS_APPEND		BIT(1)
> +
> +struct gh_rm_mem_share_req_header {
> +	u8 mem_type;
> +	u8 _padding0;
> +	u8 flags;
> +	u8 _padding1;
> +	__le32 label;
> +} __packed;
> +
> +struct gh_rm_mem_share_req_acl_section {
> +	__le32 n_entries;
> +	struct gh_rm_mem_acl_entry entries[];
> +};
> +
> +struct gh_rm_mem_share_req_mem_section {
> +	__le16 n_entries;
> +	__le16 _padding;
> +	struct gh_rm_mem_entry entries[];
> +};

Any reason why these two are not explicitly packed?

> +
> +/* Call: MEM_RELEASE */
> +struct gh_rm_mem_release_req {
> +	__le32 mem_handle;
> +	u8 flags; /* currently not used */
> +	u8 _padding0;
> +	__le16 _padding1;
> +} __packed;
> +
> +/* Call: MEM_APPEND */
> +#define GH_MEM_APPEND_REQ_FLAGS_END		BIT(0)
> +
> +struct gh_rm_mem_append_req_header {
> +	__le32 mem_handle;
> +	u8 flags;
> +	u8 _padding0;
> +	__le16 _padding1;
> +} __packed;
> +
>  /* Call: VM_ALLOC */
>  struct gh_rm_vm_alloc_vmid_resp {
>  	__le16 vmid;
> @@ -51,6 +97,8 @@ struct gh_rm_vm_config_image_req {
>  	__le64 dtb_size;
>  } __packed;
>  
> +#define GH_RM_MAX_MEM_ENTRIES	512
> +
>  /*
>   * Several RM calls take only a VMID as a parameter and give only standard
>   * response back. Deduplicate boilerplate code by using this common call.
> @@ -64,6 +112,185 @@ static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
>  	return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
>  }
>  
> +static int _gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle, bool end_append,
> +			struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
> +{
> +	struct gh_rm_mem_share_req_mem_section *mem_section;

"sections" is sufficient.

> +	struct gh_rm_mem_append_req_header *req_header;
> +	size_t msg_size = 0;
> +	void *msg;
> +	int ret;
> +
> +	msg_size += sizeof(struct gh_rm_mem_append_req_header);

msg_size is sizeof(header) + sizeof(sections), but written this way you
state that it's:

msg_size is X + sizeof(header),
and msg_size is also sizeof(sections)


It's the same thing, but you're forcing the reader to look for the
initial value of msg_size.

> +	msg_size += struct_size(mem_section, entries, n_mem_entries);
> +
> +	msg = kzalloc(msg_size, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	req_header = msg;

Can't you just make msg a strcut gh_rm_mem_append_req_header?

> +	mem_section = (void *)(req_header + 1);
> +
> +	req_header->mem_handle = cpu_to_le32(mem_handle);
> +	if (end_append)
> +		req_header->flags |= GH_MEM_APPEND_REQ_FLAGS_END;
> +
> +	mem_section->n_entries = cpu_to_le16(n_mem_entries);
> +	memcpy(mem_section->entries, mem_entries, sizeof(*mem_entries) * n_mem_entries);
> +
> +	ret = gh_rm_call(rm, GH_RM_RPC_MEM_APPEND, msg, msg_size, NULL, NULL);
> +	kfree(msg);
> +
> +	return ret;
> +}
> +
> +static int gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle,
> +			struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)

You can name these arguments "entries" and "n_entries" (or num_entries)
without loosing any infromation, but saving a whole bunch of characters
that people have to read.

> +{
> +	bool end_append;

Iiuc this means "is this the last entry", perhaps calling it "last"
would make its purpose immediately obvious.

> +	int ret = 0;
> +	size_t n;
> +
> +	while (n_mem_entries) {
> +		if (n_mem_entries > GH_RM_MAX_MEM_ENTRIES) {
> +			end_append = false;
> +			n = GH_RM_MAX_MEM_ENTRIES;
> +		} else {
> +			end_append = true;
> +			n = n_mem_entries;
> +		}
> +
> +		ret = _gh_rm_mem_append(rm, mem_handle, end_append, mem_entries, n);

Every loop here you kzallc() and kfree() the same chunk of scratch
memory. If you inline _gh_rm_mem_append() here and do the allocation
once based on GH_RM_MAX_MEM_ENTRIES this would be faster and seemingly
easier to read/follow.

> +		if (ret)
> +			break;
> +
> +		mem_entries += n;
> +		n_mem_entries -= n;
> +	}
> +
> +	return ret;
> +}
> +
> +static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_mem_parcel *p)
> +{
> +	size_t msg_size = 0, initial_mem_entries = p->n_mem_entries, resp_size;
> +	size_t acl_section_size, mem_section_size;

You keep all other variables one per line, and in reverse x-mas style.
Please do the same for this.

Naming these acl_size and mem_size would be beneficial.

> +	struct gh_rm_mem_share_req_acl_section *acl_section;
> +	struct gh_rm_mem_share_req_mem_section *mem_section;

Naming these acl and mem will make below easier to read.

> +	struct gh_rm_mem_share_req_header *req_header;
> +	u32 *attr_section;
> +	__le32 *resp;
> +	void *msg;
> +	int ret;
> +
> +	if (!p->acl_entries || !p->n_acl_entries || !p->mem_entries || !p->n_mem_entries ||
> +	    p->n_acl_entries > U8_MAX || p->mem_handle != GH_MEM_HANDLE_INVAL)
> +		return -EINVAL;
> +
> +	if (initial_mem_entries > GH_RM_MAX_MEM_ENTRIES)
> +		initial_mem_entries = GH_RM_MAX_MEM_ENTRIES;
> +
> +	acl_section_size = struct_size(acl_section, entries, p->n_acl_entries);
> +	mem_section_size = struct_size(mem_section, entries, initial_mem_entries);

An empty line here seems reasonable.

> +	/* The format of the message goes:
> +	 * request header
> +	 * ACL entries (which VMs get what kind of access to this memory parcel)
> +	 * Memory entries (list of memory regions to share)
> +	 * Memory attributes (currently unused, we'll hard-code the size to 0)
> +	 */
> +	msg_size += sizeof(struct gh_rm_mem_share_req_header);

Hidden above is the confirmation that msg_size was initialized to 0,
drop the '+' here to make it clear.

> +	msg_size += acl_section_size;
> +	msg_size += mem_section_size;
> +	msg_size += sizeof(u32); /* for memory attributes, currently unused */
> +
> +	msg = kzalloc(msg_size, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	req_header = msg;

I don't think you need to keep req_header and msg separately.

> +	acl_section = (void *)req_header + sizeof(*req_header);
> +	mem_section = (void *)acl_section + acl_section_size;
> +	attr_section = (void *)mem_section + mem_section_size;
> +
> +	req_header->mem_type = p->mem_type;
> +	if (initial_mem_entries != p->n_mem_entries)

I think this will only happen if p->n_mem_entries >
GH_RM_MAX_MEM_ENTRIES, for which you already have a condition earlier.
Set the flag in that condition rather than spreading it over two (three)
code paths.

> +		req_header->flags |= GH_MEM_SHARE_REQ_FLAGS_APPEND;
> +	req_header->label = cpu_to_le32(p->label);
> +
> +	acl_section->n_entries = cpu_to_le32(p->n_acl_entries);
> +	memcpy(acl_section->entries, p->acl_entries,
> +		flex_array_size(acl_section, entries, p->n_acl_entries));
> +
> +	mem_section->n_entries = cpu_to_le16(initial_mem_entries);
> +	memcpy(mem_section->entries, p->mem_entries,
> +		flex_array_size(mem_section, entries, initial_mem_entries));
> +
> +	/* Set n_entries for memory attribute section to 0 */
> +	*attr_section = 0;
> +
> +	ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp, &resp_size);
> +	kfree(msg);
> +
> +	if (ret)
> +		return ret;
> +
> +	p->mem_handle = le32_to_cpu(*resp);
> +	kfree(resp);
> +
> +	if (initial_mem_entries != p->n_mem_entries) {

Another "we took that conditional path above", please make it obvious.


Can you please also write a comment describing that this operation is
split in a main one, followed by one of more appended operations as
necessary. This took me too long time to realize...

In particular I was wondering why "initial" didn't mean "the original
value", but now I see that it probably means "the value relevant for the
initial request".

> +		ret = gh_rm_mem_append(rm, p->mem_handle,
> +					&p->mem_entries[initial_mem_entries],
> +					p->n_mem_entries - initial_mem_entries);
> +		if (ret) {
> +			gh_rm_mem_reclaim(rm, p);
> +			p->mem_handle = GH_MEM_HANDLE_INVAL;

What happens to the entries that was already lended or shared?

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 6:07 p.m. UTC | #11
On Tue, Jun 13, 2023 at 10:20:42AM -0700, Elliot Berman wrote:
> On Qualcomm platforms, there is a firmware entity which controls access
> to physical pages. In order to share memory with another VM, this entity
> needs to be informed that the guest VM should have access to the memory.
> 

Why isn't this handled by the system above us?

> Reviewed-by: Alex Elder <elder@linaro.org>
> Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---
>  drivers/virt/gunyah/Kconfig                 |  4 ++
>  drivers/virt/gunyah/Makefile                |  2 +
>  drivers/virt/gunyah/gunyah_platform_hooks.c | 80 +++++++++++++++++++++
>  drivers/virt/gunyah/rsc_mgr.h               |  3 +
>  drivers/virt/gunyah/rsc_mgr_rpc.c           | 18 ++++-
>  include/linux/gunyah_rsc_mgr.h              | 17 +++++
>  6 files changed, 122 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
> 
> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
> index 1a737694c333d..de815189dab6c 100644
> --- a/drivers/virt/gunyah/Kconfig
> +++ b/drivers/virt/gunyah/Kconfig
> @@ -4,6 +4,7 @@ config GUNYAH
>  	tristate "Gunyah Virtualization drivers"
>  	depends on ARM64
>  	depends on MAILBOX
> +	select GUNYAH_PLATFORM_HOOKS
>  	help
>  	  The Gunyah drivers are the helper interfaces that run in a guest VM
>  	  such as basic inter-VM IPC and signaling mechanisms, and higher level
> @@ -11,3 +12,6 @@ config GUNYAH
>  
>  	  Say Y/M here to enable the drivers needed to interact in a Gunyah
>  	  virtual environment.
> +
> +config GUNYAH_PLATFORM_HOOKS

Automatically selected by GUNYAH and the only platform so far, no need
to keep this as a separate Kconfig.

> +	tristate
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index bacf78b8fa337..4fbeee521d60a 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,4 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
> +obj-$(CONFIG_GUNYAH_PLATFORM_HOOKS) += gunyah_platform_hooks.o
> +

Why is this added before the core gunyah components?

>  gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
>  obj-$(CONFIG_GUNYAH) += gunyah.o

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 6:15 p.m. UTC | #12
On Tue, Jun 13, 2023 at 10:20:45AM -0700, Elliot Berman wrote:
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
[..]
> +struct gh_resource *gh_rm_alloc_resource(struct gh_rm *rm, struct gh_rm_hyp_resource *hyp_resource)
> +{
> +	struct gh_resource *ghrsc;
> +	int ret;
> +
> +	ghrsc = kzalloc(sizeof(*ghrsc), GFP_KERNEL);
> +	if (!ghrsc)
> +		return NULL;
> +
> +	ghrsc->type = hyp_resource->type;
> +	ghrsc->capid = le64_to_cpu(hyp_resource->cap_id);
> +	ghrsc->irq = IRQ_NOTCONNECTED;
> +	ghrsc->rm_label = le32_to_cpu(hyp_resource->resource_label);
> +	if (hyp_resource->virq) {
> +		struct gh_irq_chip_data irq_data = {
> +			.gh_virq = le32_to_cpu(hyp_resource->virq),
> +		};
> +
> +		ret = irq_domain_alloc_irqs(rm->irq_domain, 1, NUMA_NO_NODE, &irq_data);
> +		if (ret < 0) {
> +			dev_err(rm->dev,
> +				"Failed to allocate interrupt for resource %d label: %d: %d\n",
> +				ghrsc->type, ghrsc->rm_label, ret);
> +			kfree(ghrsc);
> +			return NULL;
> +		} else {

No need for the else here, the failure case has already returned.

> +			ghrsc->irq = ret;
> +		}
> +	}
> +
> +	return ghrsc;
> +}
[..]
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 982e27d10d57f..4b398b59c2c5a 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -27,6 +27,9 @@ struct gh_resource {
>  	enum gh_resource_type type;
>  	u64 capid;
>  	unsigned int irq;
> +
> +	struct list_head list;

This is unused.

Regards,
Bjorn
Bjorn Andersson Aug. 5, 2023, 6:19 p.m. UTC | #13
On Tue, Jun 13, 2023 at 10:20:53AM -0700, Elliot Berman wrote:
> Add myself and Prakruthi as maintainers of Gunyah hypervisor drivers.
> 
> Reviewed-by: Alex Elder <elder@linaro.org>
> Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>

Reviewed-by: Bjorn Andersson <andersson@kernel.org>

Regards,
Bjorn