
[RFC] docs/devel: start documenting writing VirtIO devices

Message ID 20220309164929.19395-1-alex.bennee@linaro.org
State New
Series [RFC] docs/devel: start documenting writing VirtIO devices

Commit Message

Alex Bennée March 9, 2022, 4:49 p.m. UTC
While writing my own VirtIO devices I've gotten confused with how
things are structured and what sort of shared infrastructure there is.
If we can document how everything is supposed to work we can then
maybe start cleaning up inconsistencies in the code.

Based-on: 20220309135355.4149689-1-alex.bennee@linaro.org
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/devel/index-internals.rst |   2 +-
 docs/devel/virtio-backends.rst | 139 +++++++++++++++++++++++++++++++++
 2 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 docs/devel/virtio-backends.rst

Comments

Cornelia Huck March 9, 2022, 6:07 p.m. UTC | #1
On Wed, Mar 09 2022, Alex Bennée <alex.bennee@linaro.org> wrote:

> While writing my own VirtIO devices I've gotten confused with how
> things are structured and what sort of shared infrastructure there is.
> If we can document how everything is supposed to work we can then
> maybe start cleaning up inconsistencies in the code.

I agree that we could use some documentation here; OTOH, I'm a bit
confused in turn by your patch :) Let me comment below.

>
> Based-on: 20220309135355.4149689-1-alex.bennee@linaro.org
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  docs/devel/index-internals.rst |   2 +-
>  docs/devel/virtio-backends.rst | 139 +++++++++++++++++++++++++++++++++
>  2 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100644 docs/devel/virtio-backends.rst

(...)

> diff --git a/docs/devel/virtio-backends.rst b/docs/devel/virtio-backends.rst
> new file mode 100644
> index 0000000000..230538f46b
> --- /dev/null
> +++ b/docs/devel/virtio-backends.rst
> @@ -0,0 +1,139 @@
> +..
> +   Copyright (c) 2022, Linaro Limited
> +   Written by Alex Bennée
> +
> +Writing VirtIO backends for QEMU
> +================================
> +
> +This document attempts to outline the information a developer needs to
> +know to write backends for QEMU. It is specifically focused on
> +implementing VirtIO devices.

I think you first need to define a bit more clearly what you consider a
"backend". For virtio, it is probably "everything a device needs to
function as a specific device type like net, block, etc., which may be
implemented by different methods" (as you describe further below).

> +
> +Front End Transports
> +--------------------
> +
> +VirtIO supports a number of different front end transports. The
> +details of the device remain the same but there are differences in
> +command line for specifying the device (e.g. -device virtio-foo
> +and -device virtio-foo-pci). For example:
> +
> +.. code:: c
> +
> +  static const TypeInfo vhost_user_blk_info = {
> +      .name = TYPE_VHOST_USER_BLK,
> +      .parent = TYPE_VIRTIO_DEVICE,
> +      .instance_size = sizeof(VHostUserBlk),
> +      .instance_init = vhost_user_blk_instance_init,
> +      .class_init = vhost_user_blk_class_init,
> +  };
> +
> +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
> +``TYPE_VIRTIO_DEVICE``.

That's not what I'd consider a "front end", though?

> And then for the PCI device it wraps around the
> +base device (although explicitly initialising via
> +virtio_instance_init_common):
> +
> +.. code:: c
> +
> +  struct VHostUserBlkPCI {
> +      VirtIOPCIProxy parent_obj;
> +      VHostUserBlk vdev;
> +  };

The VirtIOPCIProxy seems to materialize a bit out of thin air
here... maybe the information simply needs to be structured in a
different way? Perhaps:

- describe that virtio devices consist of a part that implements the
  device functionality, which ultimately derives from VirtIODevice (the
  "backend"), and a part that exposes a way for the operating system to
  discover and use the device (the "frontend", what the virtio spec
  calls a "transport")
- describe how the "frontend" part works (maybe mention VirtIOPCIProxy,
  VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for
  PCI, MMIO, and CCW devices)
- list the different types of "backends" (as you did below), and give
  two examples of how VirtIODevice is extended (a plain one, and a
  vhost-user one)
- explain how frontend and backend together create an actual device
  (with the two device examples, and maybe also with the plain one
  plugged as both PCI and CCW?); maybe also mention that MMIO is a bit
  different? (it always confuses me)

> +   
> +  static void vhost_user_blk_pci_instance_init(Object *obj)
> +  {
> +      VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(obj);
> +
> +      virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> +                                  TYPE_VHOST_USER_BLK);
> +      object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
> +                                "bootindex");
> +  }
> +
> +  static const VirtioPCIDeviceTypeInfo vhost_user_blk_pci_info = {
> +      .base_name               = TYPE_VHOST_USER_BLK_PCI,
> +      .generic_name            = "vhost-user-blk-pci",
> +      .transitional_name       = "vhost-user-blk-pci-transitional",
> +      .non_transitional_name   = "vhost-user-blk-pci-non-transitional",
> +      .instance_size  = sizeof(VHostUserBlkPCI),
> +      .instance_init  = vhost_user_blk_pci_instance_init,
> +      .class_init     = vhost_user_blk_pci_class_init,
> +  };
> +
> +Back End Implementations
> +------------------------
> +
> +There are a number of places where the implementation of the backend
> +can be done:
> +
> +* in QEMU itself
> +* in the host kernel (a.k.a. vhost)
> +* in a separate process (a.k.a. vhost-user)
> +
> +Where a vhost-user implementation is being done, the code in QEMU is
> +mainly boilerplate to handle the command line definition and
> +connection to the separate process with a socket (using the ``chardev``
> +functionality).
> +
> +Implementing a vhost-user wrapper
> +---------------------------------
> +
> +There are some classes defined that can wrap a lot of the common
> +vhost-user code in a ``VhostUserBackend``. For example:

Is VhostUserBackend something that is expected to be commonly used? I
think gpu and input use it, but not virtiofs (unless I misread the code).

> +
> +.. code:: c
> +
> +  struct VhostUserGPU {
> +      VirtIOGPUBase parent_obj;
> +
> +      VhostUserBackend *vhost;
> +      ...
> +  };
> +
> +  static void
> +  vhost_user_gpu_instance_init(Object *obj)
> +  {
> +      VhostUserGPU *g = VHOST_USER_GPU(obj);
> +
> +      g->vhost = VHOST_USER_BACKEND(object_new(TYPE_VHOST_USER_BACKEND));
> +      object_property_add_alias(obj, "chardev",
> +                                OBJECT(g->vhost), "chardev");
> +  }
> +
> +  static void
> +  vhost_user_gpu_device_realize(DeviceState *qdev, Error **errp)
> +  {
> +      VhostUserGPU *g = VHOST_USER_GPU(qdev);
> +      VirtIODevice *vdev = VIRTIO_DEVICE(g);
> +
> +      vhost_dev_set_config_notifier(&g->vhost->dev, &config_ops);
> +      if (vhost_user_backend_dev_init(g->vhost, vdev, 2, errp) < 0) {
> +          return;
> +      }
> +      ...
> +  }
> +
> +  static void
> +  vhost_user_gpu_class_init(ObjectClass *klass, void *data)
> +  {
> +      DeviceClass *dc = DEVICE_CLASS(klass);
> +      VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> +
> +      vdc->realize = vhost_user_gpu_device_realize;
> +      ...
> +  }
> +
> +  static const TypeInfo vhost_user_gpu_info = {
> +      .name = TYPE_VHOST_USER_GPU,
> +      .parent = TYPE_VIRTIO_GPU_BASE,
> +      .instance_size = sizeof(VhostUserGPU),
> +      .instance_init = vhost_user_gpu_instance_init,
> +      .class_init = vhost_user_gpu_class_init,
> +      ...
> +  };
> +
> +Here the ``TYPE_VHOST_USER_GPU`` is based off a shared base class
> +(``TYPE_VIRTIO_GPU_BASE`` which itself is based on
> +``TYPE_VIRTIO_DEVICE``). The chardev property is aliased to the
> +VhostUserBackend chardev so it can be specified on the command line
> +for this device.
> + 

I think using a "base" device is something that is device-specific; for
gpu, it makes sense as it can be implemented in different ways, but
e.g. virtiofs does not have a "plain" implementation, and some device
types have only "plain" implementations.
Alex Bennée March 16, 2022, 4:41 p.m. UTC | #2
Cornelia Huck <cohuck@redhat.com> writes:

> On Wed, Mar 09 2022, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> While writing my own VirtIO devices I've gotten confused with how
>> things are structured and what sort of shared infrastructure there is.
>> If we can document how everything is supposed to work we can then
>> maybe start cleaning up inconsistencies in the code.
>
> I agree that we could use some documentation here; OTOH, I'm a bit
> confused in turn by your patch :) Let me comment below.

Almost by design ;-)

>
>>
>> Based-on: 20220309135355.4149689-1-alex.bennee@linaro.org
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>  docs/devel/index-internals.rst |   2 +-
>>  docs/devel/virtio-backends.rst | 139 +++++++++++++++++++++++++++++++++
>>  2 files changed, 140 insertions(+), 1 deletion(-)
>>  create mode 100644 docs/devel/virtio-backends.rst
>
> (...)
>
>> diff --git a/docs/devel/virtio-backends.rst b/docs/devel/virtio-backends.rst
>> new file mode 100644
>> index 0000000000..230538f46b
>> --- /dev/null
>> +++ b/docs/devel/virtio-backends.rst
>> @@ -0,0 +1,139 @@
>> +..
>> +   Copyright (c) 2022, Linaro Limited
>> +   Written by Alex Bennée
>> +
>> +Writing VirtIO backends for QEMU
>> +================================
>> +
>> +This document attempts to outline the information a developer needs to
>> +know to write backends for QEMU. It is specifically focused on
>> +implementing VirtIO devices.
>
> I think you first need to define a bit more clearly what you consider a
> "backend". For virtio, it is probably "everything a device needs to
> function as a specific device type like net, block, etc., which may be
> implemented by different methods" (as you describe further below).

How about:

  This document attempts to outline the information a developer needs to
  know to write device emulations in QEMU. It is specifically focused on
  implementing VirtIO devices. For VirtIO the frontend is the driver
  running on the guest. The backend is the everything that QEMU needs to
  do to handle the emulation of the VirtIO device. This can be done
  entirely in QEMU, divided between QEMU and the kernel (vhost) or
  handled by a separate process which is configured by QEMU
  (vhost-user).

>
>> +
>> +Front End Transports
>> +--------------------
>> +
>> +VirtIO supports a number of different front end transports. The
>> +details of the device remain the same but there are differences in
>> +command line for specifying the device (e.g. -device virtio-foo
>> +and -device virtio-foo-pci). For example:
>> +
>> +.. code:: c
>> +
>> +  static const TypeInfo vhost_user_blk_info = {
>> +      .name = TYPE_VHOST_USER_BLK,
>> +      .parent = TYPE_VIRTIO_DEVICE,
>> +      .instance_size = sizeof(VHostUserBlk),
>> +      .instance_init = vhost_user_blk_instance_init,
>> +      .class_init = vhost_user_blk_class_init,
>> +  };
>> +
>> +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
>> +``TYPE_VIRTIO_DEVICE``.
>
> That's not what I'd consider a "front end", though?

Yeah clumsy wording. I'm trying to find a good example to show how
QOM can be used to abstract the core device operation and the wrappers
for different transports. However in the code base there seems to be
considerable variation about how this is done. Any advice as to the
best exemplary device to follow is greatly welcomed.

>> And then for the PCI device it wraps around the
>> +base device (although explicitly initialising via
>> +virtio_instance_init_common):
>> +
>> +.. code:: c
>> +
>> +  struct VHostUserBlkPCI {
>> +      VirtIOPCIProxy parent_obj;
>> +      VHostUserBlk vdev;
>> +  };
>
> The VirtIOPCIProxy seems to materialize a bit out of thin air
> here... maybe the information simply needs to be structured in a
> different way? Perhaps:
>
> - describe that virtio devices consist of a part that implements the
>   device functionality, which ultimately derives from VirtIODevice (the
>   "backend"), and a part that exposes a way for the operating system to
>   discover and use the device (the "frontend", what the virtio spec
>   calls a "transport")
> - describe how the "frontend" part works (maybe mention VirtIOPCIProxy,
>   VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for
>   PCI, MMIO, and CCW devices)
> - list the different types of "backends" (as you did below), and give
>   two examples of how VirtIODevice is extended (a plain one, and a
>   vhost-user one)
> - explain how frontend and backend together create an actual device
>   (with the two device examples, and maybe also with the plain one
>   plugged as both PCI and CCW?); maybe also mention that MMIO is a bit
>   different? (it always confuses me)

OK I'll see how I can restructure things to make it clearer. Do we also
have to take into account the object hierarchy for different types of
device (e.g. block or net)? Or is that all plumbing into QEMU's
sub-system internals done in the VirtIO device objects?

>> +
>> +Back End Implementations
>> +------------------------
>> +
>> +There are a number of places where the implementation of the backend
>> +can be done:
>> +
>> +* in QEMU itself
>> +* in the host kernel (a.k.a. vhost)
>> +* in a separate process (a.k.a. vhost-user)
>> +
>> +Where a vhost-user implementation is being done, the code in QEMU is
>> +mainly boilerplate to handle the command line definition and
>> +connection to the separate process with a socket (using the ``chardev``
>> +functionality).
>> +
>> +Implementing a vhost-user wrapper
>> +---------------------------------
>> +
>> +There are some classes defined that can wrap a lot of the common
>> +vhost-user code in a ``VhostUserBackend``. For example:
>
> Is VhostUserBackend something that is expected to be commonly used? I
> think gpu and input use it, but not virtiofs (unless I misread the
> code).

Possibly - but it does seem to be trying to avoid adding lots of
boilerplate to each individual device to set up and configure the
vhost-user backend. A problem I ran into when trying to fix the
squashing of VHOST_USER_PROTOCOL_F_CONFIG messages in
vhost_user_backend_init.

<snip>
>> +  static const TypeInfo vhost_user_gpu_info = {
>> +      .name = TYPE_VHOST_USER_GPU,
>> +      .parent = TYPE_VIRTIO_GPU_BASE,
>> +      .instance_size = sizeof(VhostUserGPU),
>> +      .instance_init = vhost_user_gpu_instance_init,
>> +      .class_init = vhost_user_gpu_class_init,
>> +      ...
>> +  };
>> +
>> +Here the ``TYPE_VHOST_USER_GPU`` is based off a shared base class
>> +(``TYPE_VIRTIO_GPU_BASE`` which itself is based on
>> +``TYPE_VIRTIO_DEVICE``). The chardev property is aliased to the
>> +VhostUserBackend chardev so it can be specified on the command line
>> +for this device.
>> + 
>
> I think using a "base" device is something that is device-specific; for
> gpu, it makes sense as it can be implemented in different ways, but
> e.g. virtiofs does not have a "plain" implementation, and some device
> types have only "plain" implementations.

Perhaps the GPU was a bad choice here. Do we have a good example device
that has both mmio and pci (or ccw) transports as well as QEMU internal
and vhost/vhost-user implementations?
Cornelia Huck April 5, 2022, 3:06 p.m. UTC | #3
On Wed, Mar 16 2022, Alex Bennée <alex.bennee@linaro.org> wrote:

> Cornelia Huck <cohuck@redhat.com> writes:
>
>> On Wed, Mar 09 2022, Alex Bennée <alex.bennee@linaro.org> wrote:

>>> +Writing VirtIO backends for QEMU
>>> +================================
>>> +
>>> +This document attempts to outline the information a developer needs to
>>> +know to write backends for QEMU. It is specifically focused on
>>> +implementing VirtIO devices.
>>
>> I think you first need to define a bit more clearly what you consider a
>> "backend". For virtio, it is probably "everything a device needs to
>> function as a specific device type like net, block, etc., which may be
>> implemented by different methods" (as you describe further below).
>
> How about:
>
>   This document attempts to outline the information a developer needs to
>   know to write device emulations in QEMU. It is specifically focused on
>   implementing VirtIO devices. For VirtIO the frontend is the driver
>   running on the guest. The backend is everything that QEMU needs to
>   do to handle the emulation of the VirtIO device. This can be done
>   entirely in QEMU, divided between QEMU and the kernel (vhost) or
>   handled by a separate process which is configured by QEMU
>   (vhost-user).

I'm afraid that confuses me even more :)

This sounds to me like frontend == driver (in virtio spec terminology)
and backend == device. Is that really what you meant?

>
>>
>>> +
>>> +Front End Transports
>>> +--------------------
>>> +
>>> +VirtIO supports a number of different front end transports. The
>>> +details of the device remain the same but there are differences in
>>> +command line for specifying the device (e.g. -device virtio-foo
>>> +and -device virtio-foo-pci). For example:
>>> +
>>> +.. code:: c
>>> +
>>> +  static const TypeInfo vhost_user_blk_info = {
>>> +      .name = TYPE_VHOST_USER_BLK,
>>> +      .parent = TYPE_VIRTIO_DEVICE,
>>> +      .instance_size = sizeof(VHostUserBlk),
>>> +      .instance_init = vhost_user_blk_instance_init,
>>> +      .class_init = vhost_user_blk_class_init,
>>> +  };
>>> +
>>> +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
>>> +``TYPE_VIRTIO_DEVICE``.
>>
>> That's not what I'd consider a "front end", though?
>
> Yeah clumsy wording. I'm trying to get find a good example to show how
> QOM can be used to abstract the core device operation and the wrappers
> for different transports. However in the code base there seems to be
> considerable variation about how this is done. Any advice as to the
> best exemplary device to follow is greatly welcomed.

I'm not sure which of the examples we can really consider a "good"
device; the normal modus operandi when writing a new device seems to be
"pick the first device you can think of and copy whatever it
does". Personally, I usally look at blk or net, but those carry a lot of
legacy baggage; so maybe a modern virtio-1 only device like gpu? That
one also has the advantage of not being pci-only.

Does anyone else have a good suggestion here?

>
>>> And then for the PCI device it wraps around the
>>> +base device (although explicitly initialising via
>>> +virtio_instance_init_common):
>>> +
>>> +.. code:: c
>>> +
>>> +  struct VHostUserBlkPCI {
>>> +      VirtIOPCIProxy parent_obj;
>>> +      VHostUserBlk vdev;
>>> +  };
>>
>> The VirtIOPCIProxy seems to materialize a bit out of thin air
>> here... maybe the information simply needs to be structured in a
>> different way? Perhaps:
>>
>> - describe that virtio devices consist of a part that implements the
>>   device functionality, which ultimately derives from VirtIODevice (the
>>   "backend"), and a part that exposes a way for the operating system to
>>   discover and use the device (the "frontend", what the virtio spec
>>   calls a "transport")
>> - describe how the "frontend" part works (maybe mention VirtIOPCIProxy,
>>   VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for
>>   PCI, MMIO, and CCW devices)
>> - list the different types of "backends" (as you did below), and give
>>   two examples of how VirtIODevice is extended (a plain one, and a
>>   vhost-user one)
>> - explain how frontend and backend together create an actual device
>>   (with the two device examples, and maybe also with the plain one
>>   plugged as both PCI and CCW?); maybe also mention that MMIO is a bit
>>   different? (it always confuses me)
>
> OK I'll see how I can restructure things to make it clearer. Do we also
> have to take into account the object hierarchy for different types of
> device (e.g. block or net)? Or is that all plumbing into QEMU's
> sub-system internals done in the VirtIO device objects?

An example of how a device plugs into a bigger infrastructure like the
block layer might be helpful, but it also might complicate the
documentation (as you probably won't need to do anything like that if
you write a device that does not use any established infrastructure.)
Maybe just gloss over it for now?

>
>>> +
>>> +Back End Implementations
>>> +------------------------
>>> +
>>> +There are a number of places where the implementation of the backend
>>> +can be done:
>>> +
>>> +* in QEMU itself
>>> +* in the host kernel (a.k.a. vhost)
>>> +* in a separate process (a.k.a. vhost-user)
>>> +
>>> +Where a vhost-user implementation is being done, the code in QEMU is
>>> +mainly boilerplate to handle the command line definition and
>>> +connection to the separate process with a socket (using the ``chardev``
>>> +functionality).
>>> +
>>> +Implementing a vhost-user wrapper
>>> +---------------------------------
>>> +
>>> +There are some classes defined that can wrap a lot of the common
>>> +vhost-user code in a ``VhostUserBackend``. For example:
>>
>> Is VhostUserBackend something that is expected to be commonly used? I
>> think gpu and input use it, but not virtiofs (unless I misread the
>> code).
>
> Possibly - but it does seem to be trying to avoid adding lots of
> boilerplate to each individual device to set up and configure the
> vhost-user backend. A problem I ran into when trying to fix the
> squashing of VHOST_USER_PROTOCOL_F_CONFIG messages in
> vhost_user_backend_init.

Yeah. I think a lot of that comes from the "pick a random existing
device as a template" procedure I mentioned above. Maybe we really
should recommend using that common structure in new device
implementations.

>
> <snip>
>>> +  static const TypeInfo vhost_user_gpu_info = {
>>> +      .name = TYPE_VHOST_USER_GPU,
>>> +      .parent = TYPE_VIRTIO_GPU_BASE,
>>> +      .instance_size = sizeof(VhostUserGPU),
>>> +      .instance_init = vhost_user_gpu_instance_init,
>>> +      .class_init = vhost_user_gpu_class_init,
>>> +      ...
>>> +  };
>>> +
>>> +Here the ``TYPE_VHOST_USER_GPU`` is based off a shared base class
>>> +(``TYPE_VIRTIO_GPU_BASE`` which itself is based on
>>> +``TYPE_VIRTIO_DEVICE``). The chardev property is aliased to the
>>> +VhostUserBackend chardev so it can be specified on the command line
>>> +for this device.
>>> + 
>>
>> I think using a "base" device is something that is device-specific; for
>> gpu, it makes sense as it can be implemented in different ways, but
>> e.g. virtiofs does not have a "plain" implementation, and some device
>> types have only "plain" implementations.
>
> Perhaps the GPU was a bad choice here. Do we have a good example device
> that has both mmio and pci (or ccw) transports as well as QEMU internal
> and vhost/vhost-user implementations?

Ugh. Maybe scsi? It carries a bit of legacy stuff, though.
Alex Bennée April 5, 2022, 3:35 p.m. UTC | #4
Cornelia Huck <cohuck@redhat.com> writes:

> On Wed, Mar 16 2022, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> Cornelia Huck <cohuck@redhat.com> writes:
>>
>>> On Wed, Mar 09 2022, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>>>> +Writing VirtIO backends for QEMU
>>>> +================================
>>>> +
>>>> +This document attempts to outline the information a developer needs to
>>>> +know to write backends for QEMU. It is specifically focused on
>>>> +implementing VirtIO devices.
>>>
>>> I think you first need to define a bit more clearly what you consider a
>>> "backend". For virtio, it is probably "everything a device needs to
>>> function as a specific device type like net, block, etc., which may be
>>> implemented by different methods" (as you describe further below).
>>
>> How about:
>>
>>   This document attempts to outline the information a developer needs to
>>   know to write device emulations in QEMU. It is specifically focused on
>>   implementing VirtIO devices. For VirtIO the frontend is the driver
>>   running on the guest. The backend is everything that QEMU needs to
>>   do to handle the emulation of the VirtIO device. This can be done
>>   entirely in QEMU, divided between QEMU and the kernel (vhost) or
>>   handled by a separate process which is configured by QEMU
>>   (vhost-user).
>
> I'm afraid that confuses me even more :)
>
> This sounds to me like frontend == driver (in virtio spec terminology)
> and backend == device. Is that really what you meant?

I think so. To be honest it's the different types of backend (in QEMU,
vhost and vhost-user) I'm trying to be clear about here. The
frontend/driver is just mentioned for completeness.

>
>>
>>>
>>>> +
>>>> +Front End Transports
>>>> +--------------------
>>>> +
>>>> +VirtIO supports a number of different front end transports. The
>>>> +details of the device remain the same but there are differences in
>>>> +command line for specifying the device (e.g. -device virtio-foo
>>>> +and -device virtio-foo-pci). For example:
>>>> +
>>>> +.. code:: c
>>>> +
>>>> +  static const TypeInfo vhost_user_blk_info = {
>>>> +      .name = TYPE_VHOST_USER_BLK,
>>>> +      .parent = TYPE_VIRTIO_DEVICE,
>>>> +      .instance_size = sizeof(VHostUserBlk),
>>>> +      .instance_init = vhost_user_blk_instance_init,
>>>> +      .class_init = vhost_user_blk_class_init,
>>>> +  };
>>>> +
>>>> +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
>>>> +``TYPE_VIRTIO_DEVICE``.
>>>
>>> That's not what I'd consider a "front end", though?
>>
>> Yeah clumsy wording. I'm trying to find a good example to show how
>> QOM can be used to abstract the core device operation and the wrappers
>> for different transports. However in the code base there seems to be
>> considerable variation about how this is done. Any advice as to the
>> best exemplary device to follow is greatly welcomed.
>
> I'm not sure which of the examples we can really consider a "good"
> device; the normal modus operandi when writing a new device seems to be
> "pick the first device you can think of and copy whatever it
> does".

Yeah the QEMU curse. Hence trying to document the "best" approach or at
least make the picking of a reference a little less random ;-)

> Personally, I usually look at blk or net, but those carry a lot of
> legacy baggage; so maybe a modern virtio-1 only device like gpu? That
> one also has the advantage of not being pci-only.
>
> Does anyone else have a good suggestion here?

Sorry I totally forgot to include you in the Cc of the v1 posting:

  Subject: [PATCH  v1 09/13] docs/devel: start documenting writing VirtIO devices
  Date: Mon, 21 Mar 2022 15:30:33 +0000
  Message-Id: <20220321153037.3622127-10-alex.bennee@linaro.org>

although expect a v2 soonish (once I can get a reasonable qos-test
vhost-user test working).

>
>>
>>>> And then for the PCI device it wraps around the
>>>> +base device (although explicitly initialising via
>>>> +virtio_instance_init_common):
>>>> +
>>>> +.. code:: c
>>>> +
>>>> +  struct VHostUserBlkPCI {
>>>> +      VirtIOPCIProxy parent_obj;
>>>> +      VHostUserBlk vdev;
>>>> +  };
>>>
>>> The VirtIOPCIProxy seems to materialize a bit out of thin air
>>> here... maybe the information simply needs to be structured in a
>>> different way? Perhaps:
>>>
>>> - describe that virtio devices consist of a part that implements the
>>>   device functionality, which ultimately derives from VirtIODevice (the
>>>   "backend"), and a part that exposes a way for the operating system to
>>>   discover and use the device (the "frontend", what the virtio spec
>>>   calls a "transport")
>>> - describe how the "frontend" part works (maybe mention VirtIOPCIProxy,
>>>   VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for
>>>   PCI, MMIO, and CCW devices)
>>> - list the different types of "backends" (as you did below), and give
>>>   two examples of how VirtIODevice is extended (a plain one, and a
>>>   vhost-user one)
>>> - explain how frontend and backend together create an actual device
>>>   (with the two device examples, and maybe also with the plain one
>>>   plugged as both PCI and CCW?); maybe also mention that MMIO is a bit
>>>   different? (it always confuses me)
>>
>> OK I'll see how I can restructure things to make it clearer. Do we also
>> have to take into account the object hierarchy for different types of
>> device (e.g. block or net)? Or is that all plumbing into QEMU's
>> sub-system internals done in the VirtIO device objects?
>
> An example of how a device plugs into a bigger infrastructure like the
> block layer might be helpful, but it also might complicate the
> documentation (as you probably won't need to do anything like that if
> you write a device that does not use any established infrastructure.)
> Maybe just gloss over it for now?

Yeah. Certainly not the block layer as that is one of my blind spots
right now...

>
>>
>>>> +
>>>> +Back End Implementations
>>>> +------------------------
>>>> +
>>>> +There are a number of places where the implementation of the backend
>>>> +can be done:
>>>> +
>>>> +* in QEMU itself
>>>> +* in the host kernel (a.k.a. vhost)
>>>> +* in a separate process (a.k.a. vhost-user)
>>>> +
>>>> +Where a vhost-user implementation is being done, the code in QEMU is
>>>> +mainly boilerplate to handle the command line definition and
>>>> +connection to the separate process with a socket (using the ``chardev``
>>>> +functionality).
>>>> +
>>>> +Implementing a vhost-user wrapper
>>>> +---------------------------------
>>>> +
>>>> +There are some classes defined that can wrap a lot of the common
>>>> +vhost-user code in a ``VhostUserBackend``. For example:
>>>
>>> Is VhostUserBackend something that is expected to be commonly used? I
>>> think gpu and input use it, but not virtiofs (unless I misread the
>>> code).
>>
>> Possibly - but it does seem to be trying to avoid adding lots of
>> boilerplate to each individual device to set up and configure the
>> vhost-user backend. A problem I ran into when trying to fix the
>> squashing of VHOST_USER_PROTOCOL_F_CONFIG messages in
>> vhost_user_backend_init.
>
> Yeah. I think a lot of that comes from the "pick a random existing
> device as a template" procedure I mentioned above. Maybe we really
> should recommend using that common structure in new device
> implementations.

I often point people at the unimp device but as a project it would be
nice if we could identify some best in class device types as references.
We give out the "pick a recent device" advice quite a lot.

>
>>
>> <snip>
>>>> +  static const TypeInfo vhost_user_gpu_info = {
>>>> +      .name = TYPE_VHOST_USER_GPU,
>>>> +      .parent = TYPE_VIRTIO_GPU_BASE,
>>>> +      .instance_size = sizeof(VhostUserGPU),
>>>> +      .instance_init = vhost_user_gpu_instance_init,
>>>> +      .class_init = vhost_user_gpu_class_init,
>>>> +      ...
>>>> +  };
>>>> +
>>>> +Here the ``TYPE_VHOST_USER_GPU`` is based off a shared base class
>>>> +(``TYPE_VIRTIO_GPU_BASE`` which itself is based on
>>>> +``TYPE_VIRTIO_DEVICE``). The chardev property is aliased to the
>>>> +VhostUserBackend chardev so it can be specified on the command line
>>>> +for this device.
>>>> + 
>>>
>>> I think using a "base" device is something that is device-specific; for
>>> gpu, it makes sense as it can be implemented in different ways, but
>>> e.g. virtiofs does not have a "plain" implementation, and some device
>>> types have only "plain" implementations.
>>
>> Perhaps the GPU was a bad choice here. Do we have a good example device
>> that has both mmio and pci (or ccw) transports as well as QEMU internal
>> and vhost/vhost-user implementations?
>
> Ugh. Maybe scsi? It carries a bit of legacy stuff, though.

virtio-rng is pretty simple and has a range of backend approaches?

Patch

diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 78270e89b3..5d9f95dd93 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -19,4 +19,4 @@  Details about QEMU's various subsystems including how to add features to them.
    tracing
    vfio-migration
    writing-monitor-commands
-      
+   virtio-backends
diff --git a/docs/devel/virtio-backends.rst b/docs/devel/virtio-backends.rst
new file mode 100644
index 0000000000..230538f46b
--- /dev/null
+++ b/docs/devel/virtio-backends.rst
@@ -0,0 +1,139 @@ 
+..
+   Copyright (c) 2022, Linaro Limited
+   Written by Alex Bennée
+
+Writing VirtIO backends for QEMU
+================================
+
+This document attempts to outline the information a developer needs to
+know to write backends for QEMU. It is specifically focused on
+implementing VirtIO devices.
+
+Front End Transports
+--------------------
+
+VirtIO supports a number of different front end transports. The
+details of the device remain the same but there are differences in
+command line for specifying the device (e.g. -device virtio-foo
+and -device virtio-foo-pci). For example:
+
+.. code:: c
+
+  static const TypeInfo vhost_user_blk_info = {
+      .name = TYPE_VHOST_USER_BLK,
+      .parent = TYPE_VIRTIO_DEVICE,
+      .instance_size = sizeof(VHostUserBlk),
+      .instance_init = vhost_user_blk_instance_init,
+      .class_init = vhost_user_blk_class_init,
+  };
+
+defines ``TYPE_VHOST_USER_BLK`` as a child of the generic
+``TYPE_VIRTIO_DEVICE``. And then for the PCI device it wraps around the
+base device (although explicitly initialising via
+virtio_instance_init_common):
+
+.. code:: c
+
+  struct VHostUserBlkPCI {
+      VirtIOPCIProxy parent_obj;
+      VHostUserBlk vdev;
+  };
+   
+  static void vhost_user_blk_pci_instance_init(Object *obj)
+  {
+      VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(obj);
+
+      virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                  TYPE_VHOST_USER_BLK);
+      object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
+                                "bootindex");
+  }
+
+  static const VirtioPCIDeviceTypeInfo vhost_user_blk_pci_info = {
+      .base_name               = TYPE_VHOST_USER_BLK_PCI,
+      .generic_name            = "vhost-user-blk-pci",
+      .transitional_name       = "vhost-user-blk-pci-transitional",
+      .non_transitional_name   = "vhost-user-blk-pci-non-transitional",
+      .instance_size  = sizeof(VHostUserBlkPCI),
+      .instance_init  = vhost_user_blk_pci_instance_init,
+      .class_init     = vhost_user_blk_pci_class_init,
+  };
+
+Back End Implementations
+------------------------
+
+There are a number of places where the implementation of the backend
+can be done:
+
+* in QEMU itself
+* in the host kernel (a.k.a. vhost)
+* in a separate process (a.k.a. vhost-user)
+
+Where a vhost-user implementation is being done, the code in QEMU is
+mainly boilerplate to handle the command line definition and
+connection to the separate process with a socket (using the ``chardev``
+functionality).
+
+Implementing a vhost-user wrapper
+---------------------------------
+
+There are some classes defined that can wrap a lot of the common
+vhost-user code in a ``VhostUserBackend``. For example:
+
+.. code:: c
+
+  struct VhostUserGPU {
+      VirtIOGPUBase parent_obj;
+
+      VhostUserBackend *vhost;
+      ...
+  };
+
+  static void
+  vhost_user_gpu_instance_init(Object *obj)
+  {
+      VhostUserGPU *g = VHOST_USER_GPU(obj);
+
+      g->vhost = VHOST_USER_BACKEND(object_new(TYPE_VHOST_USER_BACKEND));
+      object_property_add_alias(obj, "chardev",
+                                OBJECT(g->vhost), "chardev");
+  }
+
+  static void
+  vhost_user_gpu_device_realize(DeviceState *qdev, Error **errp)
+  {
+      VhostUserGPU *g = VHOST_USER_GPU(qdev);
+      VirtIODevice *vdev = VIRTIO_DEVICE(g);
+
+      vhost_dev_set_config_notifier(&g->vhost->dev, &config_ops);
+      if (vhost_user_backend_dev_init(g->vhost, vdev, 2, errp) < 0) {
+          return;
+      }
+      ...
+  }
+
+  static void
+  vhost_user_gpu_class_init(ObjectClass *klass, void *data)
+  {
+      DeviceClass *dc = DEVICE_CLASS(klass);
+      VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+
+      vdc->realize = vhost_user_gpu_device_realize;
+      ...
+  }
+
+  static const TypeInfo vhost_user_gpu_info = {
+      .name = TYPE_VHOST_USER_GPU,
+      .parent = TYPE_VIRTIO_GPU_BASE,
+      .instance_size = sizeof(VhostUserGPU),
+      .instance_init = vhost_user_gpu_instance_init,
+      .class_init = vhost_user_gpu_class_init,
+      ...
+  };
+
+Here the ``TYPE_VHOST_USER_GPU`` is based off a shared base class
+(``TYPE_VIRTIO_GPU_BASE`` which itself is based on
+``TYPE_VIRTIO_DEVICE``). The chardev property is aliased to the
+VhostUserBackend chardev so it can be specified on the command line
+for this device.
+