Message ID | 1600817059-26721-1-git-send-email-kwankhede@nvidia.com |
---|---|
Headers | show |
Series | Add migration support for VFIO devices | expand |
On 2020/9/23 7:24, Kirti Wankhede wrote: > Patch 4-9: > - Generic migration functionality for VFIO device. > * This patch set adds functionality for PCI devices, but can be > extended to other VFIO devices. > * Added all the basic functions required for pre-copy, stop-and-copy and > resume phases of migration. > * Added state change notifier and from that notifier function, VFIO > device's state changed is conveyed to VFIO device driver. > * During save setup phase and resume/load setup phase, migration region > is queried and is used to read/write VFIO device data. > * .save_live_pending and .save_live_iterate are implemented to use QEMU's > functionality of iteration during pre-copy phase. > * In .save_live_complete_precopy, that is in stop-and-copy phase, > iteration to read data from VFIO device driver is implemented till pending > bytes returned by driver are not zero. s/are not zero/are zero/ ? [...] > Live migration resume path: > Incomming migration calls .load_setup for each device > (RESTORE_VM, _ACTIVE, STOPPED) The _RESUMING device state is missed here? > | > For each device, .load_state is called for that device section data > | > At the end, called .load_cleanup for each device and vCPUs are started. > | > (RUNNING, _NONE, _RUNNING) Thanks, Zenghui
On Wed, 23 Sep 2020 04:54:08 +0530 Kirti Wankhede <kwankhede@nvidia.com> wrote: > Added migration state change notifier to get notification on migration state > change. These states are translated to VFIO device state and conveyed to vendor > driver. > > Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> > Reviewed-by: Neo Jia <cjia@nvidia.com> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > hw/vfio/migration.c | 29 +++++++++++++++++++++++++++++ > hw/vfio/trace-events | 5 +++-- > include/hw/vfio/vfio-common.h | 1 + > 3 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index a30d628ba963..f650fe9fc3c8 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -199,6 +199,28 @@ static void vfio_vmstate_change(void *opaque, int running, RunState state) > } > } > > +static void vfio_migration_state_notifier(Notifier *notifier, void *data) > +{ > + MigrationState *s = data; > + VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state); > + int ret; > + > + trace_vfio_migration_state_notifier(vbasedev->name, > + MigrationStatus_str(s->state)); > + > + switch (s->state) { > + case MIGRATION_STATUS_CANCELLING: > + case MIGRATION_STATUS_CANCELLED: > + case MIGRATION_STATUS_FAILED: > + ret = vfio_migration_set_state(vbasedev, > + ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING), > + VFIO_DEVICE_STATE_RUNNING); > + if (ret) { > + error_report("%s: Failed to set state RUNNING", vbasedev->name); > + } Here again the caller assumes success means the device has entered the desired state, but as implemented it only means the device is in some non-error state. > + } > +} > + > static int vfio_migration_init(VFIODevice *vbasedev, > struct vfio_region_info *info) > { > @@ -221,6 +243,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, > > vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, > vbasedev); > + vbasedev->migration_state.notify = vfio_migration_state_notifier; > + add_migration_state_change_notifier(&vbasedev->migration_state); > return ret; > } > > @@ -263,6 +287,11 @@ add_blocker: > > void vfio_migration_finalize(VFIODevice *vbasedev) > { > + > + if (vbasedev->migration_state.notify) { > + remove_migration_state_change_notifier(&vbasedev->migration_state); > + } > + > if (vbasedev->vm_state) { > qemu_del_vm_change_state_handler(vbasedev->vm_state); > } > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events > index 6524734bf7b4..bcb3fa7314d7 100644 > --- a/hw/vfio/trace-events > +++ b/hw/vfio/trace-events > @@ -149,5 +149,6 @@ vfio_display_edid_write_error(void) "" > > # migration.c > vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" > -vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d" > -vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" > +vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" > +vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" > +vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h > index 25e3b1a3b90a..49c7c7a0e29a 100644 > --- a/include/hw/vfio/vfio-common.h > +++ b/include/hw/vfio/vfio-common.h > @@ -123,6 +123,7 @@ typedef struct VFIODevice { > VMChangeStateEntry *vm_state; > uint32_t device_state; > int vm_running; > + Notifier migration_state; Can this live in VFIOMigration? Thanks, Alex > } VFIODevice; > > struct VFIODeviceOps {
On Wed, 23 Sep 2020 04:54:09 +0530 Kirti Wankhede <kwankhede@nvidia.com> wrote: > Define flags to be used as delimeter in migration file stream. > Added .save_setup and .save_cleanup functions. Mapped & unmapped migration > region from these functions at source during saving or pre-copy phase. > Set VFIO device state depending on VM's state. During live migration, VM is > running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO > device. During save-restore, VM is paused, _SAVING state is set for VFIO device. > > Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> > Reviewed-by: Neo Jia <cjia@nvidia.com> > --- > hw/vfio/migration.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > hw/vfio/trace-events | 2 ++ > 2 files changed, 93 insertions(+) > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index f650fe9fc3c8..8e8adaa25779 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -8,12 +8,15 @@ > */ > > #include "qemu/osdep.h" > +#include "qemu/main-loop.h" > +#include "qemu/cutils.h" > #include <linux/vfio.h> > > #include "sysemu/runstate.h" > #include "hw/vfio/vfio-common.h" > #include "cpu.h" > #include "migration/migration.h" > +#include "migration/vmstate.h" > #include "migration/qemu-file.h" > #include "migration/register.h" > #include "migration/blocker.h" > @@ -25,6 +28,17 @@ > #include "trace.h" > #include "hw/hw.h" > > +/* > + * Flags used as delimiter: > + * 0xffffffff => MSB 32-bit all 1s > + * 0xef10 => emulated (virtual) function IO > + * 0x0000 => 16-bits reserved for flags > + */ > +#define VFIO_MIG_FLAG_END_OF_STATE (0xffffffffef100001ULL) > +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) > +#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) > +#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) > + > static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count, > off_t off, bool iswrite) > { > @@ -166,6 +180,65 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, > return 0; > } > > +/* ---------------------------------------------------------------------- */ > + > +static int vfio_save_setup(QEMUFile *f, void *opaque) > +{ > + VFIODevice *vbasedev = opaque; > + VFIOMigration *migration = vbasedev->migration; > + int ret; > + > + trace_vfio_save_setup(vbasedev->name); > + > + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); > + > + if (migration->region.mmaps) { > + qemu_mutex_lock_iothread(); > + ret = vfio_region_mmap(&migration->region); > + qemu_mutex_unlock_iothread(); Please add a comment identifying why the iothread mutex lock is necessary here. > + if (ret) { > + error_report("%s: Failed to mmap VFIO migration region %d: %s", > + vbasedev->name, migration->region.nr, We don't support multiple migration regions, is it useful to include the region index here? > + strerror(-ret)); > + error_report("%s: Falling back to slow path", vbasedev->name); > + } > + } > + > + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK, > + VFIO_DEVICE_STATE_SAVING); > + if (ret) { > + error_report("%s: Failed to set state SAVING", vbasedev->name); > + return ret; > + } Again, doesn't match the function semantics that success only means the device is in a non-error state, maybe the one that was asked for. > + > + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); What's the overall purpose of writing these markers into the migration stream? vfio_load_state() doesn't do anything with this other than validate that the end-of-state immediately follows. Is this a placeholder for something in the future? > + > + ret = qemu_file_get_error(f); > + if (ret) { > + return ret; > + } > + > + return 0; > +} > + > +static void vfio_save_cleanup(void *opaque) > +{ > + VFIODevice *vbasedev = opaque; > + VFIOMigration *migration = vbasedev->migration; > + > + if (migration->region.mmaps) { > + vfio_region_unmap(&migration->region); > + } > + trace_vfio_save_cleanup(vbasedev->name); > +} > + > +static SaveVMHandlers savevm_vfio_handlers = { > + .save_setup = vfio_save_setup, > + .save_cleanup = vfio_save_cleanup, > +}; > + > +/* ---------------------------------------------------------------------- */ > + > static void vfio_vmstate_change(void *opaque, int running, RunState state) > { > VFIODevice *vbasedev = opaque; > @@ -225,6 +298,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, > struct vfio_region_info *info) > { > int ret = -EINVAL; > + char id[256] = ""; > + Object *obj; > > if (!vbasedev->ops->vfio_get_object) { > return ret; > @@ -241,6 +316,22 @@ static int vfio_migration_init(VFIODevice *vbasedev, > return ret; > } > > + obj = vbasedev->ops->vfio_get_object(vbasedev); > + > + if (obj) { > + DeviceState *dev = DEVICE(obj); > + char *oid = vmstate_if_get_id(VMSTATE_IF(dev)); > + > + if (oid) { > + pstrcpy(id, sizeof(id), oid); > + pstrcat(id, sizeof(id), "/"); > + g_free(oid); > + } > + } Here's where vfio_migration_init() starts using vfio_get_object() as I referenced back on patch 04. We might as well get the object before calling vfio_migration_region_init() and then pass the object. The conditional branch to handle obj is strange here too, it's fatal if vfio_migration_region_init() doesn't find an object, why do we handle it as optional here? Also, what is this doing? Comments would be nice... Thanks, Alex > + pstrcat(id, sizeof(id), "vfio"); > + > + register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers, > + vbasedev); > vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, > vbasedev); > vbasedev->migration_state.notify = vfio_migration_state_notifier; > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events > index bcb3fa7314d7..982d8dccb219 100644 > --- a/hw/vfio/trace-events > +++ b/hw/vfio/trace-events > @@ -152,3 +152,5 @@ vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" > vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" > vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" > vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" > +vfio_save_setup(const char *name) " (%s)" > +vfio_save_cleanup(const char *name) " (%s)"
* Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > On 9/23/20 1:24 AM, Kirti Wankhede wrote: > > Define flags to be used as delimeter in migration file stream. > > Typo "delimiter". > > > Added .save_setup and .save_cleanup functions. Mapped & unmapped migration > > region from these functions at source during saving or pre-copy phase. > > Set VFIO device state depending on VM's state. During live migration, VM is > > running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO > > device. During save-restore, VM is paused, _SAVING state is set for VFIO device. > > > > Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> > > Reviewed-by: Neo Jia <cjia@nvidia.com> > > --- > > hw/vfio/migration.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > hw/vfio/trace-events | 2 ++ > > 2 files changed, 93 insertions(+) > > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > > index f650fe9fc3c8..8e8adaa25779 100644 > > --- a/hw/vfio/migration.c > > +++ b/hw/vfio/migration.c > > @@ -8,12 +8,15 @@ > > */ > > > > #include "qemu/osdep.h" > > +#include "qemu/main-loop.h" > > +#include "qemu/cutils.h" > > #include <linux/vfio.h> > > > > #include "sysemu/runstate.h" > > #include "hw/vfio/vfio-common.h" > > #include "cpu.h" > > #include "migration/migration.h" > > +#include "migration/vmstate.h" > > #include "migration/qemu-file.h" > > #include "migration/register.h" > > #include "migration/blocker.h" > > @@ -25,6 +28,17 @@ > > #include "trace.h" > > #include "hw/hw.h" > > > > +/* > > + * Flags used as delimiter: > > + * 0xffffffff => MSB 32-bit all 1s > > + * 0xef10 => emulated (virtual) function IO > > + * 0x0000 => 16-bits reserved for flags > > + */ > > +#define VFIO_MIG_FLAG_END_OF_STATE (0xffffffffef100001ULL) > > +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) > > +#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) > > +#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) > > + > > static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count, > > off_t off, bool iswrite) > > { > > @@ -166,6 +180,65 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, > > return 0; > > } > > > > +/* ---------------------------------------------------------------------- */ > > + > > +static int vfio_save_setup(QEMUFile *f, void *opaque) > > +{ > > + VFIODevice *vbasedev = opaque; > > + VFIOMigration *migration = vbasedev->migration; > > + int ret; > > + > > + trace_vfio_save_setup(vbasedev->name); > > + > > + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); > > + > > + if (migration->region.mmaps) { > > + qemu_mutex_lock_iothread(); > > + ret = vfio_region_mmap(&migration->region); > > + qemu_mutex_unlock_iothread(); > > + if (ret) { > > + error_report("%s: Failed to mmap VFIO migration region %d: %s", > > + vbasedev->name, migration->region.nr, > > + strerror(-ret)); > > + error_report("%s: Falling back to slow path", vbasedev->name); > > + } > > + } > > + > > + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK, > > + VFIO_DEVICE_STATE_SAVING); > > + if (ret) { > > + error_report("%s: Failed to set state SAVING", vbasedev->name); > > + return ret; > > + } > > + > > + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); > > + > > + ret = qemu_file_get_error(f); > > + if (ret) { > > + return ret; > > + } > > + > > + return 0; > > +} > > + > > +static void vfio_save_cleanup(void *opaque) > > +{ > > + VFIODevice *vbasedev = opaque; > > + VFIOMigration *migration = vbasedev->migration; > > + > > + if (migration->region.mmaps) { > > + vfio_region_unmap(&migration->region); > > + } > > + trace_vfio_save_cleanup(vbasedev->name); > > +} > > + > > +static SaveVMHandlers savevm_vfio_handlers = { > > + .save_setup = vfio_save_setup, > > + .save_cleanup = vfio_save_cleanup, > > +}; > > + > > +/* ---------------------------------------------------------------------- */ > > + > > static void vfio_vmstate_change(void *opaque, int running, RunState state) > > { > > VFIODevice *vbasedev = opaque; > > @@ -225,6 +298,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, > > struct vfio_region_info *info) > > { > > int ret = -EINVAL; > > + char id[256] = ""; > > + Object *obj; > > > > if (!vbasedev->ops->vfio_get_object) { > > return ret; > > @@ -241,6 +316,22 @@ static int vfio_migration_init(VFIODevice *vbasedev, > > return ret; > > } > > > > + obj = vbasedev->ops->vfio_get_object(vbasedev); > > + > > + if (obj) { > > + DeviceState *dev = DEVICE(obj); > > + char *oid = vmstate_if_get_id(VMSTATE_IF(dev)); > > + > > + if (oid) { > > + pstrcpy(id, sizeof(id), oid); > > + pstrcat(id, sizeof(id), "/"); > > + g_free(oid); > > + } > > + } > > + pstrcat(id, sizeof(id), "vfio"); > > Alternatively (easier to review, matter of taste): > > g_autofree char *path = NULL; > > if (oid) { > path = g_strdup_printf("%s/vfio", > vmstate_if_get_id(VMSTATE_IF(obj));); > } else { > path = g_strdup("vfio"); > } > strpadcpy(id, sizeof(id), path, '\0'); Maybe, although it's a straight copy of the magic in unregister_savevm. Dave > > + > > + register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers, > > + vbasedev); > > vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, > > vbasedev); > > vbasedev->migration_state.notify = vfio_migration_state_notifier; > > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events > > index bcb3fa7314d7..982d8dccb219 100644 > > --- a/hw/vfio/trace-events > > +++ b/hw/vfio/trace-events > > @@ -152,3 +152,5 @@ vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" > > vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" > > vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" > > vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" > > +vfio_save_setup(const char *name) " (%s)" > > +vfio_save_cleanup(const char *name) " (%s)" > > >
* Kirti Wankhede (kwankhede@nvidia.com) wrote: > Added amount of bytes transferred to the target VM by all VFIO devices > > Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> > --- > > Note: Comments from v25 for this patch are not addressed yet. > https://www.mail-archive.com/qemu-devel@nongnu.org/msg715620.html > > Alex, need more pointer on documentation part raised Markus Armbruster. I think I'm OK with this from the migration side, except for the minor 5.2's below, so Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Markus is right that we do have lots more information that falls out of the migration stats, but I'm not sure what more you have to collect. Dave > > hw/vfio/common.c | 20 ++++++++++++++++++++ > hw/vfio/migration.c | 10 ++++++++++ > include/qemu/vfio-helpers.h | 3 +++ > migration/migration.c | 14 ++++++++++++++ > monitor/hmp-cmds.c | 6 ++++++ > qapi/migration.json | 17 +++++++++++++++++ > 6 files changed, 70 insertions(+) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index 7eeaa368187a..286cdaac8674 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -39,6 +39,7 @@ > #include "trace.h" > #include "qapi/error.h" > #include "migration/migration.h" > +#include "qemu/vfio-helpers.h" > > VFIOGroupList vfio_group_list = > QLIST_HEAD_INITIALIZER(vfio_group_list); > @@ -292,6 +293,25 @@ const MemoryRegionOps vfio_region_ops = { > * Device state interfaces > */ > > +bool vfio_mig_active(void) > +{ > + VFIOGroup *group; > + VFIODevice *vbasedev; > + > + if (QLIST_EMPTY(&vfio_group_list)) { > + return false; > + } > + > + QLIST_FOREACH(group, &vfio_group_list, next) { > + QLIST_FOREACH(vbasedev, &group->device_list, next) { > + if (vbasedev->migration_blocker) { > + return false; > + } > + } > + } > + return true; > +} > + > static bool vfio_devices_all_stopped_and_saving(VFIOContainer *container) > { > VFIOGroup *group; > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 822b68b4e015..c4226fa8b183 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -28,6 +28,7 @@ > #include "pci.h" > #include "trace.h" > #include "hw/hw.h" > +#include "qemu/vfio-helpers.h" > > /* > * Flags used as delimiter: > @@ -40,6 +41,8 @@ > #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) > #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) > > +static int64_t bytes_transferred; > + > static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count, > off_t off, bool iswrite) > { > @@ -289,6 +292,7 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size) > *size = data_size; > } > > + bytes_transferred += data_size; > return ret; > } > > @@ -770,6 +774,7 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data) > } > > vfio_set_dirty_page_tracking(vbasedev, false); > + bytes_transferred = 0; > } > } > > @@ -820,6 +825,11 @@ static int vfio_migration_init(VFIODevice *vbasedev, > > /* ---------------------------------------------------------------------- */ > > +int64_t vfio_mig_bytes_transferred(void) > +{ > + return bytes_transferred; > +} > + > int vfio_migration_probe(VFIODevice *vbasedev, Error **errp) > { > VFIOContainer *container = vbasedev->group->container; > diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h > index 1f057c2b9e40..26a7df0767b1 100644 > --- a/include/qemu/vfio-helpers.h > +++ b/include/qemu/vfio-helpers.h > @@ -29,4 +29,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar, > int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e, > int irq_type, Error **errp); > > +bool vfio_mig_active(void); > +int64_t vfio_mig_bytes_transferred(void); > + > #endif > diff --git a/migration/migration.c b/migration/migration.c > index 58a5452471f9..b204bb1f6cd9 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -56,6 +56,7 @@ > #include "net/announce.h" > #include "qemu/queue.h" > #include "multifd.h" > +#include "qemu/vfio-helpers.h" > > #define MAX_THROTTLE (32 << 20) /* Migration transfer speed throttling */ > > @@ -996,6 +997,17 @@ static void populate_disk_info(MigrationInfo *info) > } > } > > +static void populate_vfio_info(MigrationInfo *info) > +{ > +#ifdef CONFIG_LINUX > + if (vfio_mig_active()) { > + info->has_vfio = true; > + info->vfio = g_malloc0(sizeof(*info->vfio)); > + info->vfio->transferred = vfio_mig_bytes_transferred(); > + } > +#endif > +} > + > static void fill_source_migration_info(MigrationInfo *info) > { > MigrationState *s = migrate_get_current(); > @@ -1020,6 +1032,7 @@ static void fill_source_migration_info(MigrationInfo *info) > populate_time_info(info, s); > populate_ram_info(info, s); > populate_disk_info(info); > + populate_vfio_info(info); > break; > case MIGRATION_STATUS_COLO: > info->has_status = true; > @@ -1028,6 +1041,7 @@ static void fill_source_migration_info(MigrationInfo *info) > case MIGRATION_STATUS_COMPLETED: > populate_time_info(info, s); > populate_ram_info(info, s); > + populate_vfio_info(info); > break; > case MIGRATION_STATUS_FAILED: > info->has_status = true; > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c > index 7711726fd222..40d60d6a6651 100644 > --- a/monitor/hmp-cmds.c > +++ b/monitor/hmp-cmds.c > @@ -355,6 +355,12 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict) > } > monitor_printf(mon, "]\n"); > } > + > + if (info->has_vfio) { > + monitor_printf(mon, "vfio device transferred: %" PRIu64 " kbytes\n", > + info->vfio->transferred >> 10); > + } > + > qapi_free_MigrationInfo(info); > } > > diff --git a/qapi/migration.json b/qapi/migration.json > index 675f70bb6734..3535977123d3 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -147,6 +147,18 @@ > 'active', 'postcopy-active', 'postcopy-paused', > 'postcopy-recover', 'completed', 'failed', 'colo', > 'pre-switchover', 'device', 'wait-unplug' ] } > +## > +# @VfioStats: > +# > +# Detailed VFIO devices migration statistics > +# > +# @transferred: amount of bytes transferred to the target VM by VFIO devices > +# > +# Since: 5.1 > +# > +## > +{ 'struct': 'VfioStats', > + 'data': {'transferred': 'int' } } > > ## > # @MigrationInfo: > @@ -208,11 +220,16 @@ > # > # @socket-address: Only used for tcp, to know what the real port is (Since 4.0) > # > +# @vfio: @VfioStats containing detailed VFIO devices migration statistics, > +# only returned if VFIO device is present, migration is supported by all > +# VFIO devices and status is 'active' or 'completed' (since 5.1) > +# > # Since: 0.14.0 > ## > { 'struct': 'MigrationInfo', > 'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats', > '*disk': 'MigrationStats', > + '*vfio': 'VfioStats', > '*xbzrle-cache': 'XBZRLECacheStats', > '*total-time': 'int', > '*expected-downtime': 'int', > -- > 2.7.0 >
On 9/26/2020 1:50 AM, Alex Williamson wrote: > On Wed, 23 Sep 2020 04:54:08 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > >> Added migration state change notifier to get notification on migration state >> change. These states are translated to VFIO device state and conveyed to vendor >> driver. >> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> >> Reviewed-by: Neo Jia <cjia@nvidia.com> >> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> >> --- >> hw/vfio/migration.c | 29 +++++++++++++++++++++++++++++ >> hw/vfio/trace-events | 5 +++-- >> include/hw/vfio/vfio-common.h | 1 + >> 3 files changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index a30d628ba963..f650fe9fc3c8 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -199,6 +199,28 @@ static void vfio_vmstate_change(void *opaque, int running, RunState state) >> } >> } >> >> +static void vfio_migration_state_notifier(Notifier *notifier, void *data) >> +{ >> + MigrationState *s = data; >> + VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state); >> + int ret; >> + >> + trace_vfio_migration_state_notifier(vbasedev->name, >> + MigrationStatus_str(s->state)); >> + >> + switch (s->state) { >> + case MIGRATION_STATUS_CANCELLING: >> + case MIGRATION_STATUS_CANCELLED: >> + case MIGRATION_STATUS_FAILED: >> + ret = vfio_migration_set_state(vbasedev, >> + ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING), >> + VFIO_DEVICE_STATE_RUNNING); >> + if (ret) { >> + error_report("%s: Failed to set state RUNNING", vbasedev->name); >> + } > > Here again the caller assumes success means the device has entered the > desired state, but as implemented it only means the device is in some > non-error state. > >> + } >> +} >> + >> static int vfio_migration_init(VFIODevice *vbasedev, >> struct vfio_region_info *info) >> { >> @@ -221,6 +243,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, >> >> vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, >> vbasedev); >> + vbasedev->migration_state.notify = vfio_migration_state_notifier; >> + add_migration_state_change_notifier(&vbasedev->migration_state); >> return ret; >> } >> >> @@ -263,6 +287,11 @@ add_blocker: >> >> void vfio_migration_finalize(VFIODevice *vbasedev) >> { >> + >> + if (vbasedev->migration_state.notify) { >> + remove_migration_state_change_notifier(&vbasedev->migration_state); >> + } >> + >> if (vbasedev->vm_state) { >> qemu_del_vm_change_state_handler(vbasedev->vm_state); >> } >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events >> index 6524734bf7b4..bcb3fa7314d7 100644 >> --- a/hw/vfio/trace-events >> +++ b/hw/vfio/trace-events >> @@ -149,5 +149,6 @@ vfio_display_edid_write_error(void) "" >> >> # migration.c >> vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" >> -vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d" >> -vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" >> +vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" >> +vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" >> +vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >> index 25e3b1a3b90a..49c7c7a0e29a 100644 >> --- a/include/hw/vfio/vfio-common.h >> +++ b/include/hw/vfio/vfio-common.h >> @@ -123,6 +123,7 @@ typedef struct VFIODevice { >> VMChangeStateEntry *vm_state; >> uint32_t device_state; >> int vm_running; >> + Notifier migration_state; > > Can this live in VFIOMigration? Thanks, > No, callback vfio_migration_state_notifier() has notifier argument and to reach its corresponding device structure as below, its should be in VFIODevice. VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state); Thanks, Kirti > Alex > >> } VFIODevice; >> >> struct VFIODeviceOps { >
On 9/29/2020 3:49 PM, Dr. David Alan Gilbert wrote: > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: >> On 9/23/20 1:24 AM, Kirti Wankhede wrote: >>> Define flags to be used as delimeter in migration file stream. >> >> Typo "delimiter". >> >>> Added .save_setup and .save_cleanup functions. Mapped & unmapped migration >>> region from these functions at source during saving or pre-copy phase. >>> Set VFIO device state depending on VM's state. During live migration, VM is >>> running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO >>> device. During save-restore, VM is paused, _SAVING state is set for VFIO device. >>> >>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> >>> Reviewed-by: Neo Jia <cjia@nvidia.com> >>> --- >>> hw/vfio/migration.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> hw/vfio/trace-events | 2 ++ >>> 2 files changed, 93 insertions(+) >>> >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>> index f650fe9fc3c8..8e8adaa25779 100644 >>> --- a/hw/vfio/migration.c >>> +++ b/hw/vfio/migration.c >>> @@ -8,12 +8,15 @@ >>> */ >>> >>> #include "qemu/osdep.h" >>> +#include "qemu/main-loop.h" >>> +#include "qemu/cutils.h" >>> #include <linux/vfio.h> >>> >>> #include "sysemu/runstate.h" >>> #include "hw/vfio/vfio-common.h" >>> #include "cpu.h" >>> #include "migration/migration.h" >>> +#include "migration/vmstate.h" >>> #include "migration/qemu-file.h" >>> #include "migration/register.h" >>> #include "migration/blocker.h" >>> @@ -25,6 +28,17 @@ >>> #include "trace.h" >>> #include "hw/hw.h" >>> >>> +/* >>> + * Flags used as delimiter: >>> + * 0xffffffff => MSB 32-bit all 1s >>> + * 0xef10 => emulated (virtual) function IO >>> + * 0x0000 => 16-bits reserved for flags >>> + */ >>> +#define VFIO_MIG_FLAG_END_OF_STATE (0xffffffffef100001ULL) >>> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) >>> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) >>> +#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) >>> + >>> static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count, >>> off_t off, bool iswrite) >>> { >>> @@ -166,6 +180,65 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, >>> return 0; >>> } >>> >>> +/* ---------------------------------------------------------------------- */ >>> + >>> +static int vfio_save_setup(QEMUFile *f, void *opaque) >>> +{ >>> + VFIODevice *vbasedev = opaque; >>> + VFIOMigration *migration = vbasedev->migration; >>> + int ret; >>> + >>> + trace_vfio_save_setup(vbasedev->name); >>> + >>> + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); >>> + >>> + if (migration->region.mmaps) { >>> + qemu_mutex_lock_iothread(); >>> + ret = vfio_region_mmap(&migration->region); >>> + qemu_mutex_unlock_iothread(); >>> + if (ret) { >>> + error_report("%s: Failed to mmap VFIO migration region %d: %s", >>> + vbasedev->name, migration->region.nr, >>> + strerror(-ret)); >>> + error_report("%s: Falling back to slow path", vbasedev->name); >>> + } >>> + } >>> + >>> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK, >>> + VFIO_DEVICE_STATE_SAVING); >>> + if (ret) { >>> + error_report("%s: Failed to set state SAVING", vbasedev->name); >>> + return ret; >>> + } >>> + >>> + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); >>> + >>> + ret = qemu_file_get_error(f); >>> + if (ret) { >>> + return ret; >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static void vfio_save_cleanup(void *opaque) >>> +{ >>> + VFIODevice *vbasedev = opaque; >>> + VFIOMigration *migration = vbasedev->migration; >>> + >>> + if (migration->region.mmaps) { >>> + vfio_region_unmap(&migration->region); >>> + } >>> + trace_vfio_save_cleanup(vbasedev->name); >>> +} >>> + >>> +static SaveVMHandlers savevm_vfio_handlers = { >>> + .save_setup = vfio_save_setup, >>> + .save_cleanup = vfio_save_cleanup, >>> +}; >>> + >>> +/* ---------------------------------------------------------------------- */ >>> + >>> static void vfio_vmstate_change(void *opaque, int running, RunState state) >>> { >>> VFIODevice *vbasedev = opaque; >>> @@ -225,6 +298,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, >>> struct vfio_region_info *info) >>> { >>> int ret = -EINVAL; >>> + char id[256] = ""; >>> + Object *obj; >>> >>> if (!vbasedev->ops->vfio_get_object) { >>> return ret; >>> @@ -241,6 +316,22 @@ static int vfio_migration_init(VFIODevice *vbasedev, >>> return ret; >>> } >>> >>> + obj = vbasedev->ops->vfio_get_object(vbasedev); >>> + >>> + if (obj) { >>> + DeviceState *dev = DEVICE(obj); >>> + char *oid = vmstate_if_get_id(VMSTATE_IF(dev)); >>> + >>> + if (oid) { >>> + pstrcpy(id, sizeof(id), oid); >>> + pstrcat(id, sizeof(id), "/"); >>> + g_free(oid); >>> + } >>> + } >>> + pstrcat(id, sizeof(id), "vfio"); >> >> Alternatively (easier to review, matter of taste): >> >> g_autofree char *path = NULL; >> >> if (oid) { >> path = g_strdup_printf("%s/vfio", >> vmstate_if_get_id(VMSTATE_IF(obj));); >> } else { >> path = g_strdup("vfio"); >> } >> strpadcpy(id, sizeof(id), path, '\0'); > > Maybe, although it's a straight copy of the magic in unregister_savevm. > Ok. Changing it as Philippe suggested. Thanks, Kirti > Dave > >>> + >>> + register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers, >>> + vbasedev); >>> vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, >>> vbasedev); >>> vbasedev->migration_state.notify = vfio_migration_state_notifier; >>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events >>> index bcb3fa7314d7..982d8dccb219 100644 >>> --- a/hw/vfio/trace-events >>> +++ b/hw/vfio/trace-events >>> @@ -152,3 +152,5 @@ vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" >>> vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" >>> vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" >>> vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" >>> +vfio_save_setup(const char *name) " (%s)" >>> +vfio_save_cleanup(const char *name) " (%s)" >>> >>
On 9/26/2020 1:50 AM, Alex Williamson wrote: > On Wed, 23 Sep 2020 04:54:09 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > >> Define flags to be used as delimeter in migration file stream. >> Added .save_setup and .save_cleanup functions. Mapped & unmapped migration >> region from these functions at source during saving or pre-copy phase. >> Set VFIO device state depending on VM's state. During live migration, VM is >> running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO >> device. During save-restore, VM is paused, _SAVING state is set for VFIO device. >> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> >> Reviewed-by: Neo Jia <cjia@nvidia.com> >> --- >> hw/vfio/migration.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> hw/vfio/trace-events | 2 ++ >> 2 files changed, 93 insertions(+) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index f650fe9fc3c8..8e8adaa25779 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -8,12 +8,15 @@ >> */ >> >> #include "qemu/osdep.h" >> +#include "qemu/main-loop.h" >> +#include "qemu/cutils.h" >> #include <linux/vfio.h> >> >> #include "sysemu/runstate.h" >> #include "hw/vfio/vfio-common.h" >> #include "cpu.h" >> #include "migration/migration.h" >> +#include "migration/vmstate.h" >> #include "migration/qemu-file.h" >> #include "migration/register.h" >> #include "migration/blocker.h" >> @@ -25,6 +28,17 @@ >> #include "trace.h" >> #include "hw/hw.h" >> >> +/* >> + * Flags used as delimiter: >> + * 0xffffffff => MSB 32-bit all 1s >> + * 0xef10 => emulated (virtual) function IO >> + * 0x0000 => 16-bits reserved for flags >> + */ >> +#define VFIO_MIG_FLAG_END_OF_STATE (0xffffffffef100001ULL) >> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) >> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) >> +#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) >> + >> static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count, >> off_t off, bool iswrite) >> { >> @@ -166,6 +180,65 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, >> return 0; >> } >> >> +/* ---------------------------------------------------------------------- */ >> + >> +static int vfio_save_setup(QEMUFile *f, void *opaque) >> +{ >> + VFIODevice *vbasedev = opaque; >> + VFIOMigration *migration = vbasedev->migration; >> + int ret; >> + >> + trace_vfio_save_setup(vbasedev->name); >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); >> + >> + if (migration->region.mmaps) { >> + qemu_mutex_lock_iothread(); >> + ret = vfio_region_mmap(&migration->region); >> + qemu_mutex_unlock_iothread(); > > Please add a comment identifying why the iothread mutex lock is > necessary here. > >> + if (ret) { >> + error_report("%s: Failed to mmap VFIO migration region %d: %s", >> + vbasedev->name, migration->region.nr, > > We don't support multiple migration regions, is it useful to include > the region index here? > Ok. Removing region.nr >> + strerror(-ret)); >> + error_report("%s: Falling back to slow path", vbasedev->name); >> + } >> + } >> + >> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK, >> + VFIO_DEVICE_STATE_SAVING); >> + if (ret) { >> + error_report("%s: Failed to set state SAVING", vbasedev->name); >> + return ret; >> + } > > Again, doesn't match the function semantics that success only means the > device is in a non-error state, maybe the one that was asked for. > Fixed in patch 05. >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); > > What's the overall purpose of writing these markers into the migration > stream? vfio_load_state() doesn't do anything with this other than > validate that the end-of-state immediately follows. Is this a > placeholder for something in the future? > Its not placeholder, it is used in vfio_load_state() to determine upto what point to loop to fetch data for each state, otherwise how would we know when to stop reading data from stream for that VFIO device. >> + >> + ret = qemu_file_get_error(f); >> + if (ret) { >> + return ret; >> + } >> + >> + return 0; >> +} >> + >> +static void vfio_save_cleanup(void *opaque) >> +{ >> + VFIODevice *vbasedev = opaque; >> + VFIOMigration *migration = vbasedev->migration; >> + >> + if (migration->region.mmaps) { >> + vfio_region_unmap(&migration->region); >> + } >> + trace_vfio_save_cleanup(vbasedev->name); >> +} >> + >> +static SaveVMHandlers savevm_vfio_handlers = { >> + .save_setup = vfio_save_setup, >> + .save_cleanup = vfio_save_cleanup, >> +}; >> + >> +/* ---------------------------------------------------------------------- */ >> + >> static void vfio_vmstate_change(void *opaque, int running, RunState state) >> { >> VFIODevice *vbasedev = opaque; >> @@ -225,6 +298,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, >> struct vfio_region_info *info) >> { >> int ret = -EINVAL; >> + char id[256] = ""; >> + Object *obj; >> >> if (!vbasedev->ops->vfio_get_object) { >> return ret; >> @@ -241,6 +316,22 @@ static int vfio_migration_init(VFIODevice *vbasedev, >> return ret; >> } >> >> + obj = vbasedev->ops->vfio_get_object(vbasedev); >> + >> + if (obj) { >> + DeviceState *dev = DEVICE(obj); >> + char *oid = vmstate_if_get_id(VMSTATE_IF(dev)); >> + >> + if (oid) { >> + pstrcpy(id, sizeof(id), oid); >> + pstrcat(id, sizeof(id), "/"); >> + g_free(oid); >> + } >> + } > > Here's where vfio_migration_init() starts using vfio_get_object() as I > referenced back on patch 04. We might as well get the object before > calling vfio_migration_region_init() and then pass the object. The > conditional branch to handle obj is strange here too, it's fatal if > vfio_migration_region_init() doesn't find an object, why do we handle > it as optional here? Also, what is this doing? Comments would be > nice... Thanks, > Changing it as I mentioned in other patch reply. Thanks. > Alex > >> + pstrcat(id, sizeof(id), "vfio"); >> + >> + register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers, >> + vbasedev); >> vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, >> vbasedev); >> vbasedev->migration_state.notify = vfio_migration_state_notifier; >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events >> index bcb3fa7314d7..982d8dccb219 100644 >> --- a/hw/vfio/trace-events >> +++ b/hw/vfio/trace-events >> @@ -152,3 +152,5 @@ vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" >> vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" >> vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" >> vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" >> +vfio_save_setup(const char *name) " (%s)" >> +vfio_save_cleanup(const char *name) " (%s)" >
On 9/26/2020 2:32 AM, Alex Williamson wrote: > On Wed, 23 Sep 2020 04:54:10 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > >> Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy >> functions. These functions handles pre-copy and stop-and-copy phase. >> >> In _SAVING|_RUNNING device state or pre-copy phase: >> - read pending_bytes. If pending_bytes > 0, go through below steps. >> - read data_offset - indicates kernel driver to write data to staging >> buffer. >> - read data_size - amount of data in bytes written by vendor driver in >> migration region. >> - read data_size bytes of data from data_offset in the migration region. >> - Write data packet to file stream as below: >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data, >> VFIO_MIG_FLAG_END_OF_STATE } >> >> In _SAVING device state or stop-and-copy phase >> a. read config space of device and save to migration file stream. This >> doesn't need to be from vendor driver. Any other special config state >> from driver can be saved as data in following iteration. >> b. read pending_bytes. If pending_bytes > 0, go through below steps. >> c. read data_offset - indicates kernel driver to write data to staging >> buffer. >> d. read data_size - amount of data in bytes written by vendor driver in >> migration region. >> e. read data_size bytes of data from data_offset in the migration region. >> f. Write data packet as below: >> {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data} >> g. iterate through steps b to f while (pending_bytes > 0) >> h. Write {VFIO_MIG_FLAG_END_OF_STATE} >> >> When data region is mapped, its user's responsibility to read data from >> data_offset of data_size before moving to next steps. >> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> >> Reviewed-by: Neo Jia <cjia@nvidia.com> >> --- >> hw/vfio/migration.c | 273 ++++++++++++++++++++++++++++++++++++++++++ >> hw/vfio/trace-events | 6 + >> include/hw/vfio/vfio-common.h | 1 + >> 3 files changed, 280 insertions(+) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index 8e8adaa25779..4611bb972228 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -180,6 +180,154 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask, >> return 0; >> } >> >> +static void *get_data_section_size(VFIORegion *region, uint64_t data_offset, >> + uint64_t data_size, uint64_t *size) >> +{ >> + void *ptr = NULL; >> + uint64_t limit = 0; >> + int i; >> + >> + if (!region->mmaps) { >> + if (size) { >> + *size = data_size; >> + } >> + return ptr; >> + } >> + >> + for (i = 0; i < region->nr_mmaps; i++) { >> + VFIOMmap *map = region->mmaps + i; >> + >> + if ((data_offset >= map->offset) && >> + (data_offset < map->offset + map->size)) { >> + >> + /* check if data_offset is within sparse mmap areas */ >> + ptr = map->mmap + data_offset - map->offset; >> + if (size) { >> + *size = MIN(data_size, map->offset + map->size - data_offset); >> + } >> + break; >> + } else if ((data_offset < map->offset) && >> + (!limit || limit > map->offset)) { >> + /* >> + * data_offset is not within sparse mmap areas, find size of >> + * non-mapped area. Check through all list since region->mmaps list >> + * is not sorted. >> + */ >> + limit = map->offset; >> + } >> + } >> + >> + if (!ptr && size) { >> + *size = limit ? limit - data_offset : data_size; > > 'limit - data_offset' doesn't take data_size into account, this should > be MIN(data_size, limit - data_offset). > Done. >> + } >> + return ptr; >> +} >> + >> +static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size) >> +{ >> + VFIOMigration *migration = vbasedev->migration; >> + VFIORegion *region = &migration->region; >> + uint64_t data_offset = 0, data_size = 0, sz; >> + int ret; >> + >> + ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset), >> + region->fd_offset + offsetof(struct vfio_device_migration_info, >> + data_offset)); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size), >> + region->fd_offset + offsetof(struct vfio_device_migration_info, >> + data_size)); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + trace_vfio_save_buffer(vbasedev->name, data_offset, data_size, >> + migration->pending_bytes); >> + >> + qemu_put_be64(f, data_size); >> + sz = data_size; >> + >> + while (sz) { >> + void *buf = NULL; > > Unnecessary initialization. > >> + uint64_t sec_size; >> + bool buf_allocated = false; >> + >> + buf = get_data_section_size(region, data_offset, sz, &sec_size); >> + >> + if (!buf) { >> + buf = g_try_malloc(sec_size); >> + if (!buf) { >> + error_report("%s: Error allocating buffer ", __func__); >> + return -ENOMEM; >> + } >> + buf_allocated = true; >> + >> + ret = vfio_mig_read(vbasedev, buf, sec_size, >> + region->fd_offset + data_offset); >> + if (ret < 0) { >> + g_free(buf); >> + return ret; >> + } >> + } >> + >> + qemu_put_buffer(f, buf, sec_size); >> + >> + if (buf_allocated) { >> + g_free(buf); >> + } >> + sz -= sec_size; >> + data_offset += sec_size; >> + } >> + >> + ret = qemu_file_get_error(f); >> + >> + if (!ret && size) { >> + *size = data_size; >> + } >> + >> + return ret; >> +} >> + >> +static int vfio_update_pending(VFIODevice *vbasedev) >> +{ >> + VFIOMigration *migration = vbasedev->migration; >> + VFIORegion *region = &migration->region; >> + uint64_t pending_bytes = 0; >> + int ret; >> + >> + ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes), >> + region->fd_offset + offsetof(struct vfio_device_migration_info, >> + pending_bytes)); >> + if (ret < 0) { >> + migration->pending_bytes = 0; >> + return ret; >> + } >> + >> + migration->pending_bytes = pending_bytes; >> + trace_vfio_update_pending(vbasedev->name, pending_bytes); >> + return 0; >> +} >> + >> +static int vfio_save_device_config_state(QEMUFile *f, void *opaque) >> +{ >> + VFIODevice *vbasedev = opaque; >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_CONFIG_STATE); >> + >> + if (vbasedev->ops && vbasedev->ops->vfio_save_config) { >> + vbasedev->ops->vfio_save_config(vbasedev, f); >> + } >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); >> + >> + trace_vfio_save_device_config_state(vbasedev->name); >> + >> + return qemu_file_get_error(f); >> +} >> + >> /* ---------------------------------------------------------------------- */ >> >> static int vfio_save_setup(QEMUFile *f, void *opaque) >> @@ -232,9 +380,134 @@ static void vfio_save_cleanup(void *opaque) >> trace_vfio_save_cleanup(vbasedev->name); >> } >> >> +static void vfio_save_pending(QEMUFile *f, void *opaque, >> + uint64_t threshold_size, >> + uint64_t *res_precopy_only, >> + uint64_t *res_compatible, >> + uint64_t *res_postcopy_only) >> +{ >> + VFIODevice *vbasedev = opaque; >> + VFIOMigration *migration = vbasedev->migration; >> + int ret; >> + >> + ret = vfio_update_pending(vbasedev); >> + if (ret) { >> + return; >> + } >> + >> + *res_precopy_only += migration->pending_bytes; >> + >> + trace_vfio_save_pending(vbasedev->name, *res_precopy_only, >> + *res_postcopy_only, *res_compatible); >> +} >> + >> +static int vfio_save_iterate(QEMUFile *f, void *opaque) >> +{ >> + VFIODevice *vbasedev = opaque; >> + VFIOMigration *migration = vbasedev->migration; >> + uint64_t data_size; >> + int ret; >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE); >> + >> + if (migration->pending_bytes == 0) { >> + ret = vfio_update_pending(vbasedev); >> + if (ret) { >> + return ret; >> + } >> + >> + if (migration->pending_bytes == 0) { >> + /* indicates data finished, goto complete phase */ >> + return 1; >> + } >> + } >> + >> + ret = vfio_save_buffer(f, vbasedev, &data_size); >> + >> + if (ret) { >> + error_report("%s: vfio_save_buffer failed %s", vbasedev->name, >> + strerror(errno)); >> + return ret; >> + } >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); >> + >> + ret = qemu_file_get_error(f); >> + if (ret) { >> + return ret; >> + } >> + >> + trace_vfio_save_iterate(vbasedev->name, data_size); >> + >> + return 0; >> +} >> + >> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) >> +{ >> + VFIODevice *vbasedev = opaque; >> + VFIOMigration *migration = vbasedev->migration; >> + uint64_t data_size; >> + int ret; >> + >> + ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_RUNNING, >> + VFIO_DEVICE_STATE_SAVING); >> + if (ret) { >> + error_report("%s: Failed to set state STOP and SAVING", >> + vbasedev->name); >> + return ret; >> + } > > This also assumes success implies desired state. > >> + >> + ret = vfio_save_device_config_state(f, opaque); >> + if (ret) { >> + return ret; >> + } >> + >> + ret = vfio_update_pending(vbasedev); >> + if (ret) { >> + return ret; >> + } >> + >> + while (migration->pending_bytes > 0) { >> + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE); >> + ret = vfio_save_buffer(f, vbasedev, &data_size); >> + if (ret < 0) { >> + error_report("%s: Failed to save buffer", vbasedev->name); >> + return ret; >> + } >> + >> + if (data_size == 0) { >> + break; >> + } >> + >> + ret = vfio_update_pending(vbasedev); >> + if (ret) { >> + return ret; >> + } >> + } >> + >> + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); >> + >> + ret = qemu_file_get_error(f); >> + if (ret) { >> + return ret; >> + } >> + >> + ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_SAVING, 0); >> + if (ret) { >> + error_report("%s: Failed to set state STOPPED", vbasedev->name); >> + return ret; >> + } > > And another. Thanks, > Fixing in patch 5. Thanks, Kirti > Alex > >> + >> + trace_vfio_save_complete_precopy(vbasedev->name); >> + return ret; >> +} >> + >> static SaveVMHandlers savevm_vfio_handlers = { >> .save_setup = vfio_save_setup, >> .save_cleanup = vfio_save_cleanup, >> + .save_live_pending = vfio_save_pending, >> + .save_live_iterate = vfio_save_iterate, >> + .save_live_complete_precopy = vfio_save_complete_precopy, >> }; >> >> /* ---------------------------------------------------------------------- */ >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events >> index 982d8dccb219..118b5547c921 100644 >> --- a/hw/vfio/trace-events >> +++ b/hw/vfio/trace-events >> @@ -154,3 +154,9 @@ vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t >> vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" >> vfio_save_setup(const char *name) " (%s)" >> vfio_save_cleanup(const char *name) " (%s)" >> +vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64 >> +vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64 >> +vfio_save_device_config_state(const char *name) " (%s)" >> +vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64 >> +vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d" >> +vfio_save_complete_precopy(const char *name) " (%s)" >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >> index 49c7c7a0e29a..471e444a364c 100644 >> --- a/include/hw/vfio/vfio-common.h >> +++ b/include/hw/vfio/vfio-common.h >> @@ -60,6 +60,7 @@ typedef struct VFIORegion { >> >> typedef struct VFIOMigration { >> VFIORegion region; >> + uint64_t pending_bytes; >> } VFIOMigration; >> >> typedef struct VFIOAddressSpace { >
On 9/26/2020 3:25 AM, Alex Williamson wrote: > On Wed, 23 Sep 2020 04:54:14 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > >> Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking >> for VFIO devices. >> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> >> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> >> --- >> hw/vfio/migration.c | 36 ++++++++++++++++++++++++++++++++++++ >> 1 file changed, 36 insertions(+) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index 4306f6316417..822b68b4e015 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -11,6 +11,7 @@ >> #include "qemu/main-loop.h" >> #include "qemu/cutils.h" >> #include <linux/vfio.h> >> +#include <sys/ioctl.h> >> >> #include "sysemu/runstate.h" >> #include "hw/vfio/vfio-common.h" >> @@ -355,6 +356,32 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) >> return qemu_file_get_error(f); >> } >> >> +static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start) >> +{ >> + int ret; >> + VFIOContainer *container = vbasedev->group->container; >> + struct vfio_iommu_type1_dirty_bitmap dirty = { >> + .argsz = sizeof(dirty), >> + }; >> + >> + if (start) { >> + if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) { >> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START; >> + } else { >> + return -EINVAL; >> + } >> + } else { >> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP; >> + } >> + >> + ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty); >> + if (ret) { >> + error_report("Failed to set dirty tracking flag 0x%x errno: %d", >> + dirty.flags, errno); >> + } > > Maybe doesn't matter in the long run, but do you want to use -errno for > the return rather than -1 from the ioctl on error? Thanks, > Makes sense. Changing it. Thanks, Kirti > Alex > >> + return ret; >> +} >> + >> /* ---------------------------------------------------------------------- */ >> >> static int vfio_save_setup(QEMUFile *f, void *opaque) >> @@ -386,6 +413,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque) >> return ret; >> } >> >> + ret = vfio_set_dirty_page_tracking(vbasedev, true); >> + if (ret) { >> + return ret; >> + } >> + >> qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); >> >> ret = qemu_file_get_error(f); >> @@ -401,6 +433,8 @@ static void vfio_save_cleanup(void *opaque) >> VFIODevice *vbasedev = opaque; >> VFIOMigration *migration = vbasedev->migration; >> >> + vfio_set_dirty_page_tracking(vbasedev, false); >> + >> if (migration->region.mmaps) { >> vfio_region_unmap(&migration->region); >> } >> @@ -734,6 +768,8 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data) >> if (ret) { >> error_report("%s: Failed to set state RUNNING", vbasedev->name); >> } >> + >> + vfio_set_dirty_page_tracking(vbasedev, false); >> } >> } >> >
On Sun, 18 Oct 2020 02:05:03 +0530 Kirti Wankhede <kwankhede@nvidia.com> wrote: > On 9/26/2020 1:50 AM, Alex Williamson wrote: > > On Wed, 23 Sep 2020 04:54:08 +0530 > > Kirti Wankhede <kwankhede@nvidia.com> wrote: > > > >> Added migration state change notifier to get notification on migration state > >> change. These states are translated to VFIO device state and conveyed to vendor > >> driver. > >> > >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> > >> Reviewed-by: Neo Jia <cjia@nvidia.com> > >> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > >> --- > >> hw/vfio/migration.c | 29 +++++++++++++++++++++++++++++ > >> hw/vfio/trace-events | 5 +++-- > >> include/hw/vfio/vfio-common.h | 1 + > >> 3 files changed, 33 insertions(+), 2 deletions(-) > >> > >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > >> index a30d628ba963..f650fe9fc3c8 100644 > >> --- a/hw/vfio/migration.c > >> +++ b/hw/vfio/migration.c > >> @@ -199,6 +199,28 @@ static void vfio_vmstate_change(void *opaque, int running, RunState state) > >> } > >> } > >> > >> +static void vfio_migration_state_notifier(Notifier *notifier, void *data) > >> +{ > >> + MigrationState *s = data; > >> + VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state); > >> + int ret; > >> + > >> + trace_vfio_migration_state_notifier(vbasedev->name, > >> + MigrationStatus_str(s->state)); > >> + > >> + switch (s->state) { > >> + case MIGRATION_STATUS_CANCELLING: > >> + case MIGRATION_STATUS_CANCELLED: > >> + case MIGRATION_STATUS_FAILED: > >> + ret = vfio_migration_set_state(vbasedev, > >> + ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING), > >> + VFIO_DEVICE_STATE_RUNNING); > >> + if (ret) { > >> + error_report("%s: Failed to set state RUNNING", vbasedev->name); > >> + } > > > > Here again the caller assumes success means the device has entered the > > desired state, but as implemented it only means the device is in some > > non-error state. > > > >> + } > >> +} > >> + > >> static int vfio_migration_init(VFIODevice *vbasedev, > >> struct vfio_region_info *info) > >> { > >> @@ -221,6 +243,8 @@ static int vfio_migration_init(VFIODevice *vbasedev, > >> > >> vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change, > >> vbasedev); > >> + vbasedev->migration_state.notify = vfio_migration_state_notifier; > >> + add_migration_state_change_notifier(&vbasedev->migration_state); > >> return ret; > >> } > >> > >> @@ -263,6 +287,11 @@ add_blocker: > >> > >> void vfio_migration_finalize(VFIODevice *vbasedev) > >> { > >> + > >> + if (vbasedev->migration_state.notify) { > >> + remove_migration_state_change_notifier(&vbasedev->migration_state); > >> + } > >> + > >> if (vbasedev->vm_state) { > >> qemu_del_vm_change_state_handler(vbasedev->vm_state); > >> } > >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events > >> index 6524734bf7b4..bcb3fa7314d7 100644 > >> --- a/hw/vfio/trace-events > >> +++ b/hw/vfio/trace-events > >> @@ -149,5 +149,6 @@ vfio_display_edid_write_error(void) "" > >> > >> # migration.c > >> vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d" > >> -vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d" > >> -vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" > >> +vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d" > >> +vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d" > >> +vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s" > >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h > >> index 25e3b1a3b90a..49c7c7a0e29a 100644 > >> --- a/include/hw/vfio/vfio-common.h > >> +++ b/include/hw/vfio/vfio-common.h > >> @@ -123,6 +123,7 @@ typedef struct VFIODevice { > >> VMChangeStateEntry *vm_state; > >> uint32_t device_state; > >> int vm_running; > >> + Notifier migration_state; > > > > Can this live in VFIOMigration? Thanks, > > > > No, callback vfio_migration_state_notifier() has notifier argument and > to reach its corresponding device structure as below, its should be in > VFIODevice. > > VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state); An alternative would be to place migration_state within VFIOMigration, along with a pointer back to vbasedev (like we do in VFIORegion) then the notifier could use container_of to get the VFIOMigration structure, from which we could get to the VFIODevice via the vbasedev pointer. This would better compartmentalize the migration related data into a single structure. Thanks, Alex
On Mon, 19 Oct 2020 11:57:47 -0600 Alex Williamson <alex.williamson@redhat.com> wrote: > On Sun, 18 Oct 2020 02:05:03 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > > > On 9/26/2020 1:50 AM, Alex Williamson wrote: > > > On Wed, 23 Sep 2020 04:54:08 +0530 > > > Kirti Wankhede <kwankhede@nvidia.com> wrote: > > >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h > > >> index 25e3b1a3b90a..49c7c7a0e29a 100644 > > >> --- a/include/hw/vfio/vfio-common.h > > >> +++ b/include/hw/vfio/vfio-common.h > > >> @@ -123,6 +123,7 @@ typedef struct VFIODevice { > > >> VMChangeStateEntry *vm_state; > > >> uint32_t device_state; > > >> int vm_running; > > >> + Notifier migration_state; > > > > > > Can this live in VFIOMigration? Thanks, > > > > > > > No, callback vfio_migration_state_notifier() has notifier argument and > > to reach its corresponding device structure as below, its should be in > > VFIODevice. > > > > VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state); > > An alternative would be to place migration_state within VFIOMigration, > along with a pointer back to vbasedev (like we do in VFIORegion) then > the notifier could use container_of to get the VFIOMigration structure, > from which we could get to the VFIODevice via the vbasedev pointer. > This would better compartmentalize the migration related data into a > single structure. Thanks, > > Alex +1, I think having everything in VFIOMigration would be nicer.
On Mon, 19 Oct 2020 02:25:28 +0530 Kirti Wankhede <kwankhede@nvidia.com> wrote: > On 9/25/2020 5:23 PM, Cornelia Huck wrote: > > On Wed, 23 Sep 2020 04:54:09 +0530 > > Kirti Wankhede <kwankhede@nvidia.com> wrote: > > > >> Define flags to be used as delimeter in migration file stream. > >> Added .save_setup and .save_cleanup functions. Mapped & unmapped migration > >> region from these functions at source during saving or pre-copy phase. > >> Set VFIO device state depending on VM's state. During live migration, VM is > >> running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO > >> device. During save-restore, VM is paused, _SAVING state is set for VFIO device. > >> > >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> > >> Reviewed-by: Neo Jia <cjia@nvidia.com> > >> --- > >> hw/vfio/migration.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> hw/vfio/trace-events | 2 ++ > >> 2 files changed, 93 insertions(+) > >> > > > > (...) > > > >> +/* > >> + * Flags used as delimiter: > >> + * 0xffffffff => MSB 32-bit all 1s > >> + * 0xef10 => emulated (virtual) function IO > > > > Where is this value coming from? > > > > Delimiter flags should be unique and this is a magic number that > represents (e)mulated (f)unction (10) representing IO. > > >> + * 0x0000 => 16-bits reserved for flags > >> + */ > >> +#define VFIO_MIG_FLAG_END_OF_STATE (0xffffffffef100001ULL) > >> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) > >> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) > >> +#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) > > > > I think we need some more documentation what these values mean and how > > they are used. From reading ahead a bit, it seems there is always > > supposed to be a pair of DEV_*_STATE and END_OF_STATE framing some kind > > of data? > > > > Adding comment as below, hope it helps. > > /* > * Flags used as delimiter for VFIO devices should be unique in > migration stream Maybe "Flags to be used as unique delimiters for VFIO devices in the migration stream" ? > * These flags are composed as: > * 0xffffffff => MSB 32-bit all 1s > * 0xef10 => Magic ID, represents emulated (virtual) function IO > * 0x0000 => 16-bits reserved for flags > * > * Flags _DEV_CONFIG_STATE, _DEV_SETUP_STATE and _DEV_DATA_STATE marks > start of > * respective states in migration stream. > * FLAG _END_OF_STATE indicates end of current state, state could be any > * of above states. > */ "The beginning of state information is marked by _DEV_CONFIG_STATE, _DEV_SETUP_STATE, or _DEV_DATA_STATE, respectively. The end of a certain state information is marked by _END_OF_STATE." ? > > Thanks, > Kirti >