
[v5,00/10] migration: bring improved savevm/loadvm/delvm to QMP

Message ID 20201002162747.3123597-1-berrange@redhat.com
Daniel P. Berrangé Oct. 2, 2020, 4:27 p.m. UTC
v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html
 v2: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg07523.html
 v3: https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg07076.html
 v4: https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg05221.html

This series aims to provide a better designed replacement for the
savevm/loadvm/delvm HMP commands, which despite their flaws continue
to be actively used in the QMP world via the HMP command passthrough
facility.

The main problems addressed are:

 - The logic to pick which disk to store the vmstate in is not
   satisfactory.

   The first block driver state cannot be assumed to be the root disk
   image, it might be OVMF varstore and we don't want to store vmstate
   in there.

 - The logic to decide which disks must be snapshotted is hardwired
   to include all writable disks.

   Again with OVMF there might be a writable varstore, but this can be
   raw rather than qcow2 format, and thus unable to be snapshotted.
   While users might wish to snapshot their varstore, in many (if not
   most) cases it is entirely unnecessary. Yet the presence of this
   varstore currently blocks users from snapshotting their VM at all.

 - The commands execute synchronously, blocking the monitor and
   returning errors immediately.

   This is partially addressed by integrating with the job framework,
   which forces the client to use the async job commands to determine
   the completion status or error message from the operations.
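The job-based flow above can be sketched from the client's side. This is a minimal illustration, not libvirt code: the helper names are invented, and it only builds the QMP payload and classifies the JOB_STATUS_CHANGE events a client would wait on (note that on the wire QMP wraps command parameters in an "arguments" key):

```python
import json


def snapshot_save_cmd(job_id, tag, vmstate, devices):
    """Build the QMP 'snapshot-save' command introduced by this series.

    The command only kicks off a job; the client must then watch
    JOB_STATUS_CHANGE events for job_id to learn the outcome.
    """
    return {
        "execute": "snapshot-save",
        "arguments": {
            "job-id": job_id,
            "tag": tag,
            "vmstate": vmstate,
            "devices": devices,
        },
    }


def job_reached_terminal_state(event, job_id):
    """True once a JOB_STATUS_CHANGE event reports a terminal status."""
    if event.get("event") != "JOB_STATUS_CHANGE":
        return False
    data = event.get("data", {})
    return data.get("id") == job_id and \
        data.get("status") in ("concluded", "null")


cmd = snapshot_save_cmd("snapsave0", "my-snap", "disk0", ["disk0", "disk1"])
wire = json.dumps(cmd)  # what actually goes down the QMP socket
```

A real client would send `wire` over the QMP socket, loop reading events until `job_reached_terminal_state()` returns True, then dismiss the job.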

In the block code I've only dealt with node names for block devices, as
IIUC, this is all that libvirt should need in the -blockdev world it now
lives in. IOW, I've made no attempt to cope with people wanting to use
these QMP commands in combination with -drive args, as libvirt will
never use -drive with a QEMU new enough to have these new commands.
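The node names the new commands refer to are the ones assigned via -blockdev. As a sketch (the image path is made up for illustration), such an argument in JSON syntax looks like:

```python
import json


def blockdev_arg(node_name, filename, fmt="qcow2"):
    """Compose a -blockdev argument in JSON syntax.

    The node-name set here is exactly what the new snapshot-* commands'
    @devices and @vmstate parameters refer to.
    """
    return "-blockdev " + json.dumps({
        "driver": fmt,
        "node-name": node_name,
        "file": {"driver": "file", "filename": filename},
    })


arg = blockdev_arg("disk0", "/var/lib/images/disk0.qcow2")
```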

The main limitations of the current implementation are:

 - The snapshot process runs serialized in the main thread, i.e. QEMU
   guest execution is blocked for the duration. The job framework
   lets us fix this in future without changing the QMP semantics
   exposed to the apps.

 - Most vmstate loading errors go straight to stderr, as they are not
   using Error **errp reporting. Thus the job framework reports only
   a fairly generic message:

     "Error -22 while loading VM state"

   Again this can be fixed later without changing the QMP semantics
   exposed to apps.
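That generic string is still retrievable by a client via query-jobs once the job concludes. A sketch of pulling it out (the reply below is hand-written to show the shape of a query-jobs response, not captured QEMU output):

```python
def job_error(query_jobs_reply, job_id):
    """Extract the error string for job_id from a query-jobs reply.

    With the current implementation most vmstate load failures surface
    only as the generic message here; the detail goes to QEMU's stderr.
    """
    for job in query_jobs_reply.get("return", []):
        if job.get("id") == job_id:
            return job.get("error")
    return None


# Illustrative reply shape, not real QEMU output.
reply = {"return": [{"id": "snapload0",
                     "type": "snapshot-load",
                     "status": "concluded",
                     "error": "Error -22 while loading VM state"}]}
```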

I've done some minimal work in libvirt to start to make use of the new
commands to validate their functionality, but this isn't finished yet.

My ultimate goal is to make the GNOME Boxes maintainer happy again by
having internal snapshots work with OVMF:

  https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e

Changed in v5:

 - Fix prevention of tag overwriting
 - Refactor and expand test suite coverage to validate
   more negative scenarios

Changed in v4:

 - Make the device lists mandatory, dropping all support for
   QEMU's built-in heuristics to select devices.

 - Improve some error reporting and I/O test coverage

Changed in v3:

 - Schedule a bottom half to escape from coroutine context in
   the jobs. This is needed because the locking in the snapshot
   code goes horribly wrong when run from a background coroutine
   instead of the main event thread.

 - Re-factor way we iterate over devices, so that we correctly
   report non-existent devices passed by the user over QMP.

 - Add QAPI docs notes about limitations wrt vmstate error
   reporting (it all goes to stderr not an Error **errp)
   so QMP only gets a fairly generic error message currently.

 - Add I/O test to validate many usage scenarios / errors

 - Add I/O test helpers to handle QMP events with a deterministic
   ordering

 - Ensure 'delete-snapshot' reports an error if requesting
   delete from devices that don't support snapshot, instead of
   silently succeeding with no error.

Changed in v2:

 - Use new command names "snapshot-{load,save,delete}" to make it
   clear that these are different from the "savevm|loadvm|delvm"
   as they use the Job framework

 - Use an include list for block devs, not an exclude list

Daniel P. Berrangé (10):
  block: push error reporting into bdrv_all_*_snapshot functions
  migration: stop returning errno from load_snapshot()
  block: add ability to specify list of blockdevs during snapshot
  block: allow specifying name of block device for vmstate storage
  block: rename and alter bdrv_all_find_snapshot semantics
  migration: control whether snapshots are overwritten
  migration: wire up support for snapshot device selection
  migration: introduce a delete_snapshot wrapper
  iotests: add support for capturing and matching QMP events
  migration: introduce snapshot-{save,load,delete} QMP commands

 block/monitor/block-hmp-cmds.c |   7 +-
 block/snapshot.c               | 256 +++++++++++++++------
 include/block/snapshot.h       |  23 +-
 include/migration/snapshot.h   |  14 +-
 migration/savevm.c             | 282 +++++++++++++++++++----
 monitor/hmp-cmds.c             |  12 +-
 qapi/job.json                  |   9 +-
 qapi/migration.json            | 120 ++++++++++
 replay/replay-snapshot.c       |   5 +-
 softmmu/vl.c                   |   2 +-
 tests/qemu-iotests/267.out     |  12 +-
 tests/qemu-iotests/310         | 385 +++++++++++++++++++++++++++++++
 tests/qemu-iotests/310.out     | 407 +++++++++++++++++++++++++++++++++
 tests/qemu-iotests/common.qemu | 107 ++++++++-
 tests/qemu-iotests/group       |   1 +
 15 files changed, 1500 insertions(+), 142 deletions(-)
 create mode 100755 tests/qemu-iotests/310
 create mode 100644 tests/qemu-iotests/310.out

-- 
2.26.2

Comments

Daniel P. Berrangé Oct. 5, 2020, 11:36 a.m. UTC | #1
On Mon, Oct 05, 2020 at 09:26:54AM +0200, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
> > On 10/2/20 11:27 AM, Daniel P. Berrangé wrote:
> >
> > Do we have a query- command handy to easily learn which snapshot names
> > are even available to attempt deletion on?  If not, that's worth a
> > separate patch.
> 
> Oh, I missed that one.  It's the QMP equivalent to "info snapshots", and
> it is required to finish the job.  Since we're at v5, I'd be okay with a
> follow-up patch, as long as it is done for 5.2.

"query-named-block-nodes" returns a BlockDeviceInfo struct, which
contains an ImageInfo, which in turn contains an array of
SnapshotInfo. So we don't need any new query command.
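The traversal Daniel describes can be sketched as follows; the sample reply is trimmed and invented, showing only the fields this walk uses:

```python
def snapshot_tags(named_nodes_reply):
    """Map node-name -> list of internal snapshot tags, walking
    BlockDeviceInfo -> ImageInfo -> SnapshotInfo in a
    query-named-block-nodes reply."""
    tags = {}
    for node in named_nodes_reply.get("return", []):
        image = node.get("image", {})
        snaps = image.get("snapshots", [])
        tags[node.get("node-name")] = [s["name"] for s in snaps]
    return tags


# Trimmed, hand-written illustration of the reply shape.
sample = {"return": [{"node-name": "disk0",
                      "image": {"filename": "disk0.qcow2",
                                "snapshots": [{"id": "1",
                                               "name": "my-snap"}]}}]}
```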


Regards,
Daniel
Markus Armbruster Oct. 5, 2020, 12:45 p.m. UTC | #2
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Oct 05, 2020 at 09:26:54AM +0200, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>> > On 10/2/20 11:27 AM, Daniel P. Berrangé wrote:
>> >
>> > Do we have a query- command handy to easily learn which snapshot names
>> > are even available to attempt deletion on?  If not, that's worth a
>> > separate patch.
>> 
>> Oh, I missed that one.  It's the QMP equivalent to "info snapshots", and
>> it is required to finish the job.  Since we're at v5, I'd be okay with a
>> follow-up patch, as long as it is done for 5.2.
>
> "query-named-block-nodes" returns BlockDeviceInfo struct, which
> contains ImageInfo which contains an array of SnapshotInfo. So
> we don't need any new query command.

My Acked-by stands without a new query then.
Dr. David Alan Gilbert Oct. 6, 2020, 5:36 p.m. UTC | #3
* Eric Blake (eblake@redhat.com) wrote:
> On 10/2/20 11:27 AM, Daniel P. Berrangé wrote:
> > savevm, loadvm and delvm are some of the few HMP commands that have never
> > been converted to use QMP. The reasons for the lack of conversion are
> > that they blocked execution of the event thread, and the semantics
> > around choice of disks were ill-defined.
> > 
> > Despite this downside, however, libvirt and applications using libvirt
> > have used these commands for as long as QMP has existed, via the
> > "human-monitor-command" passthrough command. IOW, while it is clearly
> > desirable to be able to fix the problems, they are not a blocker to
> > all real world usage.
> > 
> > Meanwhile there is a need for other features which involve adding new
> > parameters to the commands. This is possible with HMP passthrough, but
> > it provides no reliable way for apps to introspect features, so using
> > QAPI modelling is highly desirable.
> > 
> > This patch thus introduces new snapshot-{load,save,delete} commands to
> > QMP that are intended to replace the old HMP counterparts. The new
> > commands are given different names, because they will be using the new
> > QEMU job framework and thus will have diverging behaviour from the HMP
> > originals. It would thus be misleading to keep the same name.
> > 
> > While this design uses the generic job framework, the current impl is
> > still blocking. The intention that the blocking problem is fixed later.
> > None the less applications using these new commands should assume that
> > they are asynchronous and thus wait for the job status change event to
> > indicate completion.
> > 
> > In addition to using the job framework, the new commands require the
> > caller to be explicit about all the block device nodes used in the
> > snapshot operations, with no built-in default heuristics in use.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> 
> > +++ b/qapi/job.json
> > @@ -22,10 +22,17 @@
> >  #
> >  # @amend: image options amend job type, see "x-blockdev-amend" (since 5.1)
> >  #
> > +# @snapshot-load: snapshot load job type, see "snapshot-load" (since 5.2)
> > +#
> > +# @snapshot-save: snapshot save job type, see "snapshot-save" (since 5.2)
> > +#
> > +# @snapshot-delete: snapshot delete job type, see "snapshot-delete" (since 5.2)
> > +#
> >  # Since: 1.7
> >  ##
> >  { 'enum': 'JobType',
> > -  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend'] }
> > +  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend',
> > +           'snapshot-load', 'snapshot-save', 'snapshot-delete'] }
> >  
> >  ##
> >  # @JobStatus:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 7f5e6fd681..d2bd551ad9 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1787,3 +1787,123 @@
> >  # Since: 5.2
> >  ##
> >  { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
> > +
> > +##
> > +# @snapshot-save:
> > +#
> > +# Save a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to create
> > +# @devices: list of block device node names to save a snapshot to
> > +# @vmstate: block device node name to save vmstate to
> 
> Here, you document vmstate last,...
> 
> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns. The job commands / events must be used
> > +# to determine completion and to fetch details of any errors that arise.
> > +#
> > +# Note that the VM CPUs will be paused during the time it takes to
> > +# save the snapshot
> 
> "will be", or "may be"?  As you stated above, we may be able to lift the
> synchronous limitations down the road, while still maintaining the
> present interface of using this command to start the job and waiting on
> the job id until it is finished, at which point the CPUs might not need
> to be paused as much.
> 
> > +#
> > +# It is strongly recommended that @devices contain all writable
> > +# block device nodes if a consistent snapshot is required.
> > +#
> > +# If @tag already exists, an error will be reported
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "snapshot-save",
> > +#      "data": {
> > +#         "job-id": "snapsave0",
> > +#         "tag": "my-snap",
> > +#         "vmstate": "disk0",
> > +#         "devices": ["disk0", "disk1"]
> 
> ...here vmstate occurs before devices.  I don't know if our doc
> generator cares about inconsistent ordering.
> 
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'snapshot-save',
> > +  'data': { 'job-id': 'str',
> > +            'tag': 'str',
> > +            'vmstate': 'str',
> > +            'devices': ['str'] } }
> > +
> > +##
> > +# @snapshot-load:
> > +#
> > +# Load a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to load.
> > +# @devices: list of block device node names to load a snapshot from
> > +# @vmstate: block device node name to load vmstate from
> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns. The job commands / events must be used
> > +# to determine completion and to fetch details of any errors that arise.
> 
> s/save/load/
> 
> > +#
> > +# Note that the VM CPUs will be paused during the time it takes to
> > +# save the snapshot
> 
> s/save/load/
> 
> But while pausing CPUs during save is annoying, pausing CPUs during
> restore makes sense (after all, executing on stale data that will still
> be updated during the restore is just wasted execution).

Note that there are other snapshotting schemes that can do this more
dynamically and page/load the state on demand - a rapid resume from
snapshot like that is quite attractive.

Dave

> 
> > +#
> > +# It is strongly recommended that @devices contain all writable
> > +# block device nodes that can have changed since the original
> > +# @snapshot-save command execution.
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "snapshot-load",
> > +#      "data": {
> > +#         "job-id": "snapload0",
> > +#         "tag": "my-snap",
> > +#         "vmstate": "disk0",
> > +#         "devices": ["disk0", "disk1"]
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'snapshot-load',
> > +  'data': { 'job-id': 'str',
> > +            'tag': 'str',
> > +            'vmstate': 'str',
> > +            'devices': ['str'] } }
> > +
> > +##
> > +# @snapshot-delete:
> > +#
> > +# Delete a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to delete.
> > +# @devices: list of block device node names to delete a snapshot from
> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns. The job commands / events must be used
> > +# to determine completion and to fetch details of any errors that arise.
> 
> Do we have a query- command handy to easily learn which snapshot names
> are even available to attempt deletion on?  If not, that's worth a
> separate patch.
> 
> > +#
> > +# Returns: nothing
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "snapshot-delete",
> > +#      "data": {
> > +#         "job-id": "snapdelete0",
> > +#         "tag": "my-snap",
> > +#         "devices": ["disk0", "disk1"]
> > +#      }
> > +#    }
> > +# <- { "return": { } }
> > +#
> > +# Since: 5.2
> > +##
> 
> > +++ b/tests/qemu-iotests/group
> > @@ -291,6 +291,7 @@
> >  277 rw quick
> >  279 rw backing quick
> >  280 rw migration quick
> > +310 rw quick
> >  281 rw quick
> >  282 rw img quick
> >  283 auto quick
> 
> What's wrong with sorted order? I get the renumbering to appease a merge
> conflict, but it also requires rearrangement ;)
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
Markus Armbruster Nov. 25, 2020, 10:13 a.m. UTC | #4
Didn't make 5.2.  Pity.  Try again for 6.0, please!