mbox series

[v7,0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Message ID cover.1596528468.git.lukasstraub2@web.de
Headers show
Series Introduce 'yank' oob qmp command to recover from hanging qemu | expand

Message

Lukas Straub Aug. 4, 2020, 8:11 a.m. UTC
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before  creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_aqcuire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document thread-safety of qio_channel_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   6 ++
 block/nbd.c                   | 129 +++++++++++++++---------
 chardev/char-socket.c         |  31 ++++++
 include/io/channel.h          |   2 +
 include/qemu/yank.h           |  80 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  12 +++
 migration/migration.c         |  25 ++++-
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   6 ++
 migration/savevm.c            |   6 ++
 qapi/misc.json                |  45 +++++++++
 tests/Makefile.include        |   2 +-
 tests/test-char.c             |   1 +
 util/Makefile.objs            |   1 +
 util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
 16 files changed, 493 insertions(+), 53 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1

Comments

Lukas Straub Sept. 4, 2020, 12:33 p.m. UTC | #1
On Thu, 27 Aug 2020 14:37:00 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> I apologize for not reviewing this much earlier.
> 
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> ...
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..0d6a8f20b7 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,48 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:<node-name>", "chardev:<chardev-name>" and "migration"
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }  
> 
> I'm afraid this is a problematic QMP interface.
> 
> By making YankInstances a struct, you keep the door open to adding more
> members, which is good.
> 
> But by making its 'instances' member a ['str'], you close the door to
> using anything but a single string for the individual instances.  Not so
> good.
> 
> The single string encodes information which QMP client will need to
> parse from the string.  We frown on that in QMP.  Use QAPI complex types
> capabilities for structured data.
> 
> Could you use something like this instead?
> 
> { 'enum': 'YankInstanceType',
>   'data': { 'block-node', 'chardev', 'migration' } }
> 
> { 'struct': 'YankInstanceBlockNode',
>   'data': { 'node-name': 'str' } }
> 
> { 'struct': 'YankInstanceChardev',
>   'data' { 'label': 'str' } }
> 
> { 'union': 'YankInstance',
>   'base': { 'type': 'YankInstanceType' },
>   'discriminator': 'type',
>   'data': {
>       'block-node': 'YankInstanceBlockNode',
>       'chardev': 'YankInstanceChardev' } }
> 
> { 'command': 'yank',
>   'data': { 'instances': ['YankInstance'] },
>   'allow-oob': true }

This proposal looks good to me. Does everyone agree?

Regards,
Lukas Straub

> If you're confident nothing will ever be added to YankInstanceBlockNode
> and YankInstanceChardev, you could use str instead.
> 
> > +
> > +##
> > +# @yank:
> > +#
> > +# Recover from hanging qemu by yanking the specified instances.  
> 
> What's an "instance", and what does it mean to "yank" it?
> 
> The documentation of YankInstances above gives a clue on what an
> "instance" is: presumably a block node, a character device or the
> migration job.
> 
> I guess a YankInstance is whatever the code chooses to make one, and the
> current code makes these three kinds.
> 
> Does it make every block node a YankInstance?  If not, which ones?
> 
> Does it make every character device a YankInstance?  If not, which ones?
> 
> Does it make migration always a YankInstance?  If not, when?
> 
> > +#
> > +# Takes @YankInstances as argument.
> > +#
> > +# Returns: nothing.
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> > +# <- { "return": {} }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> > +
> > +##
> > +# @query-yank:
> > +#
> > +# Query yank instances.
> > +#
> > +# Returns: @YankInstances
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-yank" }
> > +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> > +#
> > +# Since: 5.1
> > +##
> > +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
> ...