diff mbox series

[v8,1/8] Introduce yank feature

Message ID ab5c04766f270d53e90f17f76c0af7e5b66f8623.1598951375.git.lukasstraub2@web.de
State New
Headers show
Series Introduce 'yank' oob qmp command to recover from hanging qemu | expand

Commit Message

Lukas Straub Sept. 1, 2020, 9:15 a.m. UTC
The yank feature allows to recover from hanging qemu by "yanking"
at various parts. Other qemu systems can register themselves and
multiple yank functions. Then all yank functions for selected
instances can be called by the 'yank' out-of-band qmp command.
Available instances can be queried by a 'query-yank' oob command.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/yank.h |  81 +++++++++++++++++++
 qapi/misc.json      |  62 +++++++++++++++
 util/meson.build    |   1 +
 util/yank.c         | 187 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 331 insertions(+)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1

Comments

Daniel P. Berrangé Sept. 1, 2020, 9:45 a.m. UTC | #1
On Tue, Sep 01, 2020 at 11:15:07AM +0200, Lukas Straub wrote:
> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
> 
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/qemu/yank.h |  81 +++++++++++++++++++
>  qapi/misc.json      |  62 +++++++++++++++
>  util/meson.build    |   1 +
>  util/yank.c         | 187 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 331 insertions(+)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
Markus Armbruster Sept. 1, 2020, 11:30 a.m. UTC | #2
Lukas Straub <lukasstraub2@web.de> writes:

> The yank feature allows to recover from hanging qemu by "yanking"
> at various parts. Other qemu systems can register themselves and
> multiple yank functions. Then all yank functions for selected
> instances can be called by the 'yank' out-of-band qmp command.
> Available instances can be queried by a 'query-yank' oob command.
>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
[...]
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 9d32820dc1..7de330416a 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -1615,3 +1615,65 @@
>  ##
>  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
>
> +##
> +# @YankInstances:
> +#
> +# @instances: List of yank instances.
> +#
> +# A yank instance can be yanked with the "yank" qmp command to recover from a
> +# hanging qemu.
> +#
> +# Yank instances are named after the following schema:
> +# "blockdev:<node-name>" refers to a block device. Currently only nbd block
> +# devices are implemented.
> +# "chardev:<chardev-name>" refers to a chardev. Currently only socket chardevs
> +# are implemented.

The two "Currently only ... are implemented" are redundant with ...

> +# "migration" refers to the migration currently in progress.
> +#
> +# Currently implemented yank instances:
> +#  -nbd block device:
> +#   Yanking it will shutdown the connection to the nbd server without
> +#   attempting to reconnect.
> +#  -socket chardev:
> +#   Yanking it will shutdown the connected socket.
> +#  -migration:
> +#   Yanking it will shutdown all migration connections.

... this list.  Not a blocker, but if you have to respin for some other
reason, consider deleting them.

> +#
> +# Since: 5.2
> +##
> +{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
> +
> +##
> +# @yank:
> +#
> +# Recover from hanging qemu by yanking the specified instances. See
> +# "YankInstances" for more information.
> +#
> +# Takes @YankInstances as argument.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
> +# <- { "return": {} }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
> +
> +##
> +# @query-yank:
> +#
> +# Query yank instances. See "YankInstances" for more information.
> +#
> +# Returns: @YankInstances
> +#
> +# Example:
> +#
> +# -> { "execute": "query-yank" }
> +# <- { "return": { "instances": ["blockdev:nbd0"] } }
> +#
> +# Since: 5.2
> +##
> +{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }

You addressed all my review comments nicely, except for the one
questioning the wisdom of encoding structured data into the instance
name.  Daniel and you pointed me to the spot where this was discussed
previously, in review of your v1.  I read the thread, and replied to it.
I'd like us to put this series on hold to give us time to discuss and
decide.

Thanks!

[...]
Daniel P. Berrangé Sept. 1, 2020, 3:22 p.m. UTC | #3
On Tue, Sep 01, 2020 at 04:38:46PM +0200, Markus Armbruster wrote:
> One more question...
> 
> Lukas Straub <lukasstraub2@web.de> writes:
> 
> > The yank feature allows to recover from hanging qemu by "yanking"
> > at various parts. Other qemu systems can register themselves and
> > multiple yank functions. Then all yank functions for selected
> > instances can be called by the 'yank' out-of-band qmp command.
> > Available instances can be queried by a 'query-yank' oob command.
> >
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> [...]
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 9d32820dc1..7de330416a 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -1615,3 +1615,65 @@
> >  ##
> >  { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
> >
> > +##
> > +# @YankInstances:
> > +#
> > +# @instances: List of yank instances.
> > +#
> > +# A yank instance can be yanked with the "yank" qmp command to recover from a
> > +# hanging qemu.
> > +#
> > +# Yank instances are named after the following schema:
> > +# "blockdev:<node-name>" refers to a block device. Currently only nbd block
> > +# devices are implemented.
> > +# "chardev:<chardev-name>" refers to a chardev. Currently only socket chardevs
> > +# are implemented.
> > +# "migration" refers to the migration currently in progress.
> > +#
> > +# Currently implemented yank instances:
> > +#  -nbd block device:
> > +#   Yanking it will shutdown the connection to the nbd server without
> > +#   attempting to reconnect.
> > +#  -socket chardev:
> > +#   Yanking it will shutdown the connected socket.
> > +#  -migration:
> > +#   Yanking it will shutdown all migration connections.
> 
> How is yanking migration related to command migrate_cancel?

migrate_cancel will do a shutdown() on the primary migration socket only.
In addition it will toggle the migration state.

Yanking will do a shutdown on all migration sockets (important for
multifd), but won't touch migration state or any other aspect of QEMU
code.

Overall yanking has less potential for things to go wrong than the
migrate_cancel method, as it doesn't try to do any kind of cleanup
or migration.


Regards,
Daniel
diff mbox series

Patch

diff --git a/include/qemu/yank.h b/include/qemu/yank.h
new file mode 100644
index 0000000000..c5ab53965a
--- /dev/null
+++ b/include/qemu/yank.h
@@ -0,0 +1,81 @@ 
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef YANK_H
+#define YANK_H
+
+typedef void (YankFn)(void *opaque);
+
+/**
+ * yank_register_instance: Register a new instance.
+ *
+ * This registers a new instance for yanking. Must be called before any yank
+ * function is registered for this instance.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The globally unique name of the instance.
+ * @errp: Error object.
+ */
+void yank_register_instance(const char *instance_name, Error **errp);
+
+/**
+ * yank_unregister_instance: Unregister a instance.
+ *
+ * This unregisters a instance. Must be called only after every yank function
+ * of the instance has been unregistered.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance.
+ */
+void yank_unregister_instance(const char *instance_name);
+
+/**
+ * yank_register_function: Register a yank function
+ *
+ * This registers a yank function. All limitations of qmp oob commands apply
+ * to the yank function as well. See docs/devel/qapi-code-gen.txt under
+ * "An OOB-capable command handler must satisfy the following conditions".
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: The yank function
+ * @opaque: Will be passed to the yank function
+ */
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque);
+
+/**
+ * yank_unregister_function: Unregister a yank function
+ *
+ * This unregisters a yank function.
+ *
+ * This function is thread-safe.
+ *
+ * @instance_name: The name of the instance
+ * @func: func that was passed to yank_register_function
+ * @opaque: opaque that was passed to yank_register_function
+ */
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque);
+
+/**
+ * yank_generic_iochannel: Generic yank function for iochannel
+ *
+ * This is a generic yank function which will call qio_channel_shutdown on the
+ * provided QIOChannel.
+ *
+ * @opaque: QIOChannel to shutdown
+ */
+void yank_generic_iochannel(void *opaque);
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 9d32820dc1..7de330416a 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1615,3 +1615,65 @@ 
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }

+##
+# @YankInstances:
+#
+# @instances: List of yank instances.
+#
+# A yank instance can be yanked with the "yank" qmp command to recover from a
+# hanging qemu.
+#
+# Yank instances are named after the following schema:
+# "blockdev:<node-name>" refers to a block device. Currently only nbd block
+# devices are implemented.
+# "chardev:<chardev-name>" refers to a chardev. Currently only socket chardevs
+# are implemented.
+# "migration" refers to the migration currently in progress.
+#
+# Currently implemented yank instances:
+#  -nbd block device:
+#   Yanking it will shutdown the connection to the nbd server without
+#   attempting to reconnect.
+#  -socket chardev:
+#   Yanking it will shutdown the connected socket.
+#  -migration:
+#   Yanking it will shutdown all migration connections.
+#
+# Since: 5.2
+##
+{ 'struct': 'YankInstances', 'data': {'instances': ['str'] } }
+
+##
+# @yank:
+#
+# Recover from hanging qemu by yanking the specified instances. See
+# "YankInstances" for more information.
+#
+# Takes @YankInstances as argument.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "yank", "arguments": { "instances": ["blockdev:nbd0"] } }
+# <- { "return": {} }
+#
+# Since: 5.2
+##
+{ 'command': 'yank', 'data': 'YankInstances', 'allow-oob': true }
+
+##
+# @query-yank:
+#
+# Query yank instances. See "YankInstances" for more information.
+#
+# Returns: @YankInstances
+#
+# Example:
+#
+# -> { "execute": "query-yank" }
+# <- { "return": { "instances": ["blockdev:nbd0"] } }
+#
+# Since: 5.2
+##
+{ 'command': 'query-yank', 'returns': 'YankInstances', 'allow-oob': true }
diff --git a/util/meson.build b/util/meson.build
index e6b207a99e..f3989a1869 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -50,6 +50,7 @@  endif

 if have_system
   util_ss.add(when: 'CONFIG_GIO', if_true: [files('dbus.c'), gio])
+  util_ss.add(files('yank.c'))
 endif

 if have_block
diff --git a/util/yank.c b/util/yank.c
new file mode 100644
index 0000000000..f63bfdca50
--- /dev/null
+++ b/util/yank.c
@@ -0,0 +1,187 @@ 
+/*
+ * QEMU yank feature
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/thread.h"
+#include "qemu/queue.h"
+#include "qapi/qapi-commands-misc.h"
+#include "io/channel.h"
+#include "qemu/yank.h"
+
+struct YankFuncAndParam {
+    YankFn *func;
+    void *opaque;
+    QLIST_ENTRY(YankFuncAndParam) next;
+};
+
+struct YankInstance {
+    char *name;
+    QLIST_HEAD(, YankFuncAndParam) yankfns;
+    QLIST_ENTRY(YankInstance) next;
+};
+
+/*
+ * This lock protects the yank_instance_list below.
+ */
+static QemuMutex yank_lock;
+
+static QLIST_HEAD(, YankInstance) yank_instance_list
+    = QLIST_HEAD_INITIALIZER(yank_instance_list);
+
+static struct YankInstance *yank_find_instance(const char *name)
+{
+    struct YankInstance *instance;
+
+    QLIST_FOREACH(instance, &yank_instance_list, next) {
+        if (!strcmp(instance->name, name)) {
+            return instance;
+        }
+    }
+    return NULL;
+}
+
+void yank_register_instance(const char *instance_name, Error **errp)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&yank_lock);
+
+    if (yank_find_instance(instance_name)) {
+        error_setg(errp, "duplicate yank instance name: '%s'",
+                   instance_name);
+        qemu_mutex_unlock(&yank_lock);
+        return;
+    }
+
+    instance = g_slice_new(struct YankInstance);
+    instance->name = g_strdup(instance_name);
+    QLIST_INIT(&instance->yankfns);
+    QLIST_INSERT_HEAD(&yank_instance_list, instance, next);
+
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_unregister_instance(const char *instance_name)
+{
+    struct YankInstance *instance;
+
+    qemu_mutex_lock(&yank_lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    assert(QLIST_EMPTY(&instance->yankfns));
+    QLIST_REMOVE(instance, next);
+    g_free(instance->name);
+    g_slice_free(struct YankInstance, instance);
+
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_register_function(const char *instance_name,
+                            YankFn *func,
+                            void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&yank_lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    entry = g_slice_new(struct YankFuncAndParam);
+    entry->func = func;
+    entry->opaque = opaque;
+
+    QLIST_INSERT_HEAD(&instance->yankfns, entry, next);
+    qemu_mutex_unlock(&yank_lock);
+}
+
+void yank_unregister_function(const char *instance_name,
+                              YankFn *func,
+                              void *opaque)
+{
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&yank_lock);
+    instance = yank_find_instance(instance_name);
+    assert(instance);
+
+    QLIST_FOREACH(entry, &instance->yankfns, next) {
+        if (entry->func == func && entry->opaque == opaque) {
+            QLIST_REMOVE(entry, next);
+            g_slice_free(struct YankFuncAndParam, entry);
+            qemu_mutex_unlock(&yank_lock);
+            return;
+        }
+    }
+
+    abort();
+}
+
+void yank_generic_iochannel(void *opaque)
+{
+    QIOChannel *ioc = QIO_CHANNEL(opaque);
+
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
+void qmp_yank(strList *instances,
+              Error **errp)
+{
+    strList *tail;
+    struct YankInstance *instance;
+    struct YankFuncAndParam *entry;
+
+    qemu_mutex_lock(&yank_lock);
+    for (tail = instances; tail; tail = tail->next) {
+        instance = yank_find_instance(tail->value);
+        if (!instance) {
+            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+                      "Instance '%s' not found", tail->value);
+            qemu_mutex_unlock(&yank_lock);
+            return;
+        }
+    }
+    for (tail = instances; tail; tail = tail->next) {
+        instance = yank_find_instance(tail->value);
+        assert(instance);
+        QLIST_FOREACH(entry, &instance->yankfns, next) {
+            entry->func(entry->opaque);
+        }
+    }
+    qemu_mutex_unlock(&yank_lock);
+}
+
+YankInstances *qmp_query_yank(Error **errp)
+{
+    struct YankInstance *instance;
+    YankInstances *ret;
+
+    ret = g_new0(YankInstances, 1);
+    ret->instances = NULL;
+
+    qemu_mutex_lock(&yank_lock);
+    QLIST_FOREACH(instance, &yank_instance_list, next) {
+        strList *entry;
+        entry = g_new0(strList, 1);
+        entry->value = g_strdup(instance->name);
+        entry->next = ret->instances;
+        ret->instances = entry;
+    }
+    qemu_mutex_unlock(&yank_lock);
+
+    return ret;
+}
+
+static void __attribute__((__constructor__)) yank_init(void)
+{
+    qemu_mutex_init(&yank_lock);
+}