diff mbox series

[2/3] nbd: Add new qemu:allocation-depth metacontext

Message ID 20200925203249.155705-3-eblake@redhat.com
State New
Headers show
Series Exposing backing-chain allocation over NBD | expand

Commit Message

Eric Blake Sept. 25, 2020, 8:32 p.m. UTC
'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point
qemu:dirty-bitmap:NAME can expose that information via the creation of
a temporary bitmap, but we can shorten the effort by adding a new
qemu:allocation-depth context that does the same thing without an
intermediate bitmap (this patch does not eliminate the need for that
proposal, as it will have other uses as well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); we could instead or in
addition report an actual depth count per extent, if that proves more
useful.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt |  22 +++++++--
 qapi/block-core.json |   6 ++-
 include/block/nbd.h  |   8 +++-
 nbd/server.c         | 110 +++++++++++++++++++++++++++++++++++++++----
 4 files changed, 132 insertions(+), 14 deletions(-)

Comments

Richard W.M. Jones Sept. 26, 2020, 7:33 a.m. UTC | #1
On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote:
> +The second is related to exposing the source of various extents within
> +the image, with a single context named:
> +
> +    qemu:allocation-depth
> +
> +In the allocation depth context, bits 0 and 1 form a tri-state value:
> +
> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image
> +    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
> +               backing layer
Vladimir Sementsov-Ogievskiy Sept. 26, 2020, 1:15 p.m. UTC | #2
25.09.2020 23:32, Eric Blake wrote:
> 'qemu-img map' provides a way to determine which extents of an image
> come from the top layer vs. inherited from a backing chain.  This is
> useful information worth exposing over NBD.  There is a proposal to
> add a QMP command block-dirty-bitmap-populate which can create a dirty
> bitmap that reflects allocation information, at which point
> qemu:dirty-bitmap:NAME can expose that information via the creation of
> a temporary bitmap, but we can shorten the effort by adding a new
> qemu:allocation-depth context that does the same thing without an
> intermediate bitmap (this patch does not eliminate the need for that
> proposal, as it will have other uses as well).
> 
> For this patch, I just encoded a tri-state value (unallocated, from
> this layer, from any of the backing layers); we could instead or in
> addition report an actual depth count per extent, if that proves more
> useful.
> 
> Note that this patch does not actually enable any way to request a
> server to enable this context; that will come in the next patch.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Looks good to me overall, need to rebase if patch 01 changed (as I propose or in some better way).
Vladimir Sementsov-Ogievskiy Sept. 26, 2020, 1:19 p.m. UTC | #3
26.09.2020 10:33, Richard W.M. Jones wrote:
> On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote:

>> +The second is related to exposing the source of various extents within

>> +the image, with a single context named:

>> +

>> +    qemu:allocation-depth

>> +

>> +In the allocation depth context, bits 0 and 1 form a tri-state value:

>> +

>> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated

>> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image

>> +    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a

>> +               backing layer

> 

>  From the cover description I imagined it would show the actual depth, ie:

> 

>           top -> backing -> backing -> backing

>   depth:   1        2         3   ....          (0 = unallocated)

> 

> I wonder if that is possible?  (Perhaps there's something I don't

> understand here.)

> 

That's possible if we want it. Probably the best way is to add *depth parameter to bdrv_is_allocated_above (and better on top of my "[PATCH v7 0/5] fix & merge block_status_above and is_allocated_above")


-- 
Best regards,
Vladimir
Eric Blake Sept. 28, 2020, 2:33 p.m. UTC | #4
On 9/26/20 2:33 AM, Richard W.M. Jones wrote:
> On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote:

>> +The second is related to exposing the source of various extents within

>> +the image, with a single context named:

>> +

>> +    qemu:allocation-depth

>> +

>> +In the allocation depth context, bits 0 and 1 form a tri-state value:

>> +

>> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated

>> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image

>> +    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a

>> +               backing layer

> 

>>From the cover description I imagined it would show the actual depth, ie:

> 

>           top -> backing -> backing -> backing

>   depth:   1        2         3   ....          (0 = unallocated)

> 

> I wonder if that is possible?  (Perhaps there's something I don't

> understand here.)


The real reason I don't want to do a straight depth number is that 
'qemu-img map' combined with x-dirty-bitmap is still a very convenient 
way to get at bits 0 and 1 (even if it requires decoding).  But if we 
plumb in a way for bdrv_get_status to return depth counts (rather than 
reimplementing the depth count ourselves), I would have no problem with 
returning a struct:

bits 31-4: the depth of the chain
bits 3-2: reserved (to make reading hex values easier...)
bits 1-0: tri-state of unalloc, local, or backing

where it would look like:

0x0000 -> unallocated
0x0011 -> depth 1, local
0x0022 -> depth 2, from the first backing layer
0x0032 -> depth 3, from the second backing layer
0x0042 ...

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones Sept. 28, 2020, 4:17 p.m. UTC | #5
On Mon, Sep 28, 2020 at 09:33:22AM -0500, Eric Blake wrote:
> On 9/26/20 2:33 AM, Richard W.M. Jones wrote:

> >On Fri, Sep 25, 2020 at 03:32:48PM -0500, Eric Blake wrote:

> >>+The second is related to exposing the source of various extents within

> >>+the image, with a single context named:

> >>+

> >>+    qemu:allocation-depth

> >>+

> >>+In the allocation depth context, bits 0 and 1 form a tri-state value:

> >>+

> >>+    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated

> >>+    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image

> >>+    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a

> >>+               backing layer

> >

> >>From the cover description I imagined it would show the actual depth, ie:

> >

> >          top -> backing -> backing -> backing

> >  depth:   1        2         3   ....          (0 = unallocated)

> >

> >I wonder if that is possible?  (Perhaps there's something I don't

> >understand here.)

> 

> The real reason I don't want to do a straight depth number is that

> 'qemu-img map' combined with x-dirty-bitmap is still a very

> convenient way to get at bits 0 and 1 (even if it requires

> decoding).  But if we plumb in a way for bdrv_get_status to return

> depth counts (rather than reimplementing the depth count ourselves),

> I would have no problem with returning a struct:

> 

> bits 31-4: the depth of the chain

> bits 3-2: reserved (to make reading hex values easier...)

> bits 1-0: tri-state of unalloc, local, or backing

> 

> where it would look like:

> 

> 0x0000 -> unallocated

> 0x0011 -> depth 1, local

> 0x0022 -> depth 2, from the first backing layer

> 0x0032 -> depth 3, from the second backing layer

> 0x0042 ...


This looks nice too.  However I was only bikeshedding so if any of
this is hard to do then don't worry too much.

Would like to add support for this map to nbdinfo too :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
diff mbox series

Patch

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f3b3cacc9621..56efec7aee12 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -17,9 +17,9 @@  namespace "qemu".

 == "qemu" namespace ==

-The "qemu" namespace currently contains only one type of context,
-related to exposing the contents of a dirty bitmap alongside the
-associated disk contents.  That context has the following form:
+The "qemu" namespace currently contains two types of context.  The
+first is related to exposing the contents of a dirty bitmap alongside
+the associated disk contents.  That context has the following form:

     qemu:dirty-bitmap:<dirty-bitmap-export-name>

@@ -28,8 +28,21 @@  in reply for NBD_CMD_BLOCK_STATUS:

     bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"

+The second is related to exposing the source of various extents within
+the image, with a single context named:
+
+    qemu:allocation-depth
+
+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
+    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image
+    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
+               backing layer
+
 For NBD_OPT_LIST_META_CONTEXT the following queries are supported
-in addition to "qemu:dirty-bitmap:<dirty-bitmap-export-name>":
+in addition to the specific "qemu:allocation-depth" and
+"qemu:dirty-bitmap:<dirty-bitmap-export-name>":

 * "qemu:" - returns list of all available metadata contexts in the
             namespace.
@@ -55,3 +68,4 @@  the operation of that feature.
 NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
+* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3c16f1e11d6b..b3ec30a83cd7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5261,11 +5261,15 @@ 
 #          NBD client can use NBD_OPT_SET_META_CONTEXT with
 #          "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
 #
+# @alloc: Also export the allocation map for @device, so the NBD client
+#         can use NBD_OPT_SET_META_CONTEXT with "qemu:allocation-depth"
+#         to inspect allocation details. (since 5.2)
+#
 # Since: 5.0
 ##
 { 'struct': 'BlockExportNbd',
   'data': {'device': 'str', '*name': 'str', '*description': 'str',
-           '*writable': 'bool', '*bitmap': 'str' } }
+           '*writable': 'bool', '*bitmap': 'str', '*alloc': 'bool' } }

 ##
 # @nbd-server-add:
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9bc3bfaeecf8..71a2623f7842 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@ 
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -257,6 +257,12 @@  enum {
 /* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_DIRTY (1 << 0)

+/* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DEPTH_UNALLOC (0 << 0)
+#define NBD_STATE_DEPTH_LOCAL   (1 << 0)
+#define NBD_STATE_DEPTH_BACKING (2 << 0)
+#define NBD_STATE_DEPTH_MASK    (0x3)
+
 static inline bool nbd_reply_type_is_error(int type)
 {
     return type & (1 << 15);
diff --git a/nbd/server.c b/nbd/server.c
index 0d2d7e52058f..02d5fb375b24 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -25,7 +25,8 @@ 
 #include "qemu/units.h"

 #define NBD_META_ID_BASE_ALLOCATION 0
-#define NBD_META_ID_DIRTY_BITMAP 1
+#define NBD_META_ID_ALLOCATION_DEPTH 1
+#define NBD_META_ID_DIRTY_BITMAP 2

 /*
  * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
@@ -97,6 +98,7 @@  struct NBDExport {
     BlockBackend *eject_notifier_blk;
     Notifier eject_notifier;

+    bool alloc_context;
     BdrvDirtyBitmap *export_bitmap;
     char *export_bitmap_context;
 };
@@ -113,6 +115,7 @@  typedef struct NBDExportMetaContexts {
     bool valid; /* means that negotiation of the option finished without
                    errors */
     bool base_allocation; /* export base:allocation context (block status) */
+    bool allocation_depth; /* export qemu:allocation-depth */
     bool bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
 } NBDExportMetaContexts;

@@ -836,15 +839,24 @@  static void nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
 static void nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
                                 const char *ns, uint32_t len, Error **errp)
 {
-    if (!meta->exp->export_bitmap) {
-        trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
-    } else if (len == 0) {
+    if (len == 0) {
         if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-            meta->bitmap = true;
+            meta->allocation_depth = meta->exp->alloc_context;
+            meta->bitmap = !!meta->exp->export_bitmap;
         }
         trace_nbd_negotiate_meta_query_parse("empty");
+    } else if (strcmp(ns + 5, "allocation-depth") == 0) {
+        if (meta->exp->alloc_context) {
+            trace_nbd_negotiate_meta_query_parse("allocation-depth");
+            meta->allocation_depth = true;
+        } else {
+            trace_nbd_negotiate_meta_query_skip("allocation-depth not "
+                                                "available");
+        }
     } else if (!g_str_has_prefix(ns + 5, "dirty-bitmap:")) {
         trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+    } else if (!meta->exp->export_bitmap) {
+        trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
     } else {
         trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
         nbd_meta_empty_or_pattern(client, meta->exp->export_bitmap_context,
@@ -954,6 +966,7 @@  static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
         /* enable all known contexts */
         meta->base_allocation = true;
+        meta->allocation_depth = meta->exp->alloc_context;
         meta->bitmap = !!meta->exp->export_bitmap;
     } else {
         for (i = 0; i < nb_queries; ++i) {
@@ -973,6 +986,15 @@  static int nbd_negotiate_meta_queries(NBDClient *client,
         }
     }

+    if (meta->allocation_depth) {
+        ret = nbd_negotiate_send_meta_context(client, "qemu:allocation-depth",
+                                              NBD_META_ID_ALLOCATION_DEPTH,
+                                              errp);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     if (meta->bitmap) {
         ret = nbd_negotiate_send_meta_context(client,
                                               meta->exp->export_bitmap_context,
@@ -1972,6 +1994,40 @@  static int blockstatus_to_extents(BlockDriverState *bs, uint64_t offset,
     return 0;
 }

+static int blockalloc_to_extents(BlockDriverState *bs, uint64_t offset,
+                                 uint64_t bytes, NBDExtentArray *ea)
+{
+    while (bytes) {
+        uint32_t flags;
+        int64_t num;
+        int ret = bdrv_is_allocated(bs, offset, bytes, &num);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (ret == 1) {
+            flags = NBD_STATE_DEPTH_LOCAL;
+        } else {
+            ret = bdrv_is_allocated_above(bs, NULL, false, offset, num,
+                                          &num);
+            if (ret < 0) {
+                return ret;
+            }
+            flags = ret ? NBD_STATE_DEPTH_BACKING : NBD_STATE_DEPTH_UNALLOC;
+        }
+
+        if (nbd_extent_array_add(ea, num, flags) < 0) {
+            return 0;
+        }
+
+        offset += num;
+        bytes -= num;
+    }
+
+    return 0;
+}
+
 /*
  * nbd_co_send_extents
  *
@@ -2020,6 +2076,26 @@  static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
     return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
 }

+/* Get allocation depth from the exported device and send it to the client */
+static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
+                                   BlockDriverState *bs, uint64_t offset,
+                                   uint32_t length, bool dont_fragment,
+                                   bool last, uint32_t context_id,
+                                   Error **errp)
+{
+    int ret;
+    unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
+    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+
+    ret = blockalloc_to_extents(bs, offset, length, ea);
+    if (ret < 0) {
+        return nbd_co_send_structured_error(
+                client, handle, -ret, "can't get block status", errp);
+    }
+
+    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
+}
+
 /* Populate @ea from a dirty bitmap. */
 static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
                               uint64_t offset, uint64_t length,
@@ -2348,15 +2424,19 @@  static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         if (client->export_meta.valid &&
             (client->export_meta.base_allocation ||
+             client->export_meta.allocation_depth ||
              client->export_meta.bitmap))
         {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
+            int contexts_remaining = client->export_meta.base_allocation +
+                client->export_meta.allocation_depth +
+                client->export_meta.bitmap;

             if (client->export_meta.base_allocation) {
                 ret = nbd_co_send_block_status(client, request->handle,
                                                blk_bs(exp->blk), request->from,
                                                request->len, dont_fragment,
-                                               !client->export_meta.bitmap,
+                                               !--contexts_remaining,
                                                NBD_META_ID_BASE_ALLOCATION,
                                                errp);
                 if (ret < 0) {
@@ -2364,17 +2444,31 @@  static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

+            if (client->export_meta.allocation_depth) {
+                ret = nbd_co_send_block_alloc(client, request->handle,
+                                              blk_bs(exp->blk), request->from,
+                                              request->len, dont_fragment,
+                                              !--contexts_remaining,
+                                              NBD_META_ID_ALLOCATION_DEPTH,
+                                              errp);
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+
             if (client->export_meta.bitmap) {
                 ret = nbd_co_send_bitmap(client, request->handle,
                                          client->exp->export_bitmap,
                                          request->from, request->len,
-                                         dont_fragment,
-                                         true, NBD_META_ID_DIRTY_BITMAP, errp);
+                                         dont_fragment, !--contexts_remaining,
+                                         NBD_META_ID_DIRTY_BITMAP, errp);
                 if (ret < 0) {
                     return ret;
                 }
             }

+            assert(!contexts_remaining);
+
             return 0;
         } else {
             return nbd_send_generic_reply(client, request->handle, -EINVAL,