diff mbox series

[RFC,4/5] Add migration support for KVM guest with MTE

Message ID 25a922038d256e47f3eb99683c5e3bd9c34753ac.1612747873.git.haibo.xu@linaro.org
State Superseded
Series target/arm: Add MTE support to KVM guest

Commit Message

Haibo Xu Feb. 8, 2021, 3:20 a.m. UTC
To make it easier to keep the page tags in sync with
the page data, the tags for each page are appended to
the page data during the ram save iteration.

This patch only adds pre-copy migration support.
Post-copy, compression, and zero-page saving are
not supported yet.

Signed-off-by: Haibo Xu <haibo.xu@linaro.org>

---
 include/hw/arm/virt.h    |  2 +
 include/migration/misc.h |  1 +
 migration/ram.c          | 86 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 88 insertions(+), 1 deletion(-)

-- 
2.17.1

Comments

Richard Henderson Feb. 16, 2021, 3:31 p.m. UTC | #1
On 2/7/21 7:20 PM, Haibo Xu wrote:
> +    if (kvm_physical_memory_addr_from_host(kvm_state, addr, &ipa)) {
> +        /* Buffer for the page tags (one byte per tag) */
> +        tag_buf = g_try_malloc0(size);
> +        if (!tag_buf) {
> +            error_report("%s: Error allocating MTE tag_buf", __func__);
> +            return 0;
> +        }
> +
> +        if (kvm_arm_mte_get_tags(ipa, TARGET_PAGE_SIZE, tag_buf) < 0) {
> +            error_report("%s: Can't get MTE tags from guest", __func__);
> +            g_free(tag_buf);
> +            return 0;
> +        }
> +
> +        qemu_put_buffer(f, tag_buf, size);
> +
> +        g_free(tag_buf);
> +
> +        return size;
> +    }


So, in patch 2 you disabled the allocation of tag-memory.  Now you're
allocating new memory (admittedly quite a small amount -- 1/16th of a page,
small enough to just be a local variable).

Why don't you allocate tag-memory, copy the data into it, and then let
migration proceed as normal.  Then you don't have to have a random data block
that happens to follow each ram page.

I'm concerned that what you're doing here makes it impossible to migrate
between kvm and tcg.


r~
Haibo Xu Feb. 22, 2021, 9:46 a.m. UTC | #2
On Tue, 16 Feb 2021 at 23:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/7/21 7:20 PM, Haibo Xu wrote:
> > +    if (kvm_physical_memory_addr_from_host(kvm_state, addr, &ipa)) {
> > +        /* Buffer for the page tags (one byte per tag) */
> > +        tag_buf = g_try_malloc0(size);
> > +        if (!tag_buf) {
> > +            error_report("%s: Error allocating MTE tag_buf", __func__);
> > +            return 0;
> > +        }
> > +
> > +        if (kvm_arm_mte_get_tags(ipa, TARGET_PAGE_SIZE, tag_buf) < 0) {
> > +            error_report("%s: Can't get MTE tags from guest", __func__);
> > +            g_free(tag_buf);
> > +            return 0;
> > +        }
> > +
> > +        qemu_put_buffer(f, tag_buf, size);
> > +
> > +        g_free(tag_buf);
> > +
> > +        return size;
> > +    }
>
> So, in patch 2 you disabled the allocation of tag-memory.  Now you're
> allocating new memory (admittedly quite a small amount -- 1/16th of a page,
> small enough to just be a local variable).
>


Hi Richard!

Thanks so much for the comments!

Yes, the memory allocated here is just temporary tag storage. As you
mentioned, it could instead be defined as a local variable used as the
temporary tag buffer.

> Why don't you allocate tag-memory, copy the data into it, and then let
> migration proceed as normal.  Then you don't have to have a random data block
> that happens to follow each ram page.
>


As I mentioned in the cover letter, the reason to let the tags travel
together with the memory data is to make it easier to keep them in
sync. I think if we migrate them separately, it would be hard to keep
the tags in sync with the data.

Say we migrate all the data first, then the tags. If the data gets
dirtied during the migration of the tag memory, we may need to send
the data again, or freeze the source VM after the data migration.
What's more, the KVM_GET_DIRTY_LOG API may not be able to
differentiate between tag and data changes.

I'm not sure whether the aforementioned situation actually arises.
Please correct me if I've got something wrong!

> I'm concerned that what you're doing here makes it impossible to migrate
> between kvm and tcg.
>


You mean to migrate from a KVM mode VM to a TCG mode VM?

> r~


Regards,
Haibo
Richard Henderson Feb. 22, 2021, 10:47 p.m. UTC | #3
On 2/22/21 1:46 AM, Haibo Xu wrote:
> As I mentioned in the cover letter, the reason to let the tags travel
> together with the memory data is to make it easier to keep them in
> sync. I think if we migrate them separately, it would be hard to keep
> the tags in sync with the data.

Well, maybe, maybe not.  See below.


> Say we migrate all the data first, then the tags. If the data gets
> dirtied during the migration of the tag memory, we may need to send
> the data again, or freeze the source VM after the data migration.
> What's more, the KVM_GET_DIRTY_LOG API may not be able to
> differentiate between tag and data changes.

I would certainly expect KVM_GET_DIRTY_LOG to only care about the normal
memory.  That is, pages as viewed by the guest.

I would expect the separate tag_memory block to be private to the host.  If a
normal page is dirty, then we would read the tags into the tag_memory and
manually mark that dirty.  Migration would continue as normal, and eventually
both normal and tag memory would all be clean and migrated.

But I'll admit that it does require that we retain a buffer 1/16 the size of
main memory, which is otherwise unused, and thus this is less than ideal.  So
if we do it your way, we should arrange for tcg to migrate the tag data in the
same way.

I'll still wait for migration experts, of which I am not one.


r~
Haibo Xu March 12, 2021, 1:50 a.m. UTC | #4
++ more migration experts!

On Tue, 23 Feb 2021 at 06:47, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/22/21 1:46 AM, Haibo Xu wrote:
> > As I mentioned in the cover letter, the reason to let the tags travel
> > together with the memory data is to make it easier to keep them in
> > sync. I think if we migrate them separately, it would be hard to keep
> > the tags in sync with the data.
>
> Well, maybe, maybe not.  See below.
>
> > Say we migrate all the data first, then the tags. If the data gets
> > dirtied during the migration of the tag memory, we may need to send
> > the data again, or freeze the source VM after the data migration.
> > What's more, the KVM_GET_DIRTY_LOG API may not be able to
> > differentiate between tag and data changes.
>
> I would certainly expect KVM_GET_DIRTY_LOG to only care about the normal
> memory.  That is, pages as viewed by the guest.
>
> I would expect the separate tag_memory block to be private to the host.  If a
> normal page is dirty, then we would read the tags into the tag_memory and
> manually mark that dirty.  Migration would continue as normal, and eventually
> both normal and tag memory would all be clean and migrated.
>
> But I'll admit that it does require that we retain a buffer 1/16 the size of
> main memory, which is otherwise unused, and thus this is less than ideal.  So
> if we do it your way, we should arrange for tcg to migrate the tag data in the
> same way.
>
> I'll still wait for migration experts, of which I am not one.
>
> r~

Patch

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 36fcb29641..6182b3e1d1 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -165,6 +165,8 @@  struct VirtMachineState {
     DeviceState *acpi_dev;
     Notifier powerdown_notifier;
     PCIBus *bus;
+    /* migrate memory tags */
+    NotifierWithReturn precopy_notifier;
 };
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bccc1b6b44..005133f471 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -38,6 +38,7 @@  void precopy_add_notifier(NotifierWithReturn *n);
 void precopy_remove_notifier(NotifierWithReturn *n);
 int precopy_notify(PrecopyNotifyReason reason, Error **errp);
 void precopy_enable_free_page_optimization(void);
+void precopy_enable_metadata_migration(void);
 
 void ram_mig_init(void);
 void qemu_guest_free_page_hint(void *addr, size_t len);
diff --git a/migration/ram.c b/migration/ram.c
index 7811cde643..32f764680d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -53,9 +53,11 @@ 
 #include "block.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/cpu-throttle.h"
+#include "sysemu/kvm.h"
 #include "savevm.h"
 #include "qemu/iov.h"
 #include "multifd.h"
+#include "kvm_arm.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -75,6 +77,9 @@ 
 #define RAM_SAVE_FLAG_XBZRLE   0x40
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
+#define RAM_SAVE_FLAG_MTE              0x200
+
+#define MTE_GRANULE_SIZE   (16)
 
 static inline bool is_zero_range(uint8_t *p, uint64_t size)
 {
@@ -310,6 +315,8 @@  struct RAMState {
     bool ram_bulk_stage;
     /* The free page optimization is enabled */
     bool fpo_enabled;
+    /* The RAM metadata (e.g. memory tags) migration is enabled */
+    bool metadata_enabled;
     /* How many times we have dirty too many pages */
     int dirty_rate_high_cnt;
     /* these variables are used for bitmap sync */
@@ -387,6 +394,15 @@  void precopy_enable_free_page_optimization(void)
     ram_state->fpo_enabled = true;
 }
 
+void precopy_enable_metadata_migration(void)
+{
+    if (!ram_state) {
+        return;
+    }
+
+    ram_state->metadata_enabled = true;
+}
+
 uint64_t ram_bytes_remaining(void)
 {
     return ram_state ? (ram_state->migration_dirty_pages * TARGET_PAGE_SIZE) :
@@ -1127,6 +1143,61 @@  static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     return true;
 }
 
+static int save_normal_page_mte_tags(QEMUFile *f, uint8_t *addr)
+{
+    uint8_t *tag_buf = NULL;
+    uint64_t ipa;
+    int size = TARGET_PAGE_SIZE / MTE_GRANULE_SIZE;
+
+    if (kvm_physical_memory_addr_from_host(kvm_state, addr, &ipa)) {
+        /* Buffer for the page tags (one byte per tag) */
+        tag_buf = g_try_malloc0(size);
+        if (!tag_buf) {
+            error_report("%s: Error allocating MTE tag_buf", __func__);
+            return 0;
+        }
+
+        if (kvm_arm_mte_get_tags(ipa, TARGET_PAGE_SIZE, tag_buf) < 0) {
+            error_report("%s: Can't get MTE tags from guest", __func__);
+            g_free(tag_buf);
+            return 0;
+        }
+
+        qemu_put_buffer(f, tag_buf, size);
+
+        g_free(tag_buf);
+
+        return size;
+    }
+
+    return 0;
+}
+
+static void load_normal_page_mte_tags(QEMUFile *f, uint8_t *addr)
+{
+    uint8_t *tag_buf = NULL;
+    uint64_t ipa;
+    int size = TARGET_PAGE_SIZE / MTE_GRANULE_SIZE;
+
+    if (kvm_physical_memory_addr_from_host(kvm_state, addr, &ipa)) {
+        /* Buffer for the page tags (one byte per tag) */
+        tag_buf = g_try_malloc0(size);
+        if (!tag_buf) {
+            error_report("%s: Error allocating MTE tag_buf", __func__);
+            return;
+        }
+
+        qemu_get_buffer(f, tag_buf, size);
+        if (kvm_arm_mte_set_tags(ipa, TARGET_PAGE_SIZE, tag_buf) < 0) {
+            error_report("%s: Can't set MTE tags to guest", __func__);
+        }
+
+        g_free(tag_buf);
+    }
+
+    return;
+}
+
 /*
  * directly send the page to the stream
  *
@@ -1141,6 +1212,10 @@  static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
 static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
                             uint8_t *buf, bool async)
 {
+    if (rs->metadata_enabled) {
+        offset |= RAM_SAVE_FLAG_MTE;
+    }
+
     ram_counters.transferred += save_page_header(rs, rs->f, block,
                                                  offset | RAM_SAVE_FLAG_PAGE);
     if (async) {
@@ -1152,6 +1227,11 @@  static int save_normal_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
     }
     ram_counters.transferred += TARGET_PAGE_SIZE;
     ram_counters.normal++;
+
+    if (rs->metadata_enabled) {
+        ram_counters.transferred += save_normal_page_mte_tags(rs->f, buf);
+    }
+
     return 1;
 }
 
@@ -1905,6 +1985,7 @@  static void ram_state_reset(RAMState *rs)
     rs->last_version = ram_list.version;
     rs->ram_bulk_stage = true;
     rs->fpo_enabled = false;
+    rs->metadata_enabled = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -3492,7 +3573,7 @@  static int ram_load_precopy(QEMUFile *f)
             trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
         }
 
-        switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
+        switch (flags & ~(RAM_SAVE_FLAG_CONTINUE | RAM_SAVE_FLAG_MTE)) {
         case RAM_SAVE_FLAG_MEM_SIZE:
             /* Synchronize RAM block list */
             total_ram_bytes = addr;
@@ -3562,6 +3643,9 @@  static int ram_load_precopy(QEMUFile *f)
 
         case RAM_SAVE_FLAG_PAGE:
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+            if (flags & RAM_SAVE_FLAG_MTE) {
+                load_normal_page_mte_tags(f, host);
+            }
             break;
 
         case RAM_SAVE_FLAG_COMPRESS_PAGE: