Liang Li
2016-Mar-03  10:44 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
The current QEMU live migration implementation marks all of the guest's
RAM pages as dirty in the RAM bulk stage. All of these pages are then
processed, which takes quite a lot of CPU cycles.

From the guest's point of view, the content of free pages does not
matter. We can make use of this fact and skip processing the free
pages in the RAM bulk stage. This saves a lot of CPU cycles, reduces
network traffic significantly, and speeds up the live migration
process noticeably.

This patch set is the QEMU side implementation.

The virtio-balloon is extended so that QEMU can get the free pages
information from the guest through virtio.

After getting the free pages information (a bitmap), QEMU can use it
to filter out the guest's free pages in the RAM bulk stage. This makes
the live migration process much more efficient.

This RFC version doesn't take post-copy and RDMA into consideration;
both may be able to benefit from this PV solution with some extra
modifications.

Performance data
================

Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Host RAM: 64GB
Host Linux Kernel: 4.2.0
Host OS: CentOS 7.1
Guest Linux Kernel: 4.5.rc6
Guest OS: CentOS 6.6
Network: X540-AT2 with 10 Gigabit connection
Guest RAM: 8GB

Case 1: Idle guest just boots:
============================================
                    | original  |    pv
--------------------------------------------
total time(ms)      |    1894   |    421
--------------------------------------------
transferred ram(KB) |   398017  |  353242
============================================

Case 2: The guest has run a memory-consuming workload, which was
terminated just before live migration.
============================================
                    | original  |    pv
--------------------------------------------
total time(ms)      |    7436   |    552
--------------------------------------------
transferred ram(KB) |  8146291  |  361375
============================================

Liang Li (4):
  pc: Add code to get the lowmem from PCMachineState
  virtio-balloon: Add a new feature to balloon device
  migration: not set migration bitmap in setup stage
  migration: filter out guest's free pages in ram bulk stage

 balloon.c                                       | 30 ++++++++-
 hw/i386/pc.c                                    |  5 ++
 hw/i386/pc_piix.c                               |  1 +
 hw/i386/pc_q35.c                                |  1 +
 hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
 include/hw/i386/pc.h                            |  3 +-
 include/hw/virtio/virtio-balloon.h              | 17 +++++-
 include/standard-headers/linux/virtio_balloon.h |  1 +
 include/sysemu/balloon.h                        | 10 ++-
 migration/ram.c                                 | 64 +++++++++++++++----
 10 files changed, 195 insertions(+), 18 deletions(-)

-- 
1.8.3.1
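The core idea of the cover letter, dropping guest-free pages from the set of pages to send, can be illustrated with plain bitmaps. The following is only a minimal sketch under stated assumptions: `filter_free_pages` and `count_dirty` are hypothetical helper names, one bit stands for one page, and the real series operates on QEMU's dirty memory blocks rather than on a flat array.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: clear every bit in 'dirty' whose page is marked
 * free in 'free_map'. Both bitmaps cover 'npages' pages, one bit per
 * page. Bits past 'npages' in the last word are left untouched.
 */
static void filter_free_pages(unsigned long *dirty,
                              const unsigned long *free_map,
                              size_t npages)
{
    size_t full = npages / BITS_PER_LONG;
    size_t rem = npages % BITS_PER_LONG;
    size_t i;

    for (i = 0; i < full; i++) {
        dirty[i] &= ~free_map[i];
    }
    if (rem) {
        unsigned long mask = (1UL << rem) - 1;
        dirty[full] &= ~(free_map[full] & mask);
    }
}

/* Hypothetical helper: count the pages still marked dirty. */
static size_t count_dirty(const unsigned long *dirty, size_t npages)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < npages; i++) {
        if (dirty[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            n++;
        }
    }
    return n;
}
```

With eight pages all marked dirty and pages 2 to 5 reported free, only four pages remain to be sent after filtering.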
Liang Li
2016-Mar-03  10:44 UTC
[RFC qemu 1/4] pc: Add code to get the lowmem from PCMachineState
The lowmem will be used by the following patch to get
a correct free pages bitmap.
Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 hw/i386/pc.c         | 5 +++++
 hw/i386/pc_piix.c    | 1 +
 hw/i386/pc_q35.c     | 1 +
 include/hw/i386/pc.h | 3 ++-
 4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0aeefd2..f794a84 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1115,6 +1115,11 @@ void pc_hot_add_cpu(const int64_t id, Error **errp)
     object_unref(OBJECT(cpu));
 }
 
+ram_addr_t pc_get_lowmem(PCMachineState *pcms)
+{
+    return pcms->lowmem;
+}
+
 void pc_cpus_init(PCMachineState *pcms)
 {
     int i;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 6f8c2cd..268a08c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -113,6 +113,7 @@ static void pc_init1(MachineState *machine,
         }
     }
 
+    pcms->lowmem = lowmem;
     if (machine->ram_size >= lowmem) {
         pcms->above_4g_mem_size = machine->ram_size - lowmem;
         pcms->below_4g_mem_size = lowmem;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 46522c9..8d9bd39 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -101,6 +101,7 @@ static void pc_q35_init(MachineState *machine)
         }
     }
 
+    pcms->lowmem = lowmem;
     if (machine->ram_size >= lowmem) {
         pcms->above_4g_mem_size = machine->ram_size - lowmem;
         pcms->below_4g_mem_size = lowmem;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8b3546e..3694c91 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -60,7 +60,7 @@ struct PCMachineState {
     bool nvdimm;
 
     /* RAM information (sizes, addresses, configuration): */
-    ram_addr_t below_4g_mem_size, above_4g_mem_size;
+    ram_addr_t below_4g_mem_size, above_4g_mem_size, lowmem;
 
     /* CPU and apic information: */
     bool apic_xrupt_override;
@@ -229,6 +229,7 @@ void pc_hot_add_cpu(const int64_t id, Error **errp);
 void pc_acpi_init(const char *default_dsdt);
 
 void pc_guest_info_init(PCMachineState *pcms);
+ram_addr_t pc_get_lowmem(PCMachineState *pcms);
 
 #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
 #define PCI_HOST_PROP_PCI_HOLE_END     "pci-hole-end"
-- 
1.8.3.1
Liang Li
2016-Mar-03  10:44 UTC
[RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
Extend the virtio balloon device to support a new feature. This
new feature can help to get the guest's free pages information, which
can be used for live migration optimization.
Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 balloon.c                                       | 30 ++++++++-
 hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
 include/hw/virtio/virtio-balloon.h              | 17 +++++-
 include/standard-headers/linux/virtio_balloon.h |  1 +
 include/sysemu/balloon.h                        | 10 ++-
 5 files changed, 134 insertions(+), 5 deletions(-)
diff --git a/balloon.c b/balloon.c
index f2ef50c..a37717e 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,7 @@
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonFreePages *balloon_free_pages_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
 
@@ -65,9 +66,12 @@ static bool have_balloon(Error **errp)
 }
 
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+                             QEMUBalloonStatus *stat_func,
+                             QEMUBalloonFreePages *free_pages_func,
+                             void *opaque)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
+    if (balloon_event_fn || balloon_stat_fn || balloon_free_pages_fn
+        || balloon_opaque) {
         /* We're already registered one balloon handler.  How many can
          * a guest really have?
          */
@@ -75,6 +79,7 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
     }
     balloon_event_fn = event_func;
     balloon_stat_fn = stat_func;
+    balloon_free_pages_fn = free_pages_func;
     balloon_opaque = opaque;
     return 0;
 }
@@ -86,6 +91,7 @@ void qemu_remove_balloon_handler(void *opaque)
     }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_free_pages_fn = NULL;
     balloon_opaque = NULL;
 }
 
@@ -116,3 +122,23 @@ void qmp_balloon(int64_t target, Error **errp)
     trace_balloon_event(balloon_opaque, target);
     balloon_event_fn(balloon_opaque, target);
 }
+
+bool balloon_free_pages_support(void)
+{
+    return balloon_free_pages_fn ? true : false;
+}
+
+int balloon_get_free_pages(unsigned long *free_pages_bitmap,
+                           unsigned long *free_pages_count)
+{
+    if (!balloon_free_pages_fn) {
+        return -1;
+    }
+
+    if (!free_pages_bitmap || !free_pages_count) {
+        return -1;
+    }
+
+    return balloon_free_pages_fn(balloon_opaque,
+                                 free_pages_bitmap, free_pages_count);
+}
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e9c30e9..a5b9d08 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -76,6 +76,12 @@ static bool balloon_stats_supported(const VirtIOBalloon *s)
     return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
 }
 
+static bool balloon_free_pages_supported(const VirtIOBalloon *s)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_GET_FREE_PAGES);
+}
+
 static bool balloon_stats_enabled(const VirtIOBalloon *s)
 {
     return s->stats_poll_interval > 0;
@@ -293,6 +299,37 @@ out:
     }
 }
 
+static void virtio_balloon_get_free_pages(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+    size_t offset = 0;
+    uint64_t bitmap_bytes = 0, free_pages_count = 0;
+
+    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+    if (!elem) {
+        return;
+    }
+    s->free_pages_vq_elem = elem;
+
+    if (!elem->out_num) {
+        return;
+    }
+
+    iov_to_buf(elem->out_sg, elem->out_num, offset,
+               &free_pages_count, sizeof(uint64_t));
+
+    offset += sizeof(uint64_t);
+    iov_to_buf(elem->out_sg, elem->out_num, offset,
+               &bitmap_bytes, sizeof(uint64_t));
+
+    offset += sizeof(uint64_t);
+    iov_to_buf(elem->out_sg, elem->out_num, offset,
+               s->free_pages_bitmap, bitmap_bytes);
+    s->req_status = DONE;
+    s->free_pages_count = free_pages_count;
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -362,6 +399,7 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+    virtio_add_feature(&f, VIRTIO_BALLOON_F_GET_FREE_PAGES);
     return f;
 }
 
@@ -372,6 +410,45 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
                                              VIRTIO_BALLOON_PFN_SHIFT);
 }
 
+static int virtio_balloon_free_pages(void *opaque,
+                                     unsigned long *free_pages_bitmap,
+                                     unsigned long *free_pages_count)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    VirtQueueElement *elem = s->free_pages_vq_elem;
+    int len;
+
+    if (!balloon_free_pages_supported(s)) {
+        return -1;
+    }
+
+    if (s->req_status == NOT_STARTED) {
+        s->free_pages_bitmap = free_pages_bitmap;
+        s->req_status = STARTED;
+        s->mem_layout.low_mem = pc_get_lowmem(PC_MACHINE(current_machine));
+        if (!elem->in_num) {
+            elem = virtqueue_pop(s->fvq, sizeof(VirtQueueElement));
+            if (!elem) {
+                return 0;
+            }
+            s->free_pages_vq_elem = elem;
+        }
+        len = iov_from_buf(elem->in_sg, elem->in_num, 0, &s->mem_layout,
+                           sizeof(s->mem_layout));
+        virtqueue_push(s->fvq, elem, len);
+        virtio_notify(vdev, s->fvq);
+        return 0;
+    } else if (s->req_status == STARTED) {
+        return 0;
+    } else if (s->req_status == DONE) {
+        *free_pages_count = s->free_pages_count;
+        s->req_status = NOT_STARTED;
+    }
+
+    return 1;
+}
+
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
@@ -429,7 +506,8 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
                 sizeof(struct virtio_balloon_config));
 
     ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
+                                   virtio_balloon_stat,
+                                   virtio_balloon_free_pages, s);
 
     if (ret < 0) {
         error_setg(errp, "Only one balloon device is supported");
@@ -440,6 +518,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
+    s->fvq = virtio_add_queue(vdev, 128, virtio_balloon_get_free_pages);
 
     reset_stats(s);
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 35f62ac..fc173e4 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -23,6 +23,16 @@
 #define VIRTIO_BALLOON(obj) \
         OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
 
+typedef enum virtio_req_status {
+    NOT_STARTED,
+    STARTED,
+    DONE,
+} VIRTIO_REQ_STATUS;
+
+typedef struct MemLayout {
+    uint64_t low_mem;
+} MemLayout;
+
 typedef struct virtio_balloon_stat VirtIOBalloonStat;
 
 typedef struct virtio_balloon_stat_modern {
@@ -33,16 +43,21 @@ typedef struct virtio_balloon_stat_modern {
 
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *fvq;
     uint32_t num_pages;
     uint32_t actual;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
+    VirtQueueElement *free_pages_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
+    uint64_t *free_pages_bitmap;
+    uint64_t free_pages_count;
+    MemLayout mem_layout;
+    VIRTIO_REQ_STATUS req_status;
 } VirtIOBalloon;
 
 #endif
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 2e2a6dc..95b7d0c 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_GET_FREE_PAGES 3 /* Get the free pages bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 3f976b4..205b272 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,19 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef int (QEMUBalloonFreePages)(void *opaque,
+                                   unsigned long *free_pages_bitmap,
+                                   unsigned long *free_pages_count);
 
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
+                             QEMUBalloonStatus *stat_func,
+                             QEMUBalloonFreePages *free_pages_func,
+                             void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_pages_support(void);
+int balloon_get_free_pages(unsigned long *free_pages_bitmap,
+                           unsigned long *free_pages_count);
 
 #endif
-- 
1.8.3.1
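The request flow in patch 2/4 is a small three-state machine: the host starts a request, polls while it is pending, and picks up the count once the guest has replied. The sketch below models only that control flow under stated assumptions: `struct free_page_req`, `free_page_req_poll` and `free_page_req_complete` are hypothetical names, and the virtqueue traffic is replaced by a counter. Unlike the patch, the sketch ignores replies that arrive when no request is outstanding; that is a choice made here for illustration, not something the series does.

```c
#include <stdbool.h>
#include <stdint.h>

enum req_status { REQ_NOT_STARTED, REQ_STARTED, REQ_DONE };

/* Hypothetical stand-in for the request state kept in the device. */
struct free_page_req {
    bool supported;            /* feature negotiated with the guest */
    enum req_status status;
    uint64_t free_pages_count; /* filled in by the guest's reply */
    unsigned requests_sent;    /* stands in for virtqueue pushes */
};

/*
 * Host side poll. Returns -1 if unsupported, 0 while the request is
 * pending, and 1 once the result has been copied to *count.
 */
static int free_page_req_poll(struct free_page_req *r, uint64_t *count)
{
    if (!r->supported || !count) {
        return -1;
    }
    switch (r->status) {
    case REQ_NOT_STARTED:
        r->status = REQ_STARTED;
        r->requests_sent++;    /* a real device would notify the guest */
        return 0;
    case REQ_STARTED:
        return 0;
    case REQ_DONE:
        *count = r->free_pages_count;
        r->status = REQ_NOT_STARTED;
        return 1;
    }
    return -1;
}

/* Guest reply path: record the result and mark the request done. */
static void free_page_req_complete(struct free_page_req *r, uint64_t count)
{
    if (r->status != REQ_STARTED) {
        return;                /* sketch only: drop unsolicited replies */
    }
    r->free_pages_count = count;
    r->status = REQ_DONE;
}
```

A caller polls repeatedly; only the first poll sends a request, and the poll after the guest's reply returns 1 and resets the state for the next round.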
Liang Li
2016-Mar-03  10:44 UTC
[RFC qemu 3/4] migration: not set migration bitmap in setup stage
Set ram_list.dirty_memory instead of the migration bitmap; the migration
bitmap will be updated when doing migration_bitmap_sync().
Set migration_dirty_pages to 0; it will be updated by
migration_bitmap_sync() too.
The following patch is based on this change.
Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 migration/ram.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 704f6a9..ee2547d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1931,19 +1931,19 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
     migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
     migration_bitmap_rcu->bmap = bitmap_new(ram_bitmap_pages);
-    bitmap_set(migration_bitmap_rcu->bmap, 0, ram_bitmap_pages);
 
     if (migrate_postcopy_ram()) {
         migration_bitmap_rcu->unsentmap = bitmap_new(ram_bitmap_pages);
         bitmap_set(migration_bitmap_rcu->unsentmap, 0, ram_bitmap_pages);
     }
 
-    /*
-     * Count the total number of pages used by ram blocks not including any
-     * gaps due to alignment or unplugs.
-     */
-    migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
+    migration_dirty_pages = 0;
 
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        cpu_physical_memory_set_dirty_range(block->offset,
+                                            block->used_length,
+                                            DIRTY_MEMORY_MIGRATION);
+    }
     memory_global_dirty_log_start();
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
-- 
1.8.3.1
Liang Li
2016-Mar-03  10:44 UTC
[RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
Get the free pages information through virtio and filter out the free
pages in the ram bulk stage. This can significantly reduce the total
live migration time as well as network traffic.
Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index ee2547d..819553b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "sysemu/balloon.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -241,6 +242,7 @@ static struct BitmapRcu {
     struct rcu_head rcu;
     /* Main migration bitmap */
     unsigned long *bmap;
+    unsigned long *free_pages_bmap;
     /* bitmap of pages that haven't been sent even once
      * only maintained and used in postcopy at the moment
      * where it's used to send the dirtymap at the start
@@ -561,12 +563,7 @@ ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb,
     unsigned long next;
 
     bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
-    if (ram_bulk_stage && nr > base) {
-        next = nr + 1;
-    } else {
-        next = find_next_bit(bitmap, size, nr);
-    }
-
+    next = find_next_bit(bitmap, size, nr);
     *ram_addr_abs = next << TARGET_PAGE_BITS;
     return (next - base) << TARGET_PAGE_BITS;
 }
@@ -1415,6 +1412,9 @@ void free_xbzrle_decoded_buf(void)
 static void migration_bitmap_free(struct BitmapRcu *bmap)
 {
     g_free(bmap->bmap);
+    if (balloon_free_pages_support()) {
+        g_free(bmap->free_pages_bmap);
+    }
     g_free(bmap->unsentmap);
     g_free(bmap);
 }
@@ -1873,6 +1873,28 @@ err:
     return ret;
 }
 
+static void filter_out_guest_free_pages(unsigned long *free_pages_bmap)
+{
+    RAMBlock *block;
+    DirtyMemoryBlocks *blocks;
+    unsigned long end, page;
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    end = TARGET_PAGE_ALIGN(block->offset +
+                            block->used_length) >> TARGET_PAGE_BITS;
+    page = block->offset >> TARGET_PAGE_BITS;
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+        unsigned long *p = free_pages_bmap + BIT_WORD(page);
+
+        slow_bitmap_complement(blocks->blocks[idx], p, num);
+        page += num;
+    }
+}
 
 /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -1884,6 +1906,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
+    uint64_t free_pages_count = 0;
 
     dirty_rate_high_cnt = 0;
     bitmap_sync_count = 0;
@@ -1931,6 +1954,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
     migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
     migration_bitmap_rcu->bmap = bitmap_new(ram_bitmap_pages);
+    if (balloon_free_pages_support()) {
+        migration_bitmap_rcu->free_pages_bmap = bitmap_new(ram_bitmap_pages);
+    }
 
     if (migrate_postcopy_ram()) {
         migration_bitmap_rcu->unsentmap = bitmap_new(ram_bitmap_pages);
@@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
                                             DIRTY_MEMORY_MIGRATION);
     }
     memory_global_dirty_log_start();
+
+    if (balloon_free_pages_support() &&
+        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
+                               &free_pages_count) == 0) {
+        qemu_mutex_unlock_iothread();
+        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
+                                      &free_pages_count) == 0) {
+            usleep(1000);
+        }
+        qemu_mutex_lock_iothread();
+
+        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
+    }
+
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
-- 
1.8.3.1
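Patch 4/4 removes the bulk-stage shortcut (`next = nr + 1`) in `migration_bitmap_find_dirty()`, so the scan always searches for the next set bit and naturally skips pages that were filtered out. The following is a minimal sketch of that scanning pattern, assuming a flat bitmap with one bit per page; `next_dirty_page` and `collect_dirty_pages` are hypothetical names, and the bit-by-bit loop is a simplification, not QEMU's `find_next_bit()` implementation.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: return the index of the first set bit at or
 * after 'start', or 'size' if there is none.
 */
static size_t next_dirty_page(const unsigned long *bitmap, size_t size,
                              size_t start)
{
    size_t i;

    for (i = start; i < size; i++) {
        if (bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            return i;
        }
    }
    return size;
}

/*
 * Hypothetical helper: walk the bitmap the way a send loop would,
 * recording each dirty page index. Returns the number of pages found.
 */
static size_t collect_dirty_pages(const unsigned long *bitmap, size_t size,
                                  size_t *out, size_t out_len)
{
    size_t n = 0;
    size_t page = next_dirty_page(bitmap, size, 0);

    while (page < size && n < out_len) {
        out[n++] = page;
        page = next_dirty_page(bitmap, size, page + 1);
    }
    return n;
}
```

For a bitmap in which only pages 0, 1, 6 and 7 are dirty, the walk visits exactly those four pages and never touches the cleared range in between.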
Cornelia Huck
2016-Mar-03  12:16 UTC
[RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
On Thu, 3 Mar 2016 18:44:28 +0800
Liang Li <liang.z.li at intel.com> wrote:

> Get the free pages information through virtio and filter out the free
> pages in the ram bulk stage. This can significantly reduce the total
> live migration time as well as network traffic.
>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> ---
>  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 6 deletions(-)
>

> @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>                                              DIRTY_MEMORY_MIGRATION);
>      }
>      memory_global_dirty_log_start();
> +
> +    if (balloon_free_pages_support() &&
> +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                               &free_pages_count) == 0) {
> +        qemu_mutex_unlock_iothread();
> +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                                      &free_pages_count) == 0) {
> +            usleep(1000);
> +        }
> +        qemu_mutex_lock_iothread();
> +
> +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);

A general comment: Using the ballooner to get information about pages
that can be filtered out is too limited (there may be other ways to do
this; we might be able to use cmma on s390, for example), and I don't
like hardcoding to a specific method.

What about the reverse approach: Code may register a handler that
populates the free_pages_bitmap which is called during this stage?

<I like the idea of filtering in general, but I haven't looked at the
code yet>

> +    }
> +
>      migration_bitmap_sync();
>      qemu_mutex_unlock_ramlist();
>      qemu_mutex_unlock_iothread();
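The "reverse approach" suggested in this reply could look roughly like the sketch below: the migration code calls a generic hook, and whichever mechanism is available (balloon, cmma, or something else) registers a provider for it. Everything here is an assumption made for illustration; `FreePageProvider`, `register_free_page_provider`, `query_free_pages` and `demo_provider` are hypothetical names and not part of QEMU.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical callback type: fill 'bitmap' with one bit per free page. */
typedef int (FreePageProvider)(void *opaque, unsigned long *bitmap,
                               size_t npages);

static FreePageProvider *free_page_provider;
static void *free_page_provider_opaque;

/* Register a provider. Returns -1 if one is already registered. */
static int register_free_page_provider(FreePageProvider *fn, void *opaque)
{
    if (!fn || free_page_provider) {
        return -1;
    }
    free_page_provider = fn;
    free_page_provider_opaque = opaque;
    return 0;
}

static void unregister_free_page_provider(void)
{
    free_page_provider = NULL;
    free_page_provider_opaque = NULL;
}

/*
 * Called from the migration setup path in this sketch. Returns -1 if
 * no provider is registered, otherwise the provider's return value.
 */
static int query_free_pages(unsigned long *bitmap, size_t npages)
{
    if (!free_page_provider || !bitmap) {
        return -1;
    }
    return free_page_provider(free_page_provider_opaque, bitmap, npages);
}

/* Demo provider: reports the first *(size_t *)opaque pages as free. */
static int demo_provider(void *opaque, unsigned long *bitmap, size_t npages)
{
    size_t nfree = *(size_t *)opaque;
    size_t i;

    if (nfree > npages) {
        nfree = npages;
    }
    for (i = 0; i < nfree; i++) {
        bitmap[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
    }
    return 0;
}
```

With this shape the migration code only knows about `query_free_pages()`, so a balloon-based provider and an s390-specific one could be swapped without touching `migration/ram.c`.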
Cornelia Huck
2016-Mar-03  12:23 UTC
[RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
On Thu, 3 Mar 2016 18:44:26 +0800
Liang Li <liang.z.li at intel.com> wrote:

> Extend the virtio balloon device to support a new feature, this
> new feature can help to get guest's free pages information, which
> can be used for live migration optimzation.

Do you have a spec for this, e.g. as a patch to the virtio spec?

>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> ---
>  balloon.c                                       | 30 ++++++++-
>  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h              | 17 +++++-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  5 files changed, 134 insertions(+), 5 deletions(-)

> +static int virtio_balloon_free_pages(void *opaque,
> +                                     unsigned long *free_pages_bitmap,
> +                                     unsigned long *free_pages_count)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    VirtQueueElement *elem = s->free_pages_vq_elem;
> +    int len;
> +
> +    if (!balloon_free_pages_supported(s)) {
> +        return -1;
> +    }
> +
> +    if (s->req_status == NOT_STARTED) {
> +        s->free_pages_bitmap = free_pages_bitmap;
> +        s->req_status = STARTED;
> +        s->mem_layout.low_mem = pc_get_lowmem(PC_MACHINE(current_machine));

Please don't leak pc-specific information into generic code.
Daniel P. Berrange
2016-Mar-03  12:45 UTC
[Qemu-devel] [RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
On Thu, Mar 03, 2016 at 06:44:28PM +0800, Liang Li wrote:
> Get the free pages information through virtio and filter out the free
> pages in the ram bulk stage. This can significantly reduce the total
> live migration time as well as network traffic.
>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> ---
>  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 6 deletions(-)

> @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>                                              DIRTY_MEMORY_MIGRATION);
>      }
>      memory_global_dirty_log_start();
> +
> +    if (balloon_free_pages_support() &&
> +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                               &free_pages_count) == 0) {
> +        qemu_mutex_unlock_iothread();
> +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                                      &free_pages_count) == 0) {
> +            usleep(1000);
> +        }
> +        qemu_mutex_lock_iothread();
> +
> +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> +    }

IIUC, this code is synchronous wrt the guest OS balloon driver, i.e. it
is asking the guest for free pages and waiting for a response. If the
guest OS has crashed, QEMU is going to wait forever and thus migration
won't complete. Similarly, you need to consider that the guest OS may be
malicious and simply never respond.

So if the migration code is going to use the guest balloon driver to get
info about free pages, it has to be done in an asynchronous manner so
that migration can never be stalled by a slow/crashed/malicious guest
driver.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
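One way to read the concern above is that the wait on the guest must be bounded, with migration falling back to sending all pages when the guest stays silent. The sketch below shows only that bounded-wait idea, under stated assumptions: `PollFn`, `poll_bounded`, `struct demo_guest` and `demo_poll` are hypothetical names, and an attempt budget stands in for a real deadline. A fully asynchronous design, as the reply asks for, would be event-driven rather than a loop.

```c
#include <limits.h>

/* Hypothetical poll callback: <0 on error, 0 while pending, >0 when done. */
typedef int (PollFn)(void *opaque);

/*
 * Poll at most 'max_attempts' times. Returns 1 if the guest answered,
 * 0 if the budget ran out (caller should proceed without filtering),
 * and -1 on error. A real caller would also sleep between attempts.
 */
static int poll_bounded(PollFn *fn, void *opaque, unsigned max_attempts)
{
    unsigned i;

    for (i = 0; i < max_attempts; i++) {
        int ret = fn(opaque);

        if (ret > 0) {
            return 1;
        }
        if (ret < 0) {
            return -1;
        }
    }
    return 0;
}

/* Demo guest model: answers after a fixed number of polls, or never. */
struct demo_guest {
    unsigned polls_until_ready;
    unsigned polls_seen;
};

static int demo_poll(void *opaque)
{
    struct demo_guest *g = opaque;

    g->polls_seen++;
    return g->polls_seen >= g->polls_until_ready ? 1 : 0;
}
```

A responsive guest completes within the budget, while a crashed or malicious guest costs at most `max_attempts` polls before migration continues unfiltered.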
Michael S. Tsirkin
2016-Mar-03  12:56 UTC
[RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
On Thu, Mar 03, 2016 at 06:44:26PM +0800, Liang Li wrote:
> Extend the virtio balloon device to support a new feature, this
> new feature can help to get guest's free pages information, which
> can be used for live migration optimzation.
>
> Signed-off-by: Liang Li <liang.z.li at intel.com>

I don't understand why we need a new interface.
Balloon already sends free pages to host.
Just teach host to skip these pages.

Maybe instead of starting with code, you should send
a high level description to the virtio tc for consideration?

You can do it through the mailing list or
using the web form:
http://www.oasis-open.org/committees/comments/form.php?wg_abbrev=virtio

> ---
>  balloon.c                                       | 30 ++++++++-
>  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h              | 17 +++++-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  5 files changed, 134 insertions(+), 5 deletions(-)
>
> diff --git a/balloon.c b/balloon.c
> index f2ef50c..a37717e 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,7 @@
>
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePages *balloon_free_pages_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>
> @@ -65,9 +66,12 @@ static bool have_balloon(Error **errp)
>  }
>
>  int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +                             QEMUBalloonStatus *stat_func,
> +                             QEMUBalloonFreePages *free_pages_func,
> +                             void *opaque)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_pages_fn
> +        || balloon_opaque) {
>          /* We're already registered one balloon handler.  How many can
>           * a guest really have?
>           */
> @@ -75,6 +79,7 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
>      }
>      balloon_event_fn = event_func;
>      balloon_stat_fn = stat_func;
> +    balloon_free_pages_fn = free_pages_func;
>      balloon_opaque = opaque;
>      return 0;
>  }
> @@ -86,6 +91,7 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_pages_fn = NULL;
>      balloon_opaque = NULL;
>  }
>
> @@ -116,3 +122,23 @@ void qmp_balloon(int64_t target, Error **errp)
>      trace_balloon_event(balloon_opaque, target);
>      balloon_event_fn(balloon_opaque, target);
>  }
> +
> +bool balloon_free_pages_support(void)
> +{
> +    return balloon_free_pages_fn ? true : false;
> +}
> +
> +int balloon_get_free_pages(unsigned long *free_pages_bitmap,
> +                           unsigned long *free_pages_count)
> +{
> +    if (!balloon_free_pages_fn) {
> +        return -1;
> +    }
> +
> +    if (!free_pages_bitmap || !free_pages_count) {
> +        return -1;
> +    }
> +
> +    return balloon_free_pages_fn(balloon_opaque,
> +                                 free_pages_bitmap, free_pages_count);
> + }
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index e9c30e9..a5b9d08 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -76,6 +76,12 @@ static bool balloon_stats_supported(const VirtIOBalloon *s)
>      return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
>  }
>
> +static bool balloon_free_pages_supported(const VirtIOBalloon *s)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_GET_FREE_PAGES);
> +}
> +
>  static bool balloon_stats_enabled(const VirtIOBalloon *s)
>  {
>      return s->stats_poll_interval > 0;
> @@ -293,6 +299,37 @@ out:
>      }
>  }
>
> +static void virtio_balloon_get_free_pages(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> +    VirtQueueElement *elem;
> +    size_t offset = 0;
> +    uint64_t bitmap_bytes = 0, free_pages_count = 0;
> +
> +    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +    if (!elem) {
> +        return;
> +    }
> +    s->free_pages_vq_elem = elem;
> +
> +    if (!elem->out_num) {
> +        return;
> +    }
> +
> +    iov_to_buf(elem->out_sg, elem->out_num, offset,
> +               &free_pages_count, sizeof(uint64_t));
> +
> +    offset += sizeof(uint64_t);
> +    iov_to_buf(elem->out_sg, elem->out_num, offset,
> +               &bitmap_bytes, sizeof(uint64_t));
> +
> +    offset += sizeof(uint64_t);
> +    iov_to_buf(elem->out_sg, elem->out_num, offset,
> +               s->free_pages_bitmap, bitmap_bytes);
> +    s->req_status = DONE;
> +    s->free_pages_count = free_pages_count;
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -362,6 +399,7 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +    virtio_add_feature(&f, VIRTIO_BALLOON_F_GET_FREE_PAGES);
>      return f;
>  }
>
> @@ -372,6 +410,45 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
>                                               VIRTIO_BALLOON_PFN_SHIFT);
>  }
>
> +static int virtio_balloon_free_pages(void *opaque,
> +                                     unsigned long *free_pages_bitmap,
> +                                     unsigned long *free_pages_count)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    VirtQueueElement *elem = s->free_pages_vq_elem;
> +    int len;
> +
> +    if (!balloon_free_pages_supported(s)) {
> +        return -1;
> +    }
> +
> +    if (s->req_status == NOT_STARTED) {
> +        s->free_pages_bitmap = free_pages_bitmap;
> +        s->req_status = STARTED;
> +        s->mem_layout.low_mem = pc_get_lowmem(PC_MACHINE(current_machine));
> +        if (!elem->in_num) {
> +            elem = virtqueue_pop(s->fvq, sizeof(VirtQueueElement));
> +            if (!elem) {
> +                return 0;
> +            }
> +            s->free_pages_vq_elem = elem;
> +        }
> +        len = iov_from_buf(elem->in_sg, elem->in_num, 0, &s->mem_layout,
> +                           sizeof(s->mem_layout));
> +        virtqueue_push(s->fvq, elem, len);
> +        virtio_notify(vdev, s->fvq);
> +        return 0;
> +    } else if (s->req_status == STARTED) {
> +        return 0;
> +    } else if (s->req_status == DONE) {
> +        *free_pages_count = s->free_pages_count;
> +        s->req_status = NOT_STARTED;
> +    }
> +
> +    return 1;
> +}
> +
>  static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> @@ -429,7 +506,8 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>                  sizeof(struct virtio_balloon_config));
>
>      ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> +                                   virtio_balloon_stat,
> +                                   virtio_balloon_free_pages, s);
>
>      if (ret < 0) {
>          error_setg(errp, "Only one balloon device is supported");
> @@ -440,6 +518,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> +    s->fvq = virtio_add_queue(vdev, 128, virtio_balloon_get_free_pages);
>
>      reset_stats(s);
>
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 35f62ac..fc173e4 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -23,6 +23,16 @@
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>
> +typedef enum virtio_req_status {
> +    NOT_STARTED,
> +    STARTED,
> +    DONE,
> +} VIRTIO_REQ_STATUS;
> +
> +typedef struct MemLayout {
> +    uint64_t low_mem;
> +} MemLayout;
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>
>  typedef struct virtio_balloon_stat_modern {
> @@ -33,16 +43,21 @@ typedef struct virtio_balloon_stat_modern {
>
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *fvq;
>      uint32_t num_pages;
>      uint32_t actual;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
> +
VirtQueueElement *free_pages_vq_elem; > size_t stats_vq_offset; > QEMUTimer *stats_timer; > int64_t stats_last_update; > int64_t stats_poll_interval; > uint32_t host_features; > + uint64_t *free_pages_bitmap; > + uint64_t free_pages_count; > + MemLayout mem_layout; > + VIRTIO_REQ_STATUS req_status; > } VirtIOBalloon; > > #endif > diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h > index 2e2a6dc..95b7d0c 100644 > --- a/include/standard-headers/linux/virtio_balloon.h > +++ b/include/standard-headers/linux/virtio_balloon.h > @@ -34,6 +34,7 @@ > #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */ > #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ > #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ > +#define VIRTIO_BALLOON_F_GET_FREE_PAGES 3 /* Get the free pages bitmap */ > > /* Size of a PFN in the balloon interface. */ > #define VIRTIO_BALLOON_PFN_SHIFT 12 > diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h > index 3f976b4..205b272 100644 > --- a/include/sysemu/balloon.h > +++ b/include/sysemu/balloon.h > @@ -18,11 +18,19 @@ > > typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target); > typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info); > +typedef int (QEMUBalloonFreePages)(void *opaque, > + unsigned long *free_pages_bitmap, > + unsigned long *free_pages_count); > > int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, > - QEMUBalloonStatus *stat_func, void *opaque); > + QEMUBalloonStatus *stat_func, > + QEMUBalloonFreePages *free_pages_func, > + void *opaque); > void qemu_remove_balloon_handler(void *opaque); > bool qemu_balloon_is_inhibited(void); > void qemu_balloon_inhibit(bool state); > +bool balloon_free_pages_support(void); > +int balloon_get_free_pages(unsigned long *free_pages_bitmap, > + unsigned long *free_pages_count); > > #endif > -- > 1.8.3.1
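The request flow in the quoted patch is a small three-state machine (NOT_STARTED, STARTED, DONE) that the caller polls. The following is a minimal standalone sketch of that control flow only, with no virtqueue handling. The names FreePagesReq, free_pages_poll and free_pages_complete are hypothetical and do not appear in the patch; the return values (0 while pending, 1 once the result is consumed) follow virtio_balloon_free_pages() as quoted.

```c
#include <stdint.h>

/* Mirrors the three request states introduced by the patch. */
typedef enum { NOT_STARTED, STARTED, DONE } ReqStatus;

/* Hypothetical container for the request state (not from the patch). */
typedef struct {
    ReqStatus status;
    uint64_t free_pages_count;  /* filled in when the guest replies */
} FreePagesReq;

/*
 * Hypothetical poll function: returns 0 while the request is pending
 * and 1 once a completed result has been handed to the caller.
 */
static int free_pages_poll(FreePagesReq *req, uint64_t *count_out)
{
    switch (req->status) {
    case NOT_STARTED:
        /* The real code would push the request to the guest here. */
        req->status = STARTED;
        return 0;
    case STARTED:
        return 0;               /* still waiting for the guest */
    case DONE:
        *count_out = req->free_pages_count;
        req->status = NOT_STARTED;  /* ready for the next request */
        return 1;
    }
    return -1;                  /* unreachable for valid states */
}

/* Hypothetical completion hook, standing in for the virtqueue handler. */
static void free_pages_complete(FreePagesReq *req, uint64_t count)
{
    req->free_pages_count = count;
    req->status = DONE;
}
```

A caller would invoke free_pages_poll() repeatedly until it returns 1; in the patch the equivalent of free_pages_complete() runs when the guest places the bitmap on the new virtqueue.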
Roman Kagan
2016-Mar-03  13:58 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Thu, Mar 03, 2016 at 06:44:24PM +0800, Liang Li wrote:
> The current QEMU live migration implementation mark the all the
> guest's RAM pages as dirtied in the ram bulk stage, all these pages
> will be processed and that takes quit a lot of CPU cycles.
>
> From guest's point of view, it doesn't care about the content in free
> pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> the network traffic significantly while speed up the live migration
> process obviously.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This make
> the live migration process much more efficient.
>
> This RFC version doesn't take the post-copy and RDMA into
> consideration, maybe both of them can benefit from this PV solution
> by with some extra modifications.
>
> Performance data
> ===============
>
> Test environment:
>
> CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
> Host RAM: 64GB
> Host Linux Kernel: 4.2.0 Host OS: CentOS 7.1
> Guest Linux Kernel: 4.5.rc6 Guest OS: CentOS 6.6
> Network: X540-AT2 with 10 Gigabit connection
> Guest RAM: 8GB
>
> Case 1: Idle guest just boots:
> ===========================================
>                     | original  |    pv
> -------------------------------------------
> total time(ms)      |    1894   |   421
> --------------------------------------------
> transferred ram(KB) |   398017  |  353242
> ===========================================
>
>
> Case 2: The guest has ever run some memory consuming workload, the
> workload is terminated just before live migration.
> ===========================================
>                     | original  |    pv
> -------------------------------------------
> total time(ms)      |    7436   |   552
> --------------------------------------------
> transferred ram(KB) |  8146291  |  361375
> ===========================================

Both cases look very artificial to me. Normally you migrate VMs which
have started long ago and which can't have their services terminated
before the migration, so I wouldn't expect any useful amount of free
pages obtained this way.

OTOH I don't see why you can't just inflate the balloon before the
migration, and really optimize the amount of transferred data this way?
With the recently proposed VIRTIO_BALLOON_S_AVAIL you can have a fairly
good estimate of the optimal balloon size, and with the recently merged
balloon deflation on OOM it's a safe thing to do without exposing the
guest workloads to OOM risks.

Roman.
Dr. David Alan Gilbert
2016-Mar-03  17:46 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
* Liang Li (liang.z.li at intel.com) wrote:
> The current QEMU live migration implementation mark the all the
> guest's RAM pages as dirtied in the ram bulk stage, all these pages
> will be processed and that takes quit a lot of CPU cycles.
>
> From guest's point of view, it doesn't care about the content in free
> pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> the network traffic significantly while speed up the live migration
> process obviously.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This make
> the live migration process much more efficient.

Hi,
  An interesting solution; I know a few different people have been looking
at how to speed up ballooned VM migration.

I wonder if it would be possible to avoid the kernel changes by
parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
mapped pages in the guest ram, would it achieve the same result?

> This RFC version doesn't take the post-copy and RDMA into
> consideration, maybe both of them can benefit from this PV solution
> by with some extra modifications.

For postcopy to be safe, you would still need to send a message to the
destination telling it that there were zero pages, otherwise the destination
can't tell if it's supposed to request the page from the source or treat the
page as zero.

Dave

> Performance data
> ===============
>
> Test environment:
>
> CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
> Host RAM: 64GB
> Host Linux Kernel: 4.2.0 Host OS: CentOS 7.1
> Guest Linux Kernel: 4.5.rc6 Guest OS: CentOS 6.6
> Network: X540-AT2 with 10 Gigabit connection
> Guest RAM: 8GB
>
> Case 1: Idle guest just boots:
> ===========================================
>                     | original  |    pv
> -------------------------------------------
> total time(ms)      |    1894   |   421
> --------------------------------------------
> transferred ram(KB) |   398017  |  353242
> ===========================================
>
>
> Case 2: The guest has ever run some memory consuming workload, the
> workload is terminated just before live migration.
> ===========================================
>                     | original  |    pv
> -------------------------------------------
> total time(ms)      |    7436   |   552
> --------------------------------------------
> transferred ram(KB) |  8146291  |  361375
> ===========================================
>
> Liang Li (4):
>   pc: Add code to get the lowmem form PCMachineState
>   virtio-balloon: Add a new feature to balloon device
>   migration: not set migration bitmap in setup stage
>   migration: filter out guest's free pages in ram bulk stage
>
>  balloon.c                                       | 30 ++++++++-
>  hw/i386/pc.c                                    |  5 ++
>  hw/i386/pc_piix.c                               |  1 +
>  hw/i386/pc_q35.c                                |  1 +
>  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
>  include/hw/i386/pc.h                            |  3 +-
>  include/hw/virtio/virtio-balloon.h              | 17 +++++-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  migration/ram.c                                 | 64 +++++++++++++++----
>  10 files changed, 195 insertions(+), 18 deletions(-)
>
> --
> 1.8.3.1
>
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
Li, Liang Z
2016-Mar-04  01:35 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> On Thu, Mar 03, 2016 at 06:44:24PM +0800, Liang Li wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > network traffic significantly while speed up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
> >
> > Performance data
> > ===============
> >
> > Test environment:
> >
> > CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz Host RAM: 64GB
> > Host Linux Kernel: 4.2.0 Host OS: CentOS 7.1
> > Guest Linux Kernel: 4.5.rc6 Guest OS: CentOS 6.6
> > Network: X540-AT2 with 10 Gigabit connection Guest RAM: 8GB
> >
> > Case 1: Idle guest just boots:
> > ===========================================
> >                     | original  |    pv
> > -------------------------------------------
> > total time(ms)      |    1894   |   421
> > --------------------------------------------
> > transferred ram(KB) |   398017  |  353242
> > ===========================================
> >
> >
> > Case 2: The guest has ever run some memory consuming workload, the
> > workload is terminated just before live migration.
> > ===========================================
> >                     | original  |    pv
> > -------------------------------------------
> > total time(ms)      |    7436   |   552
> > --------------------------------------------
> > transferred ram(KB) |  8146291  |  361375
> > ===========================================
>
> Both cases look very artificial to me. Normally you migrate VMs which have
> started long ago and which can't have their services terminated before the
> migration, so I wouldn't expect any useful amount of free pages obtained
> this way.
>

Yes, they are somewhat artificial; they are meant to emphasize the effect,
and I think both cases are very easy to reproduce. Testing with a real
workload in a production environment would be more convincing.
We can predict that as long as the guest has not used up all of its memory,
this solution should still take effect and shorten the total live migration
time. (Of course, we should also consider the time cost of the virtio
communication.)

> OTOH I don't see why you can't just inflate the balloon before the migration,
> and really optimize the amount of transferred data this way?
> With the recently proposed VIRTIO_BALLOON_S_AVAIL you can have a fairly
> good estimate of the optimal balloon size, and with the recently merged
> balloon deflation on OOM it's a safe thing to do without exposing the guest
> workloads to OOM risks.
>
> Roman.

Thanks for the information. The free page bitmap is not very large: for a
guest with 8GB of RAM, only 256KB of extra memory is required. Compared to
this solution, inflating the balloon is more expensive. If the balloon size
is not optimal and the guest requests more memory during live migration, the
guest's performance will be impacted.

Liang
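The 256KB figure above can be checked with a short calculation. This is a sketch that assumes one bit per guest page and a 4KB page size (neither is stated in the message, but both match the quoted number); the helper name free_bitmap_bytes is hypothetical.

```c
#include <stdint.h>

/*
 * Bytes needed for a bitmap with one bit per page covering ram_bytes
 * of guest RAM. Assumes page_size is non-zero.
 */
static uint64_t free_bitmap_bytes(uint64_t ram_bytes, uint64_t page_size)
{
    uint64_t pages = (ram_bytes + page_size - 1) / page_size; /* round up */
    return (pages + 7) / 8;                                   /* 8 pages per byte */
}
```

For 8GB of RAM and 4KB pages this gives 2^33 / 2^12 = 2^21 pages, and 2^21 / 8 = 2^18 bytes, that is 256KB.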
Li, Liang Z
2016-Mar-04  01:52 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
>
> * Liang Li (liang.z.li at intel.com) wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > network traffic significantly while speed up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
>
> Hi,
>   An interesting solution; I know a few different people have been looking at
> how to speed up ballooned VM migration.
>

Ooh, different solutions for the same purpose, and both based on the balloon.

> I wonder if it would be possible to avoid the kernel changes by parsing
> /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> pages in the guest ram, would it achieve the same result?
>

Detecting only the unmapped/zero-mapped pages is not enough. Consider a
situation like case 2: pages that the workload touched and then freed are
still mapped on the host, so this approach can't achieve the same result.

> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
>
> For postcopy to be safe, you would still need to send a message to the
> destination telling it that there were zero pages, otherwise the destination
> can't tell if it's supposed to request the page from the source or treat the
> page as zero.
>
> Dave

I will consider this later, thanks, Dave.

Liang

> >
> > Performance data
> > ===============
> >
> > Test environment:
> >
> > CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz Host RAM: 64GB
> > Host Linux Kernel: 4.2.0 Host OS: CentOS 7.1
> > Guest Linux Kernel: 4.5.rc6 Guest OS: CentOS 6.6
> > Network: X540-AT2 with 10 Gigabit connection Guest RAM: 8GB
> >
> > Case 1: Idle guest just boots:
> > ===========================================
> >                     | original  |    pv
> > -------------------------------------------
> > total time(ms)      |    1894   |   421
> > --------------------------------------------
> > transferred ram(KB) |   398017  |  353242
> > ===========================================
> >
> >
> > Case 2: The guest has ever run some memory consuming workload, the
> > workload is terminated just before live migration.
> > ===========================================
> >                     | original  |    pv
> > -------------------------------------------
> > total time(ms)      |    7436   |   552
> > --------------------------------------------
> > transferred ram(KB) |  8146291  |  361375
> > ===========================================
> >
Roman Kagan
2016-Mar-04  07:55 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> * Liang Li (liang.z.li at intel.com) wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free
> > pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> > the network traffic significantly while speed up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
>
> Hi,
>   An interesting solution; I know a few different people have been looking
> at how to speed up ballooned VM migration.
>
> I wonder if it would be possible to avoid the kernel changes by
> parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
> mapped pages in the guest ram, would it achieve the same result?

Yes, I was about to suggest the same thing: it's simple and makes use of
the existing infrastructure. And you wouldn't need to care if the pages
were unmapped by ballooning or anything else (alternative balloon
implementations, not yet touched by the guest, etc.). Besides, you
wouldn't need to synchronize with the guest.

Roman.
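For readers unfamiliar with the pagemap interface mentioned above, here is a minimal sketch of how a process could test whether one of its own virtual pages is backed. It relies only on the documented Linux pagemap layout: one 64-bit entry per virtual page, with bit 63 meaning "present" and bit 62 meaning "swapped". The helper name page_is_backed is hypothetical, and nothing here is taken from QEMU. Note that a page mapped to the shared zero page still reads as present, so this check alone only finds unmapped pages.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT (1ULL << 63)  /* page is present in RAM */
#define PM_SWAPPED (1ULL << 62)  /* page is in swap */

/*
 * Hypothetical helper: returns 1 if the page containing addr is present
 * or swapped, 0 if it is not backed at all, and -1 on a read error.
 * pagemap_fd must be an open descriptor for /proc/self/pagemap.
 */
static int page_is_backed(int pagemap_fd, const void *addr, long page_size)
{
    uint64_t entry;
    /* Each virtual page has one 8-byte entry, indexed by page number. */
    off_t off = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
                * (off_t)sizeof(entry);

    if (pread(pagemap_fd, &entry, sizeof(entry), off)
        != (ssize_t)sizeof(entry)) {
        return -1;
    }
    return (entry & (PM_PRESENT | PM_SWAPPED)) != 0;
}
```

Applied to guest RAM, the caller would walk the host virtual range backing the guest and skip pages for which this returns 0.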
Amit Shah
2016-Mar-08  11:13 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> The current QEMU live migration implementation mark the all the
> guest's RAM pages as dirtied in the ram bulk stage, all these pages
> will be processed and that takes quit a lot of CPU cycles.
>
> From guest's point of view, it doesn't care about the content in free
> pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> the network traffic significantly while speed up the live migration
> process obviously.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This make
> the live migration process much more efficient.
>
> This RFC version doesn't take the post-copy and RDMA into
> consideration, maybe both of them can benefit from this PV solution
> by with some extra modifications.

I like the idea, just have to prove (review) and test it a lot to
ensure we don't end up skipping pages that matter.

However, there are a couple of points:

In my opinion, the information that's exchanged between the guest and
the host should be exchanged over a virtio-serial channel rather than
virtio-balloon. First, there's nothing related to the balloon here.
It just happens to be memory info. Second, I would never enable
balloon in a guest that I want to be performance-sensitive. So even
if you add this as part of balloon, you'll find no one is using this
solution.

Secondly, I suggest virtio-serial, because it's meant exactly to
exchange free-flowing information between a host and a guest, and you
don't need to extend any part of the protocol for it (hence no changes
necessary to the spec). You can see how spice, vnc, etc., use
virtio-serial to exchange data.


		Amit
Li, Liang Z
2016-Mar-08  13:11 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
>
> On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > network traffic significantly while speed up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
>
> I like the idea, just have to prove (review) and test it a lot to ensure we don't
> end up skipping pages that matter.
>
> However, there are a couple of points:
>
> In my opinion, the information that's exchanged between the guest and the
> host should be exchanged over a virtio-serial channel rather than virtio-
> balloon. First, there's nothing related to the balloon here.
> It just happens to be memory info. Second, I would never enable balloon in
> a guest that I want to be performance-sensitive. So even if you add this as
> part of balloon, you'll find no one is using this solution.
>
> Secondly, I suggest virtio-serial, because it's meant exactly to exchange free-
> flowing information between a host and a guest, and you don't need to
> extend any part of the protocol for it (hence no changes necessary to the
> spec). You can see how spice, vnc, etc., use virtio-serial to exchange data.
>
>
> 		Amit

I don't like using virtio-balloon for this either; it's confusing.
It would be great if virtio-serial can be used. I will take a look at it.

Thanks for your suggestion!

Liang
Li, Liang Z
2016-Mar-10  07:44 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
>
> I like the idea, just have to prove (review) and test it a lot to ensure we don't
> end up skipping pages that matter.
>
> However, there are a couple of points:
>
> In my opinion, the information that's exchanged between the guest and the
> host should be exchanged over a virtio-serial channel rather than virtio-
> balloon. First, there's nothing related to the balloon here.
> It just happens to be memory info. Second, I would never enable balloon in
> a guest that I want to be performance-sensitive. So even if you add this as
> part of balloon, you'll find no one is using this solution.
>
> Secondly, I suggest virtio-serial, because it's meant exactly to exchange free-
> flowing information between a host and a guest, and you don't need to
> extend any part of the protocol for it (hence no changes necessary to the
> spec). You can see how spice, vnc, etc., use virtio-serial to exchange data.
>
>
> 		Amit

Hi Amit,

Could you provide more information on how to use virtio-serial to exchange
data? A thread, wiki page, or code would all be fine.
I have not found any useful information yet.

Thanks,
Liang
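As a rough illustration of the virtio-serial idea, the sketch below shows how a guest-side agent might stream a bitmap over an already opened file descriptor. Everything in it is an assumption for illustration: the framing (a host-endian 8-byte length followed by the payload) and the names write_all and send_bitmap are hypothetical and are not part of any existing protocol or of this patch set.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical framing: an 8-byte length header in host byte order,
 * followed by the bitmap bytes. Returns 0 on success, -1 on error.
 */
static int send_bitmap(int fd, const uint8_t *bitmap, uint64_t len)
{
    if (write_all(fd, &len, sizeof(len)) < 0) {
        return -1;
    }
    return write_all(fd, bitmap, (size_t)len);
}
```

In a Linux guest, a named virtio-serial port normally shows up as a character device under /dev/virtio-ports/, so the agent would open that path and pass the descriptor to send_bitmap(). On the host the port is a virtserialport device whose name property selects that path and whose chardev carries the data; the port name itself would be chosen by whoever defines the channel.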