Liang Li
2016-Mar-03 10:44 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
The current QEMU live migration implementation marks all of the guest's
RAM pages as dirty in the ram bulk stage; all of these pages will be
processed, and that takes quite a lot of CPU cycles.

From the guest's point of view, it doesn't care about the content of
free pages. We can make use of this fact and skip processing the free
pages in the ram bulk stage. This saves a lot of CPU cycles, reduces
the network traffic significantly, and clearly speeds up the live
migration process.

This patch set is the QEMU side implementation.

The virtio-balloon is extended so that QEMU can get the free pages
information from the guest through virtio.

After getting the free pages information (a bitmap), QEMU can use it
to filter out the guest's free pages in the ram bulk stage. This makes
the live migration process much more efficient.

This RFC version doesn't take post-copy and RDMA into consideration;
maybe both of them can benefit from this PV solution with some extra
modifications.

Performance data
================

Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Host RAM: 64GB
Host Linux Kernel: 4.2.0
Host OS: CentOS 7.1
Guest Linux Kernel: 4.5-rc6
Guest OS: CentOS 6.6
Network: X540-AT2 with 10 Gigabit connection
Guest RAM: 8GB

Case 1: Idle guest just boots:
============================================
                     | original |    pv
--------------------------------------------
total time (ms)      |     1894 |    421
--------------------------------------------
transferred ram (KB) |   398017 | 353242
============================================

Case 2: The guest has run some memory-consuming workload; the workload
is terminated just before live migration.
============================================
                     | original |    pv
--------------------------------------------
total time (ms)      |     7436 |    552
--------------------------------------------
transferred ram (KB) |  8146291 | 361375
============================================

Liang Li (4):
  pc: Add code to get the lowmem from PCMachineState
  virtio-balloon: Add a new feature to balloon device
  migration: do not set migration bitmap in setup stage
  migration: filter out guest's free pages in ram bulk stage

 balloon.c                                       | 30 ++++++++-
 hw/i386/pc.c                                    |  5 ++
 hw/i386/pc_piix.c                               |  1 +
 hw/i386/pc_q35.c                                |  1 +
 hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
 include/hw/i386/pc.h                            |  3 +-
 include/hw/virtio/virtio-balloon.h              | 17 +++++-
 include/standard-headers/linux/virtio_balloon.h |  1 +
 include/sysemu/balloon.h                        | 10 ++-
 migration/ram.c                                 | 64 +++++++++++++++----
 10 files changed, 195 insertions(+), 18 deletions(-)

-- 
1.8.3.1
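To make the core idea concrete, here is a minimal sketch of the filtering
step, assuming the guest has handed QEMU a free-page bitmap with the same
geometry as the migration bitmap (one bit per TARGET_PAGE_SIZE page). The
function name and layout are illustrative, not taken from the series:

#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Drop guest-reported free pages from the set of pages the bulk stage
 * would otherwise send: a page stays in the migration set only if it
 * is dirty and not reported free by the guest. */
static void drop_free_pages(unsigned long *migration_bmap,
                            const unsigned long *free_bmap,
                            size_t nr_pages)
{
    size_t i, nr_longs = (nr_pages + BITS_PER_LONG - 1) / BITS_PER_LONG;

    for (i = 0; i < nr_longs; i++) {
        migration_bmap[i] &= ~free_bmap[i];
    }
}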
Liang Li
2016-Mar-03 10:44 UTC
[RFC qemu 1/4] pc: Add code to get the lowmem from PCMachineState
The lowmem will be used by the following patch to get a correct free
pages bitmap.

Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 hw/i386/pc.c         | 5 +++++
 hw/i386/pc_piix.c    | 1 +
 hw/i386/pc_q35.c     | 1 +
 include/hw/i386/pc.h | 3 ++-
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0aeefd2..f794a84 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1115,6 +1115,11 @@ void pc_hot_add_cpu(const int64_t id, Error **errp)
     object_unref(OBJECT(cpu));
 }
 
+ram_addr_t pc_get_lowmem(PCMachineState *pcms)
+{
+    return pcms->lowmem;
+}
+
 void pc_cpus_init(PCMachineState *pcms)
 {
     int i;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 6f8c2cd..268a08c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -113,6 +113,7 @@ static void pc_init1(MachineState *machine,
         }
     }
 
+    pcms->lowmem = lowmem;
     if (machine->ram_size >= lowmem) {
         pcms->above_4g_mem_size = machine->ram_size - lowmem;
         pcms->below_4g_mem_size = lowmem;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 46522c9..8d9bd39 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -101,6 +101,7 @@ static void pc_q35_init(MachineState *machine)
         }
     }
 
+    pcms->lowmem = lowmem;
     if (machine->ram_size >= lowmem) {
         pcms->above_4g_mem_size = machine->ram_size - lowmem;
         pcms->below_4g_mem_size = lowmem;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8b3546e..3694c91 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -60,7 +60,7 @@ struct PCMachineState {
     bool nvdimm;
 
     /* RAM information (sizes, addresses, configuration): */
-    ram_addr_t below_4g_mem_size, above_4g_mem_size;
+    ram_addr_t below_4g_mem_size, above_4g_mem_size, lowmem;
 
     /* CPU and apic information: */
     bool apic_xrupt_override;
@@ -229,6 +229,7 @@ void pc_hot_add_cpu(const int64_t id, Error **errp);
 void pc_acpi_init(const char *default_dsdt);
 
 void pc_guest_info_init(PCMachineState *pcms);
+ram_addr_t pc_get_lowmem(PCMachineState *pcms);
 
 #define PCI_HOST_PROP_PCI_HOLE_START "pci-hole-start"
 #define PCI_HOST_PROP_PCI_HOLE_END   "pci-hole-end"
-- 
1.8.3.1
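Some context on why this accessor is needed: on the PC machine, guest RAM
is split around the PCI hole, so a bitmap indexed by guest page frame
number cannot be applied to QEMU's RAM offsets without knowing where low
memory ends. A rough illustration (hypothetical helper, not from the
series), assuming the usual PC layout where high RAM resumes at 4GB:

#include <stdint.h>

#define GB (1ULL << 30)

/* Map a guest page frame number to an offset in QEMU's RAM, given the
 * "lowmem" boundary: RAM is contiguous up to lowmem, the PCI hole sits
 * between lowmem and 4GB, and the remaining RAM starts again at 4GB. */
static int64_t guest_pfn_to_ram_offset(uint64_t pfn, uint64_t lowmem,
                                       uint64_t page_size)
{
    uint64_t addr = pfn * page_size;

    if (addr < lowmem) {
        return addr;                     /* low RAM is identity mapped */
    }
    if (addr < 4 * GB) {
        return -1;                       /* PCI hole: no RAM behind it */
    }
    return addr - (4 * GB - lowmem);     /* high RAM follows low RAM */
}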
Liang Li
2016-Mar-03 10:44 UTC
[RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
Extend the virtio balloon device to support a new feature. This new
feature can be used to get the guest's free pages information, which
can then be used for live migration optimization.

Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 balloon.c                                       | 30 ++++++++-
 hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
 include/hw/virtio/virtio-balloon.h              | 17 +++++-
 include/standard-headers/linux/virtio_balloon.h |  1 +
 include/sysemu/balloon.h                        | 10 ++-
 5 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/balloon.c b/balloon.c
index f2ef50c..a37717e 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,7 @@
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonFreePages *balloon_free_pages_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
 
@@ -65,9 +66,12 @@ static bool have_balloon(Error **errp)
 }
 
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+                             QEMUBalloonStatus *stat_func,
+                             QEMUBalloonFreePages *free_pages_func,
+                             void *opaque)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
+    if (balloon_event_fn || balloon_stat_fn || balloon_free_pages_fn
+        || balloon_opaque) {
         /* We're already registered one balloon handler.  How many can
          * a guest really have?
          */
@@ -75,6 +79,7 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
     }
     balloon_event_fn = event_func;
     balloon_stat_fn = stat_func;
+    balloon_free_pages_fn = free_pages_func;
     balloon_opaque = opaque;
     return 0;
 }
@@ -86,6 +91,7 @@ void qemu_remove_balloon_handler(void *opaque)
     }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_free_pages_fn = NULL;
     balloon_opaque = NULL;
 }
 
@@ -116,3 +122,23 @@ void qmp_balloon(int64_t target, Error **errp)
     trace_balloon_event(balloon_opaque, target);
     balloon_event_fn(balloon_opaque, target);
 }
+
+bool balloon_free_pages_support(void)
+{
+    return balloon_free_pages_fn ? true : false;
+}
+
+int balloon_get_free_pages(unsigned long *free_pages_bitmap,
+                           unsigned long *free_pages_count)
+{
+    if (!balloon_free_pages_fn) {
+        return -1;
+    }
+
+    if (!free_pages_bitmap || !free_pages_count) {
+        return -1;
+    }
+
+    return balloon_free_pages_fn(balloon_opaque,
+                                 free_pages_bitmap, free_pages_count);
+}
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e9c30e9..a5b9d08 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -76,6 +76,12 @@ static bool balloon_stats_supported(const VirtIOBalloon *s)
     return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
 }
 
+static bool balloon_free_pages_supported(const VirtIOBalloon *s)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_GET_FREE_PAGES);
+}
+
 static bool balloon_stats_enabled(const VirtIOBalloon *s)
 {
     return s->stats_poll_interval > 0;
@@ -293,6 +299,37 @@ out:
     }
 }
 
+static void virtio_balloon_get_free_pages(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+    size_t offset = 0;
+    uint64_t bitmap_bytes = 0, free_pages_count = 0;
+
+    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+    if (!elem) {
+        return;
+    }
+    s->free_pages_vq_elem = elem;
+
+    if (!elem->out_num) {
+        return;
+    }
+
+    iov_to_buf(elem->out_sg, elem->out_num, offset,
+               &free_pages_count, sizeof(uint64_t));
+
+    offset += sizeof(uint64_t);
+    iov_to_buf(elem->out_sg, elem->out_num, offset,
+               &bitmap_bytes, sizeof(uint64_t));
+
+    offset += sizeof(uint64_t);
+    iov_to_buf(elem->out_sg, elem->out_num, offset,
+               s->free_pages_bitmap, bitmap_bytes);
+    s->req_status = DONE;
+    s->free_pages_count = free_pages_count;
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -362,6 +399,7 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+    virtio_add_feature(&f, VIRTIO_BALLOON_F_GET_FREE_PAGES);
 
     return f;
 }
@@ -372,6 +410,45 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
                                              VIRTIO_BALLOON_PFN_SHIFT);
 }
 
+static int virtio_balloon_free_pages(void *opaque,
+                                     unsigned long *free_pages_bitmap,
+                                     unsigned long *free_pages_count)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    VirtQueueElement *elem = s->free_pages_vq_elem;
+    int len;
+
+    if (!balloon_free_pages_supported(s)) {
+        return -1;
+    }
+
+    if (s->req_status == NOT_STARTED) {
+        s->free_pages_bitmap = free_pages_bitmap;
+        s->req_status = STARTED;
+        s->mem_layout.low_mem = pc_get_lowmem(PC_MACHINE(current_machine));
+        if (!elem->in_num) {
+            elem = virtqueue_pop(s->fvq, sizeof(VirtQueueElement));
+            if (!elem) {
+                return 0;
+            }
+            s->free_pages_vq_elem = elem;
+        }
+        len = iov_from_buf(elem->in_sg, elem->in_num, 0, &s->mem_layout,
+                           sizeof(s->mem_layout));
+        virtqueue_push(s->fvq, elem, len);
+        virtio_notify(vdev, s->fvq);
+        return 0;
+    } else if (s->req_status == STARTED) {
+        return 0;
+    } else if (s->req_status == DONE) {
+        *free_pages_count = s->free_pages_count;
+        s->req_status = NOT_STARTED;
+    }
+
+    return 1;
+}
+
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
@@ -429,7 +506,8 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
                 sizeof(struct virtio_balloon_config));
 
     ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
+                                   virtio_balloon_stat,
+                                   virtio_balloon_free_pages, s);
 
     if (ret < 0) {
         error_setg(errp, "Only one balloon device is supported");
@@ -440,6 +518,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
+    s->fvq = virtio_add_queue(vdev, 128, virtio_balloon_get_free_pages);
 
     reset_stats(s);
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 35f62ac..fc173e4 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -23,6 +23,16 @@
 #define VIRTIO_BALLOON(obj) \
         OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
 
+typedef enum virtio_req_status {
+    NOT_STARTED,
+    STARTED,
+    DONE,
+} VIRTIO_REQ_STATUS;
+
+typedef struct MemLayout {
+    uint64_t low_mem;
+} MemLayout;
+
 typedef struct virtio_balloon_stat VirtIOBalloonStat;
 
 typedef struct virtio_balloon_stat_modern {
@@ -33,16 +43,21 @@ typedef struct virtio_balloon_stat_modern {
 
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *fvq;
     uint32_t num_pages;
     uint32_t actual;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
+    VirtQueueElement *free_pages_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
     int64_t stats_last_update;
    int64_t stats_poll_interval;
     uint32_t host_features;
+    uint64_t *free_pages_bitmap;
+    uint64_t free_pages_count;
+    MemLayout mem_layout;
+    VIRTIO_REQ_STATUS req_status;
 } VirtIOBalloon;
 
 #endif
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 2e2a6dc..95b7d0c 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ       1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_GET_FREE_PAGES 3 /* Get the free pages bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 3f976b4..205b272 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,19 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef int (QEMUBalloonFreePages)(void *opaque,
+                                   unsigned long *free_pages_bitmap,
+                                   unsigned long *free_pages_count);
 
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque);
+                             QEMUBalloonStatus *stat_func,
+                             QEMUBalloonFreePages *free_pages_func,
+                             void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_pages_support(void);
+int balloon_get_free_pages(unsigned long *free_pages_bitmap,
+                           unsigned long *free_pages_count);
 
 #endif
-- 
1.8.3.1
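As a reading aid, the guest-to-host payload implied by
virtio_balloon_get_free_pages() above is two 64-bit fields followed by the
bitmap itself. The series does not define a struct for it; the one below is
purely illustrative:

#include <stdint.h>

struct free_page_report {
    uint64_t free_pages_count;  /* number of pages the guest reports free */
    uint64_t bitmap_bytes;      /* length of the bitmap that follows */
    /* followed by bitmap_bytes bytes: one bit per 4KB guest page */
};

The two header fields and the bitmap are read out of the virtqueue element
with consecutive iov_to_buf() calls at offsets 0, 8 and 16.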
Liang Li
2016-Mar-03 10:44 UTC
[RFC qemu 3/4] migration: do not set migration bitmap in setup stage
Set ram_list.dirty_memory instead of the migration bitmap; the migration
bitmap will be updated when migration_bitmap_sync() is called. Set
migration_dirty_pages to 0; it will also be updated by
migration_bitmap_sync(). The following patch is based on this change.

Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 migration/ram.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 704f6a9..ee2547d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1931,19 +1931,19 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
     migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
     migration_bitmap_rcu->bmap = bitmap_new(ram_bitmap_pages);
-    bitmap_set(migration_bitmap_rcu->bmap, 0, ram_bitmap_pages);
 
     if (migrate_postcopy_ram()) {
         migration_bitmap_rcu->unsentmap = bitmap_new(ram_bitmap_pages);
         bitmap_set(migration_bitmap_rcu->unsentmap, 0, ram_bitmap_pages);
     }
 
-    /*
-     * Count the total number of pages used by ram blocks not including any
-     * gaps due to alignment or unplugs.
-     */
-    migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
+    migration_dirty_pages = 0;
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        cpu_physical_memory_set_dirty_range(block->offset,
+                                            block->used_length,
+                                            DIRTY_MEMORY_MIGRATION);
+    }
     memory_global_dirty_log_start();
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
-- 
1.8.3.1
Liang Li
2016-Mar-03 10:44 UTC
[RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
Get the free pages information through virtio and filter out the free
pages in the ram bulk stage. This can significantly reduce the total
live migration time as well as network traffic.

Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ee2547d..819553b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "sysemu/balloon.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -241,6 +242,7 @@ static struct BitmapRcu {
     struct rcu_head rcu;
     /* Main migration bitmap */
     unsigned long *bmap;
+    unsigned long *free_pages_bmap;
     /* bitmap of pages that haven't been sent even once
      * only maintained and used in postcopy at the moment
      * where it's used to send the dirtymap at the start
@@ -561,12 +563,7 @@ ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb,
     unsigned long next;
 
     bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
-    if (ram_bulk_stage && nr > base) {
-        next = nr + 1;
-    } else {
-        next = find_next_bit(bitmap, size, nr);
-    }
-
+    next = find_next_bit(bitmap, size, nr);
     *ram_addr_abs = next << TARGET_PAGE_BITS;
     return (next - base) << TARGET_PAGE_BITS;
 }
@@ -1415,6 +1412,9 @@ void free_xbzrle_decoded_buf(void)
 static void migration_bitmap_free(struct BitmapRcu *bmap)
 {
     g_free(bmap->bmap);
+    if (balloon_free_pages_support()) {
+        g_free(bmap->free_pages_bmap);
+    }
     g_free(bmap->unsentmap);
     g_free(bmap);
 }
@@ -1873,6 +1873,28 @@ err:
     return ret;
 }
 
+static void filter_out_guest_free_pages(unsigned long *free_pages_bmap)
+{
+    RAMBlock *block;
+    DirtyMemoryBlocks *blocks;
+    unsigned long end, page;
+
+    blocks = atomic_rcu_read(&ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    end = TARGET_PAGE_ALIGN(block->offset +
+                            block->used_length) >> TARGET_PAGE_BITS;
+    page = block->offset >> TARGET_PAGE_BITS;
+
+    while (page < end) {
+        unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
+        unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
+        unsigned long *p = free_pages_bmap + BIT_WORD(page);
+
+        slow_bitmap_complement(blocks->blocks[idx], p, num);
+        page += num;
+    }
+}
+
 /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -1884,6 +1906,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
     int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
+    uint64_t free_pages_count = 0;
 
     dirty_rate_high_cnt = 0;
     bitmap_sync_count = 0;
@@ -1931,6 +1954,9 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_bitmap_pages = last_ram_offset() >> TARGET_PAGE_BITS;
     migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
     migration_bitmap_rcu->bmap = bitmap_new(ram_bitmap_pages);
+    if (balloon_free_pages_support()) {
+        migration_bitmap_rcu->free_pages_bmap = bitmap_new(ram_bitmap_pages);
+    }
 
     if (migrate_postcopy_ram()) {
         migration_bitmap_rcu->unsentmap = bitmap_new(ram_bitmap_pages);
@@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
                                             DIRTY_MEMORY_MIGRATION);
     }
     memory_global_dirty_log_start();
+
+    if (balloon_free_pages_support() &&
+        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
+                               &free_pages_count) == 0) {
+        qemu_mutex_unlock_iothread();
+        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
+                                      &free_pages_count) == 0) {
+            usleep(1000);
+        }
+        qemu_mutex_lock_iothread();
+
+        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
+    }
+
     migration_bitmap_sync();
     qemu_mutex_unlock_ramlist();
     qemu_mutex_unlock_iothread();
-- 
1.8.3.1
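For readers unfamiliar with slow_bitmap_complement(): it writes the bitwise
complement of the source bitmap into the destination, so the call in
filter_out_guest_free_pages() leaves the MIGRATION dirty-memory block
holding "dirty := not free" over the covered range. A simplified sketch of
that effect (not the QEMU implementation):

#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

static void complement_into(unsigned long *dirty, const unsigned long *free,
                            size_t nbits)
{
    size_t i, words = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;

    for (i = 0; i < words; i++) {
        dirty[i] = ~free[i];             /* dirty := not free */
    }
}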
Cornelia Huck
2016-Mar-03 12:16 UTC
[RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
On Thu,  3 Mar 2016 18:44:28 +0800
Liang Li <liang.z.li at intel.com> wrote:

> Get the free pages information through virtio and filter out the free
> pages in the ram bulk stage. This can significantly reduce the total
> live migration time as well as network traffic.
>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> ---
>  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 6 deletions(-)
>

> @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>                                              DIRTY_MEMORY_MIGRATION);
>      }
>      memory_global_dirty_log_start();
> +
> +    if (balloon_free_pages_support() &&
> +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                               &free_pages_count) == 0) {
> +        qemu_mutex_unlock_iothread();
> +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                                      &free_pages_count) == 0) {
> +            usleep(1000);
> +        }
> +        qemu_mutex_lock_iothread();
> +
> +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);

A general comment: Using the ballooner to get information about pages
that can be filtered out is too limited (there may be other ways to do
this; we might be able to use cmma on s390, for example), and I don't
like hardcoding to a specific method.

What about the reverse approach: code may register a handler that
populates the free_pages_bitmap, which is called during this stage?

<I like the idea of filtering in general, but I haven't looked at the
code yet>

> +    }
> +
>      migration_bitmap_sync();
>      qemu_mutex_unlock_ramlist();
>      qemu_mutex_unlock_iothread();
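One possible shape for the "reverse approach" suggested above, sketched as a
hypothetical API (not in QEMU): any provider -- virtio-balloon, s390 CMMA,
or something else -- registers a callback that fills in the free-page
bitmap, and the migration code stays provider-agnostic.

typedef int (*FreePageHintFn)(void *opaque, unsigned long *bitmap,
                              unsigned long nr_pages);

static FreePageHintFn free_page_hint_fn;
static void *free_page_hint_opaque;

/* Called by whichever backend can supply free-page hints. */
int qemu_register_free_page_hint(FreePageHintFn fn, void *opaque)
{
    if (free_page_hint_fn) {
        return -1;              /* only one provider at a time */
    }
    free_page_hint_fn = fn;
    free_page_hint_opaque = opaque;
    return 0;
}

/* Called by ram_save_setup(); returns 0 if no provider is registered. */
int qemu_get_free_page_hint(unsigned long *bitmap, unsigned long nr_pages)
{
    if (!free_page_hint_fn) {
        return 0;
    }
    return free_page_hint_fn(free_page_hint_opaque, bitmap, nr_pages);
}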
Cornelia Huck
2016-Mar-03 12:23 UTC
[RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
On Thu,  3 Mar 2016 18:44:26 +0800
Liang Li <liang.z.li at intel.com> wrote:

> Extend the virtio balloon device to support a new feature. This new
> feature can be used to get the guest's free pages information, which
> can then be used for live migration optimization.

Do you have a spec for this, e.g. as a patch to the virtio spec?

>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> ---
>  balloon.c                                       | 30 ++++++++-
>  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h              | 17 +++++-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  5 files changed, 134 insertions(+), 5 deletions(-)

> +static int virtio_balloon_free_pages(void *opaque,
> +                                     unsigned long *free_pages_bitmap,
> +                                     unsigned long *free_pages_count)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    VirtQueueElement *elem = s->free_pages_vq_elem;
> +    int len;
> +
> +    if (!balloon_free_pages_supported(s)) {
> +        return -1;
> +    }
> +
> +    if (s->req_status == NOT_STARTED) {
> +        s->free_pages_bitmap = free_pages_bitmap;
> +        s->req_status = STARTED;
> +        s->mem_layout.low_mem = pc_get_lowmem(PC_MACHINE(current_machine));

Please don't leak pc-specific information into generic code.
Daniel P. Berrange
2016-Mar-03 12:45 UTC
[Qemu-devel] [RFC qemu 4/4] migration: filter out guest's free pages in ram bulk stage
On Thu, Mar 03, 2016 at 06:44:28PM +0800, Liang Li wrote:
> Get the free pages information through virtio and filter out the free
> pages in the ram bulk stage. This can significantly reduce the total
> live migration time as well as network traffic.
>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> ---
>  migration/ram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 46 insertions(+), 6 deletions(-)

> @@ -1945,6 +1971,20 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>                                              DIRTY_MEMORY_MIGRATION);
>      }
>      memory_global_dirty_log_start();
> +
> +    if (balloon_free_pages_support() &&
> +        balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                               &free_pages_count) == 0) {
> +        qemu_mutex_unlock_iothread();
> +        while (balloon_get_free_pages(migration_bitmap_rcu->free_pages_bmap,
> +                                      &free_pages_count) == 0) {
> +            usleep(1000);
> +        }
> +        qemu_mutex_lock_iothread();
> +
> +        filter_out_guest_free_pages(migration_bitmap_rcu->free_pages_bmap);
> +    }

IIUC, this code is synchronous wrt the guest OS balloon driver, i.e. it
is asking the guest for free pages and waiting for a response. If the
guest OS has crashed, this is going to mean QEMU waits forever and thus
migration won't complete. Similarly, you need to consider that the
guest OS may be malicious and simply never respond.

So if the migration code is going to use the guest balloon driver to
get info about free pages, it has to be done in an asynchronous manner
so that migration can never be stalled by a slow/crashed/malicious
guest driver.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
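A bounded-wait variant of the quoted loop, sketched along the lines of the
concern above: it reuses the series' balloon_get_free_pages() convention
(0 = still pending, 1 = done, -1 = unsupported) but stops polling after a
deadline, so a crashed or malicious guest can only delay setup, never block
it. The timeout value is an arbitrary example.

#include <stdbool.h>
#include <unistd.h>
#include "sysemu/balloon.h"

#define FREE_PAGE_HINT_TIMEOUT_US (500 * 1000)

static bool wait_for_free_page_hint(unsigned long *bitmap,
                                    unsigned long *count)
{
    int waited_us = 0;
    int ret;

    while ((ret = balloon_get_free_pages(bitmap, count)) == 0) {
        if (waited_us >= FREE_PAGE_HINT_TIMEOUT_US) {
            return false;                /* give up: migrate every page */
        }
        usleep(1000);
        waited_us += 1000;
    }
    return ret > 0;                      /* true only if the hint arrived */
}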
Michael S. Tsirkin
2016-Mar-03 12:56 UTC
[RFC qemu 2/4] virtio-balloon: Add a new feature to balloon device
On Thu, Mar 03, 2016 at 06:44:26PM +0800, Liang Li wrote:
> Extend the virtio balloon device to support a new feature. This new
> feature can be used to get the guest's free pages information, which
> can then be used for live migration optimization.
>
> Signed-off-by: Liang Li <liang.z.li at intel.com>

I don't understand why we need a new interface. Balloon already sends
free pages to the host. Just teach the host to skip these pages.

Maybe instead of starting with code, you should send a high level
description to the virtio tc for consideration? You can do it through
the mailing list or using the web form:
http://www.oasis-open.org/committees/comments/form.php?wg_abbrev=virtio

> ---
>  balloon.c                                       | 30 ++++++++-
>  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h              | 17 +++++-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  5 files changed, 134 insertions(+), 5 deletions(-)
Roman Kagan
2016-Mar-03 13:58 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Thu, Mar 03, 2016 at 06:44:24PM +0800, Liang Li wrote:
> The current QEMU live migration implementation marks all of the
> guest's RAM pages as dirty in the ram bulk stage; all of these pages
> will be processed, and that takes quite a lot of CPU cycles.
>
> From the guest's point of view, it doesn't care about the content of
> free pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage. This saves a lot of CPU cycles, reduces
> the network traffic significantly, and clearly speeds up the live
> migration process.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This makes
> the live migration process much more efficient.
>
> This RFC version doesn't take post-copy and RDMA into consideration;
> maybe both of them can benefit from this PV solution with some extra
> modifications.
>
> Performance data
> ================
>
> Test environment:
>
> CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> Host RAM: 64GB
> Host Linux Kernel: 4.2.0
> Host OS: CentOS 7.1
> Guest Linux Kernel: 4.5-rc6
> Guest OS: CentOS 6.6
> Network: X540-AT2 with 10 Gigabit connection
> Guest RAM: 8GB
>
> Case 1: Idle guest just boots:
> ============================================
>                      | original |    pv
> --------------------------------------------
> total time (ms)      |     1894 |    421
> --------------------------------------------
> transferred ram (KB) |   398017 | 353242
> ============================================
>
> Case 2: The guest has run some memory-consuming workload; the workload
> is terminated just before live migration.
> ============================================
>                      | original |    pv
> --------------------------------------------
> total time (ms)      |     7436 |    552
> --------------------------------------------
> transferred ram (KB) |  8146291 | 361375
> ============================================

Both cases look very artificial to me. Normally you migrate VMs which
have started long ago and which can't have their services terminated
before the migration, so I wouldn't expect any useful amount of free
pages obtained this way.

OTOH I don't see why you can't just inflate the balloon before the
migration, and really optimize the amount of transferred data this
way? With the recently proposed VIRTIO_BALLOON_S_AVAIL you can have a
fairly good estimate of the optimal balloon size, and with the
recently merged balloon deflation on OOM it's a safe thing to do
without exposing the guest workloads to OOM risks.

Roman.
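For reference, the balloon-based alternative described above can be driven
entirely from the management side with existing QMP commands; roughly like
this, assuming the device was created with id=balloon0 and that the guest
driver reports the stats being asked for (VIRTIO_BALLOON_S_AVAIL was only
proposed at the time, so it may not be present):

{ "execute": "qom-get",
  "arguments": { "path": "/machine/peripheral/balloon0",
                 "property": "guest-stats" } }

{ "execute": "balloon", "arguments": { "value": 2147483648 } }

The first command reads the guest memory statistics exposed by
virtio-balloon; the second sets the balloon target (in bytes) computed from
them before the migration is started.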
Dr. David Alan Gilbert
2016-Mar-03 17:46 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
* Liang Li (liang.z.li at intel.com) wrote:
> The current QEMU live migration implementation marks all of the
> guest's RAM pages as dirty in the ram bulk stage; all of these pages
> will be processed, and that takes quite a lot of CPU cycles.
>
> From the guest's point of view, it doesn't care about the content of
> free pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage. This saves a lot of CPU cycles, reduces
> the network traffic significantly, and clearly speeds up the live
> migration process.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This makes
> the live migration process much more efficient.

Hi,
  An interesting solution; I know a few different people have been
looking at how to speed up ballooned VM migration.

  I wonder if it would be possible to avoid the kernel changes by
parsing /proc/self/pagemap - if that can be used to detect
unmapped/zero mapped pages in the guest ram, would it achieve the same
result?

> This RFC version doesn't take post-copy and RDMA into consideration;
> maybe both of them can benefit from this PV solution with some extra
> modifications.

For postcopy to be safe, you would still need to send a message to the
destination telling it that there were zero pages, otherwise the
destination can't tell if it's supposed to request the page from the
source or treat the page as zero.

Dave

-- 
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
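For readers unfamiliar with the interface Dave mentions: /proc/self/pagemap
holds one 64-bit entry per virtual page of the reading process; bit 63 is
set if the page is resident in RAM and bit 62 if it has been swapped out. A
guest-RAM page whose entry has neither bit set has never been touched by
the guest and could therefore be skipped. A minimal sketch (not part of the
series) of how QEMU could query it for a guest page's host virtual address,
given a file descriptor from opening /proc/self/pagemap:

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT (1ULL << 63)          /* page is resident in RAM */
#define PM_SWAP    (1ULL << 62)          /* page is in swap */

static bool page_is_populated(int pagemap_fd, uintptr_t hva, long page_size)
{
    uint64_t entry = 0;
    off_t off = (off_t)(hva / page_size) * sizeof(entry);

    if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry)) {
        return true;                     /* be conservative on error */
    }
    return entry & (PM_PRESENT | PM_SWAP);
}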
Li, Liang Z
2016-Mar-04 01:35 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
> On Thu, Mar 03, 2016 at 06:44:24PM +0800, Liang Li wrote:
> > The current QEMU live migration implementation marks all of the
> > guest's RAM pages as dirty in the ram bulk stage; all of these pages
> > will be processed, and that takes quite a lot of CPU cycles.
> >
> > Case 1: Idle guest just boots:
> > ============================================
> >                      | original |    pv
> > --------------------------------------------
> > total time (ms)      |     1894 |    421
> > --------------------------------------------
> > transferred ram (KB) |   398017 | 353242
> > ============================================
> >
> > Case 2: The guest has run some memory-consuming workload; the
> > workload is terminated just before live migration.
> > ============================================
> >                      | original |    pv
> > --------------------------------------------
> > total time (ms)      |     7436 |    552
> > --------------------------------------------
> > transferred ram (KB) |  8146291 | 361375
> > ============================================
>
> Both cases look very artificial to me. Normally you migrate VMs which
> have started long ago and which can't have their services terminated
> before the migration, so I wouldn't expect any useful amount of free
> pages obtained this way.
>

Yes, it's somewhat artificial, just to emphasize the effect. And I
think these two cases are very easy to reproduce. Using a real
workload and doing the test in a production environment would be more
convincing.

We can predict that as long as the guest doesn't use up all of its
memory, this solution may still take effect and shorten the total live
migration time. (Of course, we should consider the time cost of the
virtio communication.)

> OTOH I don't see why you can't just inflate the balloon before the
> migration, and really optimize the amount of transferred data this
> way? With the recently proposed VIRTIO_BALLOON_S_AVAIL you can have a
> fairly good estimate of the optimal balloon size, and with the
> recently merged balloon deflation on OOM it's a safe thing to do
> without exposing the guest workloads to OOM risks.
>
> Roman.

Thanks for your information. The size of the free page bitmap is not
very large; for a guest with 8GB of RAM, only 256KB of extra memory is
required. Compared to this solution, inflating the balloon is more
expensive. If the balloon size is not optimal and the guest requests
more memory during live migration, the guest's performance will be
impacted.

Liang
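For reference, the arithmetic behind the 256KB figure: 8GB of guest RAM at
4KB per page is 2,097,152 pages; at one bit per page, that is
2,097,152 / 8 = 262,144 bytes = 256KB of bitmap.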
Li, Liang Z
2016-Mar-04 01:52 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
>
> * Liang Li (liang.z.li at intel.com) wrote:
> > The current QEMU live migration implementation marks all of the
> > guest's RAM pages as dirty in the ram bulk stage; all of these pages
> > will be processed, and that takes quite a lot of CPU cycles.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
>
> Hi,
>   An interesting solution; I know a few different people have been
> looking at how to speed up ballooned VM migration.
>

Ooh, different solutions for the same purpose, and both based on the
balloon.

> I wonder if it would be possible to avoid the kernel changes by
> parsing /proc/self/pagemap - if that can be used to detect
> unmapped/zero mapped pages in the guest ram, would it achieve the same
> result?
>

Only detecting the unmapped/zero-mapped pages is not enough. Consider
a situation like case 2: it can't achieve the same result.

> > This RFC version doesn't take post-copy and RDMA into consideration;
> > maybe both of them can benefit from this PV solution with some extra
> > modifications.
>
> For postcopy to be safe, you would still need to send a message to the
> destination telling it that there were zero pages, otherwise the
> destination can't tell if it's supposed to request the page from the
> source or treat the page as zero.
>
> Dave

I will consider this later. Thanks, Dave.

Liang
Roman Kagan
2016-Mar-04 07:55 UTC
[Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
On Thu, Mar 03, 2016 at 05:46:15PM +0000, Dr. David Alan Gilbert wrote:
> * Liang Li (liang.z.li at intel.com) wrote:
> > The current QEMU live migration implementation marks all of the
> > guest's RAM pages as dirty in the ram bulk stage; all of these pages
> > will be processed, and that takes quite a lot of CPU cycles.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
>
> Hi,
>   An interesting solution; I know a few different people have been
> looking at how to speed up ballooned VM migration.
>
>   I wonder if it would be possible to avoid the kernel changes by
> parsing /proc/self/pagemap - if that can be used to detect
> unmapped/zero mapped pages in the guest ram, would it achieve the same
> result?

Yes I was about to suggest the same thing: it's simple and makes use
of the existing infrastructure. And you wouldn't need to care if the
pages were unmapped by ballooning or anything else (alternative
balloon implementations, not yet touched by the guest, etc.). Besides,
you wouldn't need to synchronize with the guest.

Roman.
Amit Shah
2016-Mar-08 11:13 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> The current QEMU live migration implementation marks all of the
> guest's RAM pages as dirty in the ram bulk stage; all of these pages
> will be processed, and that takes quite a lot of CPU cycles.
>
> From the guest's point of view, it doesn't care about the content of
> free pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage. This saves a lot of CPU cycles, reduces
> the network traffic significantly, and clearly speeds up the live
> migration process.
>
> This patch set is the QEMU side implementation.
>
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
>
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This makes
> the live migration process much more efficient.
>
> This RFC version doesn't take post-copy and RDMA into consideration;
> maybe both of them can benefit from this PV solution with some extra
> modifications.

I like the idea; we just have to prove (review) and test it a lot to
ensure we don't end up skipping pages that matter.

However, there are a couple of points:

In my opinion, the information that's exchanged between the guest and
the host should be exchanged over a virtio-serial channel rather than
virtio-balloon. First, there's nothing related to the balloon here.
It just happens to be memory info. Second, I would never enable the
balloon in a guest that I want to be performance-sensitive. So even if
you add this as part of the balloon, you'll find no one is using this
solution.

Secondly, I suggest virtio-serial because it's meant exactly to
exchange free-flowing information between a host and a guest, and you
don't need to extend any part of the protocol for it (hence no changes
necessary to the spec). You can see how spice, vnc, etc., use
virtio-serial to exchange data.

		Amit
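For readers following the virtio-serial suggestion, the plumbing already
exists in QEMU; a rough example of the setup (the port name is invented for
illustration):

    qemu-system-x86_64 ... \
        -device virtio-serial-pci \
        -chardev socket,id=fp0,path=/tmp/free-pages.sock,server,nowait \
        -device virtserialport,chardev=fp0,name=org.qemu.free_pages.0

On the host, the channel is reachable through the UNIX socket given to
-chardev; inside a Linux guest the port shows up (via udev) as
/dev/virtio-ports/org.qemu.free_pages.0, and both ends simply read() and
write() the byte stream, so no virtio spec change is needed.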
Li, Liang Z
2016-Mar-08 13:11 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
>
> On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
>
> I like the idea; we just have to prove (review) and test it a lot to
> ensure we don't end up skipping pages that matter.
>
> However, there are a couple of points:
>
> In my opinion, the information that's exchanged between the guest and
> the host should be exchanged over a virtio-serial channel rather than
> virtio-balloon. First, there's nothing related to the balloon here.
> It just happens to be memory info. Second, I would never enable the
> balloon in a guest that I want to be performance-sensitive. So even if
> you add this as part of the balloon, you'll find no one is using this
> solution.
>
> Secondly, I suggest virtio-serial because it's meant exactly to
> exchange free-flowing information between a host and a guest, and you
> don't need to extend any part of the protocol for it (hence no changes
> necessary to the spec). You can see how spice, vnc, etc., use
> virtio-serial to exchange data.
>
> 		Amit

I don't like using virtio-balloon for this either, and it's confusing.
It would be great if virtio-serial can be used; I will take a look at
it.

Thanks for your suggestion!

Liang
Li, Liang Z
2016-Mar-10 07:44 UTC
[RFC qemu 0/4] A PV solution for live migration optimization
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This
> > makes the live migration process much more efficient.
> >
> > This RFC version doesn't take post-copy and RDMA into consideration;
> > maybe both of them can benefit from this PV solution with some extra
> > modifications.
>
> I like the idea; we just have to prove (review) and test it a lot to
> ensure we don't end up skipping pages that matter.
>
> However, there are a couple of points:
>
> In my opinion, the information that's exchanged between the guest and
> the host should be exchanged over a virtio-serial channel rather than
> virtio-balloon. First, there's nothing related to the balloon here.
> It just happens to be memory info. Second, I would never enable the
> balloon in a guest that I want to be performance-sensitive. So even if
> you add this as part of the balloon, you'll find no one is using this
> solution.
>
> Secondly, I suggest virtio-serial because it's meant exactly to
> exchange free-flowing information between a host and a guest, and you
> don't need to extend any part of the protocol for it (hence no changes
> necessary to the spec). You can see how spice, vnc, etc., use
> virtio-serial to exchange data.
>
> 		Amit

Hi Amit,

Could you provide more information on how to use virtio-serial to
exchange data? A thread, a wiki page, or code are all fine. I have not
found any useful information yet.

Thanks
Liang