Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
This patch set contains two sets of changes to virtio-balloon.

The first speeds up the inflating & deflating process. The main idea
of this optimization is to send the page information to the host as a
bitmap instead of an array of PFNs, which reduces the overhead of
virtio data transmission, address translation and madvise(). This
improves the performance by about 85%.

The second speeds up live migration. Skipping the guest's free pages
in the first round of data copy avoids needless data processing and
saves quite a lot of CPU cycles and network bandwidth. The guest's
free page information is put in a bitmap and sent to the host through
a virtqueue of virtio-balloon. For an idle 8GB guest, this shortens
the total live migration time from 2s to about 500ms in a 10Gbps
network environment.

Changes from v1 to v2:
    * Abandon the patch for dropping page cache.
    * Put some structures in the uapi header file.
    * Use a new way to determine the page bitmap size.
    * Use a unified way to send the free page information with the bitmap.
    * Address the issues raised in MST's comments.

Liang Li (7):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and page bitmap head
  mm: add a function to get the max pfn
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define feature bit and head for misc virt queue
  mm: add the related functions to get free page info
  virtio-balloon: tell host vm's free page info

 drivers/virtio/virtio_balloon.c     | 306 +++++++++++++++++++++++++++++++-----
 include/uapi/linux/virtio_balloon.h |  41 +++++
 mm/page_alloc.c                     |  52 ++++++
 3 files changed, 359 insertions(+), 40 deletions(-)

-- 
1.8.3.1
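To give a rough sense of why a bitmap is cheaper to transmit than a PFN array, the sketch below compares the size of the two encodings. It is only an illustration: the helper names are made up for this example, and it assumes 4KB pages and the 32-bit PFN entries that the existing balloon interface uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to describe n_pages as an array of 32-bit PFNs. */
static size_t pfn_array_bytes(uint64_t n_pages)
{
	return (size_t)(n_pages * sizeof(uint32_t));
}

/* Bytes needed for a bitmap with one bit per PFN in [start_pfn, end_pfn). */
static size_t bitmap_bytes(uint64_t start_pfn, uint64_t end_pfn)
{
	assert(end_pfn >= start_pfn);
	return (size_t)((end_pfn - start_pfn + 7) / 8);
}
```

With these helpers, the 1835008 pages of a 7GB balloon take 7340032 bytes as a PFN array, while a bitmap spanning all 2097152 PFNs of an 8GB guest takes 262144 bytes.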
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 1/7] virtio-balloon: rework deflate to add page to a list
will allow faster notifications using a bitmap down the road.
balloon_pfn_to_page() can be removed because it's useless.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 drivers/virtio/virtio_balloon.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 476c0e3..8d649a2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -98,12 +98,6 @@ static u32 page_to_balloon_pfn(struct page *page)
 	return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-	BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-	return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
 	struct virtio_balloon *vb = vq->vdev->priv;
@@ -176,18 +170,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+				  struct list_head *pages)
 {
-	unsigned int i;
-	struct page *page;
+	struct page *page, *next;
 
-	/* Find pfns pointing at start of each page, get pages and free them. */
-	for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-							   vb->pfns[i]));
+	list_for_each_entry_safe(page, next, pages, lru) {
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, 1);
+		list_del(&page->lru);
 		put_page(page); /* balloon reference */
 	}
 }
@@ -197,6 +189,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	LIST_HEAD(pages);
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
@@ -208,6 +201,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		if (!page)
 			break;
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
@@ -219,7 +213,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb);
+	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
-- 
1.8.3.1
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 2/7] virtio-balloon: define new feature bit and page bitmap head
Add a new feature which supports sending the page information with
a bitmap. The current implementation uses a PFN array, which is not
very efficient. Using a bitmap can improve the performance of
inflating/deflating significantly.

The page bitmap header will be used to tell the host some information
about the page bitmap, e.g. the page size, page bitmap length and
start pfn.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 include/uapi/linux/virtio_balloon.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 343d7dd..d3b182a 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_PAGE_BITMAP	3 /* Send page info with bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -82,4 +83,22 @@ struct virtio_balloon_stat {
 	__virtio64 val;
 } __attribute__((packed));
 
+/* Page bitmap header structure */
+struct balloon_bmap_hdr {
+	/* Used to distinguish different request */
+	__virtio16 cmd;
+	/* Shift width of page in the bitmap */
+	__virtio16 page_shift;
+	/* flag used to identify different status */
+	__virtio16 flag;
+	/* Reserved */
+	__virtio16 reserved;
+	/* ID of the request */
+	__virtio64 req_id;
+	/* The pfn of 0 bit in the bitmap */
+	__virtio64 start_pfn;
+	/* The length of the bitmap, in bytes */
+	__virtio64 bmap_len;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1
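As a rough illustration of how a receiver might interpret the header above, here is a user-space mirror of the structure with two helper functions. This is a sketch under assumptions: the struct and helper names are hypothetical, plain fixed-width integers stand in for the __virtio16/__virtio64 types (so endianness conversion is left out), and the bit-to-page mapping simply follows the field comments in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct balloon_bmap_hdr. */
struct bmap_hdr_sketch {
	uint16_t cmd;
	uint16_t page_shift;
	uint16_t flag;
	uint16_t reserved;
	uint64_t req_id;
	uint64_t start_pfn;	/* PFN that bit 0 of the bitmap refers to */
	uint64_t bmap_len;	/* length of the bitmap, in bytes */
};

/* The four 16-bit fields fill 8 bytes, so no padding is expected. */
_Static_assert(sizeof(struct bmap_hdr_sketch) == 32, "unexpected padding");
_Static_assert(offsetof(struct bmap_hdr_sketch, req_id) == 8,
	       "unexpected field offset");

/* First PFN past the range covered by the bitmap after the header. */
static uint64_t hdr_end_pfn(const struct bmap_hdr_sketch *h)
{
	return h->start_pfn + h->bmap_len * 8;
}

/* Guest-physical address of the page that bitmap bit `bit` refers to. */
static uint64_t hdr_bit_to_addr(const struct bmap_hdr_sketch *h, uint64_t bit)
{
	assert(bit < h->bmap_len * 8);
	return (h->start_pfn + bit) << h->page_shift;
}
```

For a header with page_shift = 12, start_pfn = 0x100 and bmap_len = 16, the bitmap covers PFNs 0x100 up to (but not including) 0x180, and bit 3 maps to address 0x103000.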
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 3/7] mm: add a function to get the max pfn
Expose the function to get the max pfn, so it can be used in the
virtio-balloon device driver.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Mel Gorman <mgorman at techsingularity.net>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 mm/page_alloc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6903b69..2083b40 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4515,6 +4515,12 @@ void show_free_areas(unsigned int filter)
 	show_swap_cache_info();
 }
 
+unsigned long get_max_pfn(void)
+{
+	return max_pfn;
+}
+EXPORT_SYMBOL(get_max_pfn);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
1.8.3.1
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 4/7] virtio-balloon: speed up inflate/deflate process
The current virtio-balloon implementation is not very efficient. The
time spent on the different stages of inflating the balloon to 7GB on
an idle 8GB guest breaks down as follows:

a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

It takes about 4126ms for the inflating process to complete.
Debugging shows that the bottlenecks are stage b and stage d.

If a bitmap is used to send the page info instead of the PFNs, the
overhead of stage b can be reduced quite a lot. Furthermore, the
address translation and madvise() can be done on a bulk of RAM pages
instead of the current page-by-page way, so the overhead of stage c
and stage d can also be reduced a lot.

This patch is the kernel side implementation which is intended to
speed up the inflating & deflating process by adding a new feature
to the virtio-balloon device. With this new feature, inflating the
balloon to 7GB on an idle 8GB guest only takes 590ms; the performance
improvement is about 85%.

TODO: optimize stage a by allocating/freeing a chunk of pages
instead of a single page at a time.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Suggested-by: Michael S. Tsirkin <mst at redhat.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 drivers/virtio/virtio_balloon.c | 184 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 162 insertions(+), 22 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8d649a2..2d18ff6 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -41,10 +41,28 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+/*
+ * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page bitmap
+ * to prevent a very large page bitmap, there are two reasons for this:
+ * 1) to save memory.
+ * 2) allocate a large bitmap may fail.
+ *
+ * The actual limit of pfn is determined by:
+ * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
+ *
+ * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will scan
+ * the page list and send the PFNs with several times. To reduce the
+ * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT should
+ * be set with a value which can cover most cases.
+ */
+#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >> PAGE_SHIFT) /* 32GB */
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
+extern unsigned long get_max_pfn(void);
+
 struct virtio_balloon {
 	struct virtio_device *vdev;
 	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -62,6 +80,15 @@ struct virtio_balloon {
 
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
+	/* Pointer of the bitmap header. */
+	void *bmap_hdr;
+	/* Bitmap and length used to tell the host the pages */
+	unsigned long *page_bitmap;
+	unsigned long bmap_len;
+	/* Pfn limit */
+	unsigned long pfn_limit;
+	/* Used to record the processed pfn range */
+	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -105,12 +132,45 @@ static void balloon_ack(struct virtqueue *vq)
 	wake_up(&vb->acked);
 }
 
+static inline void init_pfn_range(struct virtio_balloon *vb)
+{
+	vb->min_pfn = ULONG_MAX;
+	vb->max_pfn = 0;
+}
+
+static inline void update_pfn_range(struct virtio_balloon *vb,
+				    struct page *page)
+{
+	unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+	if (balloon_pfn < vb->min_pfn)
+		vb->min_pfn = balloon_pfn;
+	if (balloon_pfn > vb->max_pfn)
+		vb->max_pfn = balloon_pfn;
+}
+
 static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 {
 	struct scatterlist sg;
 	unsigned int len;
 
-	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
+		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
+		unsigned long bmap_len;
+
+		/* cmd and req_id are not used here, set them to 0 */
+		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
+		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
+		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
+		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);
+		hdr->start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
+		bmap_len = min(vb->bmap_len,
+			(vb->end_pfn - vb->start_pfn) / BITS_PER_BYTE);
+		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
+		sg_init_one(&sg, hdr,
+			 sizeof(struct balloon_bmap_hdr) + bmap_len);
+	} else
+		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
 
 	/* We should always be able to add one buffer to an empty queue. */
 	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
@@ -118,7 +178,6 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 
 	/* When host has read buffer, this completes via balloon_ack */
 	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
-
 }
 
 static void set_page_pfns(struct virtio_balloon *vb,
@@ -133,13 +192,53 @@ static void set_page_pfns(struct virtio_balloon *vb,
 					  page_to_balloon_pfn(page) + i);
 }
 
-static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
+static void set_page_bitmap(struct virtio_balloon *vb,
+			struct list_head *pages, struct virtqueue *vq)
+{
+	unsigned long pfn;
+	struct page *page;
+	bool found;
+
+	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
+	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
+	for (pfn = vb->min_pfn; pfn < vb->max_pfn;
+			pfn += vb->pfn_limit) {
+		vb->start_pfn = pfn + vb->pfn_limit;
+		vb->end_pfn = pfn;
+		memset(vb->page_bitmap, 0, vb->bmap_len);
+		found = false;
+		list_for_each_entry(page, pages, lru) {
+			unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+			if (balloon_pfn < pfn ||
+				 balloon_pfn >= pfn + vb->pfn_limit)
+				continue;
+			set_bit(balloon_pfn - pfn, vb->page_bitmap);
+			if (balloon_pfn > vb->end_pfn)
+				vb->end_pfn = balloon_pfn;
+			if (balloon_pfn < vb->start_pfn)
+				vb->start_pfn = balloon_pfn;
+			found = true;
+		}
+		if (found) {
+			vb->start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
+			vb->end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
+			tell_host(vb, vq);
+		}
+	}
+}
+
+static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num,
+				 bool use_bmap)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
-	unsigned num_allocated_pages;
+	unsigned int num_allocated_pages;
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	if (use_bmap)
+		init_pfn_range(vb);
+	else
+		/* We can only do one array worth at a time. */
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -154,7 +253,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 			msleep(200);
 			break;
 		}
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_bmap)
+			update_pfn_range(vb, page);
+		else
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
@@ -163,8 +265,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 
 	num_allocated_pages = vb->num_pfns;
 	/* Did we get any? */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->inflate_vq);
+	if (vb->num_pfns != 0) {
+		if (use_bmap)
+			set_page_bitmap(vb, &vb_dev_info->pages,
+					vb->inflate_vq);
+		else
+			tell_host(vb, vb->inflate_vq);
+	}
 	mutex_unlock(&vb->balloon_lock);
 
 	return num_allocated_pages;
@@ -184,15 +291,19 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	}
 }
 
-static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num,
+				 bool use_bmap)
 {
-	unsigned num_freed_pages;
+	unsigned int num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	LIST_HEAD(pages);
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	if (use_bmap)
+		init_pfn_range(vb);
+	else
+		/* We can only do one array worth at a time. */
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -200,7 +311,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		page = balloon_page_dequeue(vb_dev_info);
 		if (!page)
 			break;
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_bmap)
+			update_pfn_range(vb, page);
+		else
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
@@ -211,9 +325,14 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb, &pages);
+	if (vb->num_pfns != 0) {
+		if (use_bmap)
+			set_page_bitmap(vb, &pages, vb->deflate_vq);
+		else
+			tell_host(vb, vb->deflate_vq);
+
+		release_pages_balloon(vb, &pages);
+	}
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
@@ -347,13 +466,15 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 	struct virtio_balloon *vb;
 	unsigned long *freed;
 	unsigned num_freed_pages;
+	bool use_bmap;
 
 	vb = container_of(self, struct virtio_balloon, nb);
 	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return NOTIFY_OK;
 
 	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+	num_freed_pages = leak_balloon(vb, oom_pages, use_bmap);
 	update_balloon_size(vb);
 	*freed += num_freed_pages;
 
@@ -373,15 +494,17 @@ static void update_balloon_size_func(struct work_struct *work)
 {
 	struct virtio_balloon *vb;
 	s64 diff;
+	bool use_bmap;
 
 	vb = container_of(work, struct virtio_balloon,
 			  update_balloon_size_work);
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
 	diff = towards_target(vb);
 
 	if (diff > 0)
-		diff -= fill_balloon(vb, diff);
+		diff -= fill_balloon(vb, diff, use_bmap);
 	else if (diff < 0)
-		diff += leak_balloon(vb, -diff);
+		diff += leak_balloon(vb, -diff, use_bmap);
 	update_balloon_size(vb);
 
 	if (diff)
@@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
-	int err;
+	int err, hdr_len;
 
 	if (!vdev->config->get) {
 		dev_err(&vdev->dev, "%s failure: config access disabled\n",
@@ -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	spin_lock_init(&vb->stop_update_lock);
 	vb->stop_update = false;
 	vb->num_pages = 0;
+	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
+	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
+	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
+		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
+	hdr_len = sizeof(struct balloon_bmap_hdr);
+	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
+
+	/* Clear the feature bit if memory allocation fails */
+	if (!vb->bmap_hdr)
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+	else
+		vb->page_bitmap = vb->bmap_hdr + hdr_len;
 	mutex_init(&vb->balloon_lock);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
@@ -541,9 +676,12 @@ out:
 
 static void remove_common(struct virtio_balloon *vb)
 {
+	bool use_bmap;
+
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+		leak_balloon(vb, vb->num_pages, use_bmap);
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
@@ -565,6 +703,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
 	remove_common(vb);
+	kfree(vb->page_bitmap);
 	kfree(vb);
 }
 
@@ -603,6 +742,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_PAGE_BITMAP,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.8.3.1
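The windowing idea behind set_page_bitmap() (track the PFN range first, then emit one bitmap per pfn_limit-sized window that contains at least one page) can be tried out in user space. The following is a simplified sketch, not the driver code: every name prefixed with sk_ is hypothetical, a plain array stands in for the page list, the BITS_PER_LONG alignment is left out, and "sending" is reduced to recording the window. It also iterates while base <= max_pfn so that a page sitting exactly on the upper bound is still covered.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* One bitmap "message": the PFN of bit 0 and how many pages it marks. */
struct sk_window {
	uint64_t base_pfn;
	size_t pages;
};

static void sk_set_bit(unsigned long *bmap, uint64_t bit)
{
	bmap[bit / SK_BITS_PER_LONG] |= 1UL << (bit % SK_BITS_PER_LONG);
}

/*
 * Split the distinct PFNs in pfns[0..n) into windows of `limit` PFNs and
 * build one bitmap per non-empty window in `bmap`, which must hold at
 * least `limit` bits rounded up to whole longs. Each bitmap is recorded
 * in out[] where a driver would hand it to the host.
 * Assumes max_pfn + limit does not overflow. Returns the window count.
 */
static size_t sk_build_windows(const uint64_t *pfns, size_t n, uint64_t limit,
			       unsigned long *bmap, struct sk_window *out,
			       size_t out_max)
{
	uint64_t min_pfn = UINT64_MAX, max_pfn = 0, base;
	size_t words, i, nwin = 0;

	assert(limit > 0);
	if (n == 0)
		return 0;
	words = (size_t)((limit + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG);

	/* Pass 1: track the PFN range while the pages are collected. */
	for (i = 0; i < n; i++) {
		if (pfns[i] < min_pfn)
			min_pfn = pfns[i];
		if (pfns[i] > max_pfn)
			max_pfn = pfns[i];
	}

	/* Pass 2: one scan of the page list per window. */
	for (base = min_pfn; base <= max_pfn; base += limit) {
		size_t pages = 0;

		memset(bmap, 0, words * sizeof(unsigned long));
		for (i = 0; i < n; i++) {
			if (pfns[i] < base || pfns[i] - base >= limit)
				continue;
			sk_set_bit(bmap, pfns[i] - base);
			pages++;
		}
		if (pages && nwin < out_max) {
			out[nwin].base_pfn = base;
			out[nwin].pages = pages;
			nwin++;
		}
	}
	return nwin;
}
```

The repeated scan in pass 2 is the cost that the VIRTIO_BALLOON_PFNS_LIMIT comment refers to: the larger the window, the fewer times the page list has to be walked.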
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 5/7] virtio-balloon: define feature bit and head for misc virt queue
Define a new feature bit which supports a new virtual queue. This
new virtual queue is for information exchange between the hypervisor
and the guest. The hypervisor can make use of this virtual queue to
request that the guest perform some operations, e.g. drop page cache,
synchronize file system, etc. The hypervisor can also get some of the
guest's runtime information through this virtual queue, e.g. the
guest's free page information, which can be used for live migration
optimization.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 include/uapi/linux/virtio_balloon.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index d3b182a..be4880f 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_PAGE_BITMAP	3 /* Send page info with bitmap */
+#define VIRTIO_BALLOON_F_MISC_VQ	4 /* Misc info virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -101,4 +102,25 @@ struct balloon_bmap_hdr {
 	__virtio64 bmap_len;
 };
 
+enum balloon_req_id {
+	/* Get free pages information */
+	BALLOON_GET_FREE_PAGES,
+};
+
+enum balloon_flag {
+	/* Have more data for a request */
+	BALLOON_FLAG_CONT,
+	/* No more data for a request */
+	BALLOON_FLAG_DONE,
+};
+
+struct balloon_req_hdr {
+	/* Used to distinguish different request */
+	__virtio16 cmd;
+	/* Reserved */
+	__virtio16 reserved[3];
+	/* Request parameter */
+	__virtio64 param;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1
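To illustrate how the CONT/DONE flags could be consumed, here is a minimal sketch of a receiver that accumulates the chunks of one reply until it sees the DONE flag. It is not taken from QEMU or from this series: the sk_ names and the receiver state are hypothetical, and only the flag values follow the enum order defined above (CONT = 0, DONE = 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same order as enum balloon_flag in the patch. */
enum sk_flag { SK_FLAG_CONT, SK_FLAG_DONE };

/* Minimal stand-in for the header fields of one received chunk. */
struct sk_chunk {
	uint64_t req_id;
	uint16_t flag;
	uint64_t bmap_len;	/* bytes of bitmap that follow the header */
};

/* Hypothetical receiver state for a single outstanding request. */
struct sk_reply {
	uint64_t req_id;
	uint64_t total_bytes;
	int done;
};

/*
 * Feed one chunk into the reply. Returns 0 if the chunk was accepted,
 * or -1 if it belongs to another request or arrives after DONE.
 */
static int sk_reply_feed(struct sk_reply *r, const struct sk_chunk *c)
{
	if (r->done || c->req_id != r->req_id)
		return -1;
	r->total_bytes += c->bmap_len;
	if (c->flag == SK_FLAG_DONE)
		r->done = 1;
	return 0;
}
```

Matching on req_id lets the receiver drop chunks left over from an earlier request, which is presumably why the header carries an ID in addition to the command.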
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 6/7] mm: add the related functions to get free page info
Save the free page info into a page bitmap, which will be used in
the virtio-balloon device driver.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Mel Gorman <mgorman at techsingularity.net>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 mm/page_alloc.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2083b40..c2a6669 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4521,6 +4521,52 @@ unsigned long get_max_pfn(void)
 }
 EXPORT_SYMBOL(get_max_pfn);
 
+static void mark_free_pages_bitmap(struct zone *zone, unsigned long start_pfn,
+	unsigned long end_pfn, unsigned long *bitmap, unsigned long len)
+{
+	unsigned long pfn, flags, page_num;
+	unsigned int order, t;
+	struct list_head *curr;
+
+	if (zone_is_empty(zone))
+		return;
+	end_pfn = min(start_pfn + len, end_pfn);
+	spin_lock_irqsave(&zone->lock, flags);
+
+	for_each_migratetype_order(order, t) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			if (pfn >= start_pfn && pfn <= end_pfn) {
+				page_num = 1UL << order;
+				if (pfn + page_num > end_pfn)
+					page_num = end_pfn - pfn;
+				bitmap_set(bitmap, pfn - start_pfn, page_num);
+			}
+		}
+	}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
+		unsigned long *bitmap, unsigned long len)
+{
+	struct zone *zone;
+	int ret = 0;
+
+	if (bitmap == NULL || start_pfn > end_pfn || start_pfn >= max_pfn)
+		return 0;
+	if (end_pfn < max_pfn)
+		ret = 1;
+	if (end_pfn >= max_pfn)
+		ret = 0;
+
+	for_each_populated_zone(zone)
+		mark_free_pages_bitmap(zone, start_pfn, end_pfn, bitmap, len);
+	return ret;
+}
+EXPORT_SYMBOL(get_free_pages);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
1.8.3.1
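The core step of mark_free_pages_bitmap(), turning one free block of 2^order pages into a run of set bits clipped to the requested PFN range, can be sketched in user space as follows. This is an illustration under assumptions rather than the kernel code: the sk_ helpers are hypothetical, the range is treated as half-open [start_pfn, end_pfn), and unlike the patch the sketch also clips blocks that begin before start_pfn.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static void sk_set_bit(unsigned long *bmap, uint64_t bit)
{
	bmap[bit / SK_BITS_PER_LONG] |= 1UL << (bit % SK_BITS_PER_LONG);
}

static int sk_test_bit(const unsigned long *bmap, uint64_t bit)
{
	return (int)((bmap[bit / SK_BITS_PER_LONG] >>
		      (bit % SK_BITS_PER_LONG)) & 1UL);
}

/*
 * Mark the free block of 2^order pages starting at `pfn` in a bitmap
 * whose bit 0 corresponds to start_pfn. The block is clipped to
 * [start_pfn, end_pfn). Returns the number of bits that were set.
 */
static size_t sk_mark_free_block(unsigned long *bitmap, uint64_t start_pfn,
				 uint64_t end_pfn, uint64_t pfn,
				 unsigned int order)
{
	uint64_t first = pfn, last, p;

	assert(order < 32);
	last = pfn + (1ULL << order);	/* exclusive end of the block */
	if (first < start_pfn)
		first = start_pfn;
	if (last > end_pfn)
		last = end_pfn;
	if (first >= last)
		return 0;
	for (p = first; p < last; p++)
		sk_set_bit(bitmap, p - start_pfn);
	return (size_t)(last - first);
}
```

For the range [100, 200), an order-3 block at PFN 96 contributes only PFNs 100 to 103, and an order-2 block at PFN 198 contributes only PFNs 198 and 199.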
Liang Li
2016-Jun-29 10:32 UTC
[PATCH v2 kernel 7/7] virtio-balloon: tell host vm's free page info
Support the request for the VM's free page information and respond
with a page bitmap. QEMU can make use of this free page bitmap to
speed up the live migration process by skipping the free pages.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
Cc: Amit Shah <amit.shah at redhat.com>
---
 drivers/virtio/virtio_balloon.c | 104 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 98 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 2d18ff6..5ca4ad3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -62,10 +62,13 @@ module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
 extern unsigned long get_max_pfn(void);
+extern int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
+		unsigned long *bitmap, unsigned long len);
+
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *misc_vq;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -89,6 +92,8 @@ struct virtio_balloon {
 	unsigned long pfn_limit;
 	/* Used to record the processed pfn range */
 	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+	/* Request header */
+	struct balloon_req_hdr req_hdr;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -373,6 +378,49 @@ static void update_balloon_stats(struct virtio_balloon *vb)
 				pages_to_bytes(available));
 }
 
+static void update_free_pages_stats(struct virtio_balloon *vb,
+				unsigned long req_id)
+{
+	struct scatterlist sg_in, sg_out;
+	unsigned long pfn = 0, bmap_len, max_pfn;
+	struct virtqueue *vq = vb->misc_vq;
+	struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
+	int ret = 1;
+
+	max_pfn = get_max_pfn();
+	mutex_lock(&vb->balloon_lock);
+	while (pfn < max_pfn) {
+		memset(vb->page_bitmap, 0, vb->bmap_len);
+		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
+			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
+		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
+		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
+		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
+		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
+		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
+		if (!ret) {
+			hdr->flag = cpu_to_virtio16(vb->vdev,
+							BALLOON_FLAG_DONE);
+			if (pfn + vb->pfn_limit > max_pfn)
+				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
+		} else
+			hdr->flag = cpu_to_virtio16(vb->vdev,
+							BALLOON_FLAG_CONT);
+		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
+		sg_init_one(&sg_out, hdr,
+			 sizeof(struct balloon_bmap_hdr) + bmap_len);
+
+		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);
+		virtqueue_kick(vq);
+		pfn += vb->pfn_limit;
+	}
+
+	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
+	virtqueue_kick(vq);
+	mutex_unlock(&vb->balloon_lock);
+}
+
 /*
  * While most virtqueues communicate guest-initiated requests to the hypervisor,
  * the stats queue operates in reverse.  The driver initializes the virtqueue
@@ -511,18 +559,49 @@ static void update_balloon_size_func(struct work_struct *work)
 		queue_work(system_freezable_wq, work);
 }
 
+static void misc_handle_rq(struct virtio_balloon *vb)
+{
+	struct balloon_req_hdr *ptr_hdr;
+	unsigned int len;
+
+	ptr_hdr = virtqueue_get_buf(vb->misc_vq, &len);
+	if (!ptr_hdr || len != sizeof(vb->req_hdr))
+		return;
+
+	switch (ptr_hdr->cmd) {
+	case BALLOON_GET_FREE_PAGES:
+		update_free_pages_stats(vb, ptr_hdr->param);
+		break;
+	default:
+		break;
+	}
+}
+
+static void misc_request(struct virtqueue *vq)
+{
+	struct virtio_balloon *vb = vq->vdev->priv;
+
+	misc_handle_rq(vb);
+}
+
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	static const char * const names[] = { "inflate", "deflate", "stats" };
+	struct virtqueue *vqs[4];
+	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
+					 stats_request, misc_request };
+	static const char * const names[] = { "inflate", "deflate", "stats",
+						 "misc" };
 	int err, nvqs;
 
 	/*
 	 * We expect two virtqueues: inflate and deflate, and
 	 * optionally stat.
 	 */
-	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
+		nvqs = 4;
+	else
+		nvqs = virtio_has_feature(vb->vdev,
+					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
 	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
 	if (err)
 		return err;
@@ -543,6 +622,16 @@ static int init_vqs(struct virtio_balloon *vb)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
 	}
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) {
+		struct scatterlist sg_in;
+
+		vb->misc_vq = vqs[3];
+		sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+		if (virtqueue_add_inbuf(vb->misc_vq, &sg_in, 1,
+		    &vb->req_hdr, GFP_KERNEL) < 0)
+			BUG();
+		virtqueue_kick(vb->misc_vq);
+	}
 	return 0;
 }
 
@@ -639,8 +728,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
 
 	/* Clear the feature bit if memory allocation fails */
-	if (!vb->bmap_hdr)
+	if (!vb->bmap_hdr) {
 		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);
+	}
 	else
 		vb->page_bitmap = vb->bmap_hdr + hdr_len;
 	mutex_init(&vb->balloon_lock);
@@ -743,6 +834,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
 	VIRTIO_BALLOON_F_PAGE_BITMAP,
+	VIRTIO_BALLOON_F_MISC_VQ,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.8.3.1
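The chunking done by update_free_pages_stats() (walk the PFN space in pfn_limit-sized steps, flag every chunk CONT except the last one, which is flagged DONE and may be shorter) can be modelled without any virtio machinery. The sketch below is a hypothetical planner with made-up sk_ names; it only computes the start PFN, bitmap length and flag of each chunk, and it rounds the last bitmap length up to whole bytes, which is a choice of this sketch rather than what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same order as enum balloon_flag in the series. */
enum sk_flag { SK_FLAG_CONT, SK_FLAG_DONE };

struct sk_chunk_plan {
	uint64_t start_pfn;	/* PFN of bit 0 in this chunk's bitmap */
	uint64_t bmap_len;	/* bitmap length in bytes */
	enum sk_flag flag;
};

/*
 * Plan the chunks needed to report PFNs [0, max_pfn) with at most
 * pfn_limit PFNs per chunk. Returns the number of chunks written
 * to out[], which is capped at out_max.
 */
static size_t sk_plan_chunks(uint64_t max_pfn, uint64_t pfn_limit,
			     struct sk_chunk_plan *out, size_t out_max)
{
	uint64_t pfn = 0;
	size_t n = 0;

	assert(pfn_limit > 0);
	while (pfn < max_pfn && n < out_max) {
		uint64_t span = max_pfn - pfn;
		int last = span <= pfn_limit;

		if (!last)
			span = pfn_limit;
		out[n].start_pfn = pfn;
		out[n].bmap_len = (span + 7) / 8;
		out[n].flag = last ? SK_FLAG_DONE : SK_FLAG_CONT;
		n++;
		pfn += span;
	}
	return n;
}
```

With max_pfn = 2500000 and pfn_limit = 1000000 this yields three chunks: two CONT chunks of 125000 bytes each and a final DONE chunk of 62500 bytes starting at PFN 2000000.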
Li, Liang Z
2016-Jul-06 00:43 UTC
[PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
Ping ...

Liang

> -----Original Message-----
> From: Li, Liang Z
> Sent: Wednesday, June 29, 2016 6:32 PM
> To: mst at redhat.com
> Cc: linux-kernel at vger.kernel.org; virtualization at lists.linux-foundation.org;
> kvm at vger.kernel.org; qemu-devel at nongnu.org; virtio-dev at lists.oasis-
> open.org; dgilbert at redhat.com; quintela at redhat.com; Li, Liang Z
> Subject: [PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating &
> fast live migration
>
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process, the main
> idea of this optimization is to use bitmap to send the page information to
> host instead of the PFNs, to reduce the overhead of virtio data transmission,
> address translation and madvise(). This can help to improve the performance
> by about 85%.
>
> Another change is for speeding up live migration. By skipping process guest's
> free pages in the first round of data copy, to reduce needless data processing,
> this can help to save quite a lot of CPU cycles and network bandwidth. We
> put guest's free page information in bitmap and send it to host with the virt
> queue of virtio-balloon. For an idle 8GB guest, this can help to shorten the
> total live migration time from 2Sec to about 500ms in the 10Gbps network
> environment.
>
>
> Changes from v1 to v2:
> * Abandon the patch for dropping page cache.
> * Put some structures to uapi head file.
> * Use a new way to determine the page bitmap size.
> * Use a unified way to send the free page information with the bitmap
> * Address the issues referred in MST's comments
>
> Liang Li (7):
> virtio-balloon: rework deflate to add page to a list
> virtio-balloon: define new feature bit and page bitmap head
> mm: add a function to get the max pfn
> virtio-balloon: speed up inflate/deflate process
> virtio-balloon: define feature bit and head for misc virt queue
> mm: add the related functions to get free page info
> virtio-balloon: tell host vm's free page info
>
> drivers/virtio/virtio_balloon.c | 306
> +++++++++++++++++++++++++++++++-----
> include/uapi/linux/virtio_balloon.h | 41 +++++
> mm/page_alloc.c | 52 ++++++
> 3 files changed, 359 insertions(+), 40 deletions(-)
>
> --
> 1.8.3.1
Li, Liang Z
2016-Jul-21 08:14 UTC
[PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
Hi Michael,

If you have time, could you help to review this patch set?

Thanks!
Liang
Michael S. Tsirkin
2016-Jul-26 18:55 UTC
[PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
On Wed, Jun 29, 2016 at 06:32:13PM +0800, Liang Li wrote:
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process,
> the main idea of this optimization is to use bitmap to send the page
> information to host instead of the PFNs, to reduce the overhead of
> virtio data transmission, address translation and madvise(). This can
> help to improve the performance by about 85%.
>
> Another change is for speeding up live migration. By skipping process
> guest's free pages in the first round of data copy, to reduce needless
> data processing, this can help to save quite a lot of CPU cycles and
> network bandwidth. We put guest's free page information in bitmap and
> send it to host with the virt queue of virtio-balloon. For an idle 8GB
> guest, this can help to shorten the total live migration time from 2Sec
> to about 500ms in the 10Gbps network environment.

So I'm fine with this patchset, but I noticed it was not yet reviewed
by MM people. And that is not surprising since you did not copy
memory management mailing list on it.

I added linux-mm at kvack.org Cc on this mail but this might not be enough.

Please repost (e.g. [PATCH v2 repost]) copying the relevant mailing list
so we can get some reviews.

>
> Changes from v1 to v2:
> * Abandon the patch for dropping page cache.
> * Put some structures to uapi head file.
> * Use a new way to determine the page bitmap size.
> * Use a unified way to send the free page information with the bitmap
> * Address the issues referred in MST's comments
>
> Liang Li (7):
>   virtio-balloon: rework deflate to add page to a list
>   virtio-balloon: define new feature bit and page bitmap head
>   mm: add a function to get the max pfn
>   virtio-balloon: speed up inflate/deflate process
>   virtio-balloon: define feature bit and head for misc virt queue
>   mm: add the related functions to get free page info
>   virtio-balloon: tell host vm's free page info
>
>  drivers/virtio/virtio_balloon.c     | 306 +++++++++++++++++++++++++++++++-----
>  include/uapi/linux/virtio_balloon.h |  41 +++++
>  mm/page_alloc.c                     |  52 ++++++
>  3 files changed, 359 insertions(+), 40 deletions(-)
>
> --
> 1.8.3.1
Li, Liang Z
2016-Jul-27 01:32 UTC
[virtio-dev] Re: [PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
> So I'm fine with this patchset, but I noticed it was not yet reviewed by MM
> people. And that is not surprising since you did not copy memory
> management mailing list on it.
>
> I added linux-mm at kvack.org Cc on this mail but this might not be enough.
>
> Please repost (e.g. [PATCH v2 repost]) copying the relevant mailing list so we
> can get some reviews.
>

I will repost. Thanks!

Liang
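For readers following the thread, the cover letter's central idea (describing
pages to the host with a bitmap instead of a list of PFNs) can be illustrated
with a small user-space sketch. This is not code from the patch set: the helper
names bitmap_words(), pfn_list_bytes(), pfn_bitmap_bytes(), pfns_to_bitmap()
and bitmap_test() are hypothetical, the PFNs are assumed to be 32-bit as in the
existing balloon driver, and the actual on-the-wire bitmap header defined by
the patches is not modelled here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bits in one bitmap word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: words needed to hold nbits bits. */
static size_t bitmap_words(size_t nbits)
{
	return (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Bytes needed to describe count pages as a list of 32-bit PFNs. */
static size_t pfn_list_bytes(size_t count)
{
	return count * sizeof(uint32_t);
}

/* Bytes needed for a bitmap covering a range of span consecutive PFNs. */
static size_t pfn_bitmap_bytes(size_t span)
{
	return bitmap_words(span) * sizeof(unsigned long);
}

/*
 * Encode a PFN list as a bitmap relative to base: bit n set means that
 * page (base + n) is in the set. Returns the number of distinct pages.
 * The caller provides a bitmap with room for span bits.
 */
static size_t pfns_to_bitmap(const uint32_t *pfns, size_t count,
			     uint32_t base, unsigned long *bitmap,
			     size_t span)
{
	size_t i, set = 0;

	memset(bitmap, 0, bitmap_words(span) * sizeof(unsigned long));
	for (i = 0; i < count; i++) {
		size_t bit;
		unsigned long mask;

		/* Every PFN must fall inside [base, base + span). */
		assert(pfns[i] >= base && pfns[i] - base < span);
		bit = pfns[i] - base;
		mask = 1UL << (bit % BITS_PER_WORD);
		if (!(bitmap[bit / BITS_PER_WORD] & mask)) {
			bitmap[bit / BITS_PER_WORD] |= mask;
			set++;
		}
	}
	return set;
}

/* Return nonzero if the given bit is set in the bitmap. */
static int bitmap_test(const unsigned long *bitmap, size_t bit)
{
	return (bitmap[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

Under these assumptions, 1024 pages in a contiguous PFN range take 4096 bytes
as a PFN list but only 128 bytes as a bitmap, which is the kind of saving in
transferred data the cover letter refers to. The bitmap only wins when the
pages are reasonably dense within the covered range; for a few widely scattered
pages a PFN list can be smaller.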