Wei Wang
2017-May-04 08:50 UTC
[PATCH v10 0/6] Extend virtio-balloon for fast (de)inflating & fast live migration
This patch series implements the following two things:

1) Optimization of balloon page transfer: instead of transferring balloon
   pages to the host one by one, the new mechanism transfers them in chunks.
2) A mechanism to report info of guest unused pages: the pages have been
   unused at some time between when the host sent the command and when the
   guest reported them. The host uses that by tracking memory changes and
   then discarding changes made to the pages that it gets from the guest
   before it sent the command.

Changes:
v9->v10:
1) mm: put report_unused_page_block() under CONFIG_VIRTIO_BALLOON;
2) virtio-balloon: add virtballoon_validate();
3) virtio-balloon: msg format change;
4) virtio-balloon: move miscq handling to a task on system_freezable_wq;
5) virtio-balloon: code cleanup.

v8->v9:
1) Split the two new features, VIRTIO_BALLOON_F_BALLOON_CHUNKS and
   VIRTIO_BALLOON_F_MISC_VQ, which were mixed together in the previous
   implementation;
2) Simpler function to get the free page block.

v7->v8:
1) Use only one chunk format, instead of two.
2) re-write the virtio-balloon implementation patch.
3) commit changes
4) patch re-org

Liang Li (1):
  virtio-balloon: deflate via a page list

Wei Wang (5):
  virtio-balloon: coding format cleanup
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS
  mm: function to offer a page block on the free list
  mm: export symbol of next_zone and first_online_pgdat
  virtio-balloon: VIRTIO_BALLOON_F_MISC_VQ

 drivers/virtio/virtio_balloon.c     | 696 +++++++++++++++++++++++++++++++++---
 include/linux/mm.h                  |   5 +
 include/uapi/linux/virtio_balloon.h |  26 ++
 mm/mmzone.c                         |   2 +
 mm/page_alloc.c                     |  91 +++++
 5 files changed, 761 insertions(+), 59 deletions(-)

-- 
2.7.4
From: Liang Li <liang.z.li at intel.com>

This patch saves the deflated pages to a list, instead of the PFN array.
Accordingly, the balloon_pfn_to_page() function is removed.

Signed-off-by: Liang Li <liang.z.li at intel.com>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
---
 drivers/virtio/virtio_balloon.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 34adf9b..4a9f307 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -104,12 +104,6 @@ static u32 page_to_balloon_pfn(struct page *page)
 	return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-	BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-	return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
 	struct virtio_balloon *vb = vq->vdev->priv;
@@ -182,18 +176,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+				  struct list_head *pages)
 {
-	unsigned int i;
-	struct page *page;
+	struct page *page, *next;
 
-	/* Find pfns pointing at start of each page, get pages and free them. */
-	for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-							   vb->pfns[i]));
+	list_for_each_entry_safe(page, next, pages, lru) {
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, 1);
+		list_del(&page->lru);
 		put_page(page); /* balloon reference */
 	}
 }
@@ -203,6 +195,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	LIST_HEAD(pages);
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
@@ -216,6 +209,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		if (!page)
 			break;
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
@@ -227,7 +221,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb);
+	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
-- 
2.7.4
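For readers less familiar with the kernel list helpers, the release loop in this patch relies on the "_safe" iteration pattern: the successor is remembered before the current entry is unlinked. The following is a minimal userspace sketch of that pattern only. The names list_node, fake_page, list_add_front, list_unlink and release_pages are hypothetical stand-ins invented for this illustration, not kernel APIs, and the sketch does not model page reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical stand-in for struct page: list linkage plus a flag. */
struct fake_page {
	struct list_node lru;
	int released;
};

/* Make an empty circular list whose head points at itself. */
static void list_init(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Insert node right after head, similar in spirit to list_add(). */
static void list_add_front(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink node from whatever list it is on. */
static void list_unlink(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = NULL;
	node->next = NULL;
}

/*
 * Release every page on the list and return how many were released.
 * The successor is saved before the current node is unlinked, which is
 * the point of a "_safe" style iteration.
 */
static unsigned int release_pages(struct list_node *head)
{
	struct list_node *pos, *next;
	unsigned int n = 0;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = next) {
		struct fake_page *page;

		next = pos->next;
		/* Recover the containing structure from its list member. */
		page = (struct fake_page *)((char *)pos -
					    offsetof(struct fake_page, lru));
		list_unlink(pos);
		page->released = 1;
		n++;
	}
	return n;
}
```

After release_pages() returns, the head points back at itself, so the same list head can be reused for the next batch.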
Clean up the comment format.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
---
 drivers/virtio/virtio_balloon.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4a9f307..ecb64e9 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -132,8 +132,10 @@ static void set_page_pfns(struct virtio_balloon *vb,
 {
 	unsigned int i;
 
-	/* Set balloon pfns pointing at this page.
-	 * Note that the first pfn points at start of the page. */
+	/*
+	 * Set balloon pfns pointing at this page.
+	 * Note that the first pfn points at start of the page.
+	 */
 	for (i = 0; i < VIRTIO_BALLOON_PAGES_PER_PAGE; i++)
 		pfns[i] = cpu_to_virtio32(vb->vdev,
 					  page_to_balloon_pfn(page) + i);
-- 
2.7.4
Wei Wang
2017-May-04 08:50 UTC
[PATCH v10 3/6] virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS
Add a new feature, VIRTIO_BALLOON_F_PAGE_CHUNKS, which enables the
transfer of the ballooned (i.e. inflated/deflated) pages in chunks to the
host.

The implementation of the previous virtio-balloon is not very efficient,
because the ballooned pages are transferred to the host one by one. Here
is the breakdown of the time in percentage spent on each step of the
balloon inflating process (inflating 7GB of an 8GB idle guest).

1) allocating pages (6.5%)
2) sending PFNs to host (68.3%)
3) address translation (6.1%)
4) madvise (19%)

It takes about 4126ms for the inflating process to complete. The above
profiling shows that the bottlenecks are step 2) and step 4).

This patch optimizes step 2) by transferring pages to the host in chunks.
A chunk consists of guest physically contiguous pages. When the pages are
packed into a chunk, they are converted into balloon page size (4KB)
pages. A chunk is offered to the host via a base PFN (i.e. the start PFN
of those physically contiguous pages) and the size (i.e. the total number
of the 4KB balloon size pages). A chunk is formatted as below:

--------------------------------------------------------
|                 Base (52 bit)        | Rsvd (12 bit) |
--------------------------------------------------------
--------------------------------------------------------
|                 Size (52 bit)        | Rsvd (12 bit) |
--------------------------------------------------------

By doing so, step 4) can also be optimized by doing address translation
and madvise() in chunks rather than page by page.

With this new feature, the above ballooning process takes ~590ms,
resulting in an improvement of ~85%.

TODO: optimize step 1) by allocating/freeing a chunk of pages instead of
a single page each time.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Signed-off-by: Liang Li <liang.z.li at intel.com>
Suggested-by: Michael S.
Tsirkin <mst at redhat.com> --- drivers/virtio/virtio_balloon.c | 407 +++++++++++++++++++++++++++++++++--- include/uapi/linux/virtio_balloon.h | 14 ++ 2 files changed, 396 insertions(+), 25 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index ecb64e9..df16912 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -43,6 +43,20 @@ #define OOM_VBALLOON_DEFAULT_PAGES 256 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80 +/* The size of one page_bmap used to record inflated/deflated pages. */ +#define VIRTIO_BALLOON_PAGE_BMAP_SIZE (8 * PAGE_SIZE) +/* + * Callulates how many pfns can a page_bmap record. A bit corresponds to a + * page of PAGE_SIZE. + */ +#define VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP \ + (VIRTIO_BALLOON_PAGE_BMAP_SIZE * BITS_PER_BYTE) + +/* The number of page_bmap to allocate by default. */ +#define VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM 1 +/* The maximum number of page_bmap that can be allocated. */ +#define VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM 32 + static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES; module_param(oom_pages, int, S_IRUSR | S_IWUSR); MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); @@ -51,6 +65,11 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); static struct vfsmount *balloon_mnt; #endif +/* Maximum number of page chunks */ +#define VIRTIO_BALLOON_MAX_PAGE_CHUNKS ((8 * PAGE_SIZE - \ + sizeof(struct virtio_balloon_page_chunk)) / \ + sizeof(struct virtio_balloon_page_chunk_entry)) + struct virtio_balloon { struct virtio_device *vdev; struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; @@ -79,6 +98,12 @@ struct virtio_balloon { /* Synchronize access/update to this struct virtio_balloon elements */ struct mutex balloon_lock; + /* Buffer for chunks of ballooned pages. */ + struct virtio_balloon_page_chunk *balloon_page_chunk; + + /* Bitmap used to record pages. */ + unsigned long *page_bmap[VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM]; + /* The array of pfns we tell the Host about. 
*/ unsigned int num_pfns; __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]; @@ -111,6 +136,136 @@ static void balloon_ack(struct virtqueue *vq) wake_up(&vb->acked); } +/* Update pfn_max and pfn_min according to the pfn of page */ +static inline void update_pfn_range(struct virtio_balloon *vb, + struct page *page, + unsigned long *pfn_min, + unsigned long *pfn_max) +{ + unsigned long pfn = page_to_pfn(page); + + *pfn_min = min(pfn, *pfn_min); + *pfn_max = max(pfn, *pfn_max); +} + +static unsigned int extend_page_bmap_size(struct virtio_balloon *vb, + unsigned long pfn_num) +{ + unsigned int i, bmap_num, allocated_bmap_num; + unsigned long bmap_len; + + allocated_bmap_num = VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; + bmap_len = ALIGN(pfn_num, BITS_PER_LONG) / BITS_PER_BYTE; + bmap_len = roundup(bmap_len, VIRTIO_BALLOON_PAGE_BMAP_SIZE); + /* + * VIRTIO_BALLOON_PAGE_BMAP_SIZE is the size of one page_bmap, so + * divide it to calculate how many page_bmap that we need. + */ + bmap_num = (unsigned int)(bmap_len / VIRTIO_BALLOON_PAGE_BMAP_SIZE); + /* The number of page_bmap to allocate should not exceed the max */ + bmap_num = min_t(unsigned int, VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM, + bmap_num); + + for (i = VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i < bmap_num; i++) { + vb->page_bmap[i] = kmalloc(VIRTIO_BALLOON_PAGE_BMAP_SIZE, + GFP_KERNEL); + if (vb->page_bmap[i]) + allocated_bmap_num++; + else + break; + } + + return allocated_bmap_num; +} + +static void free_extended_page_bmap(struct virtio_balloon *vb, + unsigned int page_bmap_num) +{ + unsigned int i; + + for (i = VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i < page_bmap_num; + i++) { + kfree(vb->page_bmap[i]); + vb->page_bmap[i] = NULL; + page_bmap_num--; + } +} + +static void clear_page_bmap(struct virtio_balloon *vb, + unsigned int page_bmap_num) +{ + int i; + + for (i = 0; i < page_bmap_num; i++) + memset(vb->page_bmap[i], 0, VIRTIO_BALLOON_PAGE_BMAP_SIZE); +} + +static void send_page_chunks(struct virtio_balloon *vb, struct 
virtqueue *vq) +{ + struct scatterlist sg; + struct virtio_balloon_page_chunk *chunk; + unsigned int len; + + chunk = vb->balloon_page_chunk; + len = sizeof(__le64) + + le64_to_cpu(chunk->chunk_num) * + sizeof(struct virtio_balloon_page_chunk_entry); + sg_init_one(&sg, chunk, len); + if (!virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL)) { + virtqueue_kick(vq); + wait_event(vb->acked, virtqueue_get_buf(vq, &len)); + chunk->chunk_num = 0; + } +} + +/* Add a chunk entry to the buffer. */ +static void add_one_chunk(struct virtio_balloon *vb, struct virtqueue *vq, + u64 base, u64 size) +{ + struct virtio_balloon_page_chunk *chunk = vb->balloon_page_chunk; + struct virtio_balloon_page_chunk_entry *entry; + uint64_t chunk_num = le64_to_cpu(chunk->chunk_num); + + entry = &chunk->entry[chunk_num]; + entry->base = cpu_to_le64(base << VIRTIO_BALLOON_CHUNK_BASE_SHIFT); + entry->size = cpu_to_le64(size << VIRTIO_BALLOON_CHUNK_SIZE_SHIFT); + chunk->chunk_num = cpu_to_le64(++chunk_num); + if (chunk_num == VIRTIO_BALLOON_MAX_PAGE_CHUNKS) + send_page_chunks(vb, vq); +} + +static void convert_bmap_to_chunks(struct virtio_balloon *vb, + struct virtqueue *vq, + unsigned long *bmap, + unsigned long pfn_start, + unsigned long size) +{ + unsigned long next_one, next_zero, chunk_size, pos = 0; + + while (pos < size) { + next_one = find_next_bit(bmap, size, pos); + /* + * No "1" bit found, which means that there is no pfn + * recorded in the rest of this bmap. + */ + if (next_one == size) + break; + next_zero = find_next_zero_bit(bmap, size, next_one + 1); + /* + * A bit in page_bmap corresponds to a page of PAGE_SIZE. + * Convert it to be pages of 4KB balloon page size when + * adding it to a chunk. 
+ */ + chunk_size = (next_zero - next_one) * + VIRTIO_BALLOON_PAGES_PER_PAGE; + if (chunk_size) { + add_one_chunk(vb, vq, pfn_start + next_one, + chunk_size); + pos += next_zero + 1; + } + } +} + static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq) { struct scatterlist sg; @@ -124,7 +279,33 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq) /* When host has read buffer, this completes via balloon_ack */ wait_event(vb->acked, virtqueue_get_buf(vq, &len)); +} + +static void tell_host_from_page_bmap(struct virtio_balloon *vb, + struct virtqueue *vq, + unsigned long pfn_start, + unsigned long pfn_end, + unsigned int page_bmap_num) +{ + unsigned long i, pfn_num; + for (i = 0; i < page_bmap_num; i++) { + /* + * For the last page_bmap, only the remaining number of pfns + * need to be searched rather than the entire page_bmap. + */ + if (i + 1 == page_bmap_num) + pfn_num = (pfn_end - pfn_start) % + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; + else + pfn_num = VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; + + convert_bmap_to_chunks(vb, vq, vb->page_bmap[i], pfn_start + + i * VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP, + pfn_num); + } + if (le64_to_cpu(vb->balloon_page_chunk->chunk_num) > 0) + send_page_chunks(vb, vq); } static void set_page_pfns(struct virtio_balloon *vb, @@ -141,13 +322,88 @@ static void set_page_pfns(struct virtio_balloon *vb, page_to_balloon_pfn(page) + i); } +/* + * Send ballooned pages in chunks to host. + * The ballooned pages are recorded in page bitmaps. Each bit in a bitmap + * corresponds to a page of PAGE_SIZE. The page bitmaps are searched for + * continuous "1" bits, which correspond to continuous pages, to chunk. + * When packing those continuous pages into chunks, pages are converted into + * 4KB balloon pages. + * + * pfn_max and pfn_min form the range of pfns that need to use page bitmaps to + * record. 
If the range is too large to be recorded into the allocated page + * bitmaps, the page bitmaps are used multiple times to record the entire + * range of pfns. + */ +static void tell_host_page_chunks(struct virtio_balloon *vb, + struct list_head *pages, + struct virtqueue *vq, + unsigned long pfn_max, + unsigned long pfn_min) +{ + /* + * The pfn_start and pfn_end form the range of pfns that the allocated + * page_bmap can record in each round. + */ + unsigned long pfn_start, pfn_end; + /* Total number of allocated page_bmap */ + unsigned int page_bmap_num; + struct page *page; + bool found; + + /* + * In the case that one page_bmap is not sufficient to record the pfn + * range, page_bmap will be extended by allocating more numbers of + * page_bmap. + */ + page_bmap_num = extend_page_bmap_size(vb, pfn_max - pfn_min + 1); + + /* Start from the beginning of the whole pfn range */ + pfn_start = pfn_min; + while (pfn_start < pfn_max) { + pfn_end = pfn_start + + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP * page_bmap_num; + pfn_end = pfn_end < pfn_max ? pfn_end : pfn_max; + clear_page_bmap(vb, page_bmap_num); + found = false; + + list_for_each_entry(page, pages, lru) { + unsigned long bmap_idx, bmap_pos, this_pfn; + + this_pfn = page_to_pfn(page); + if (this_pfn < pfn_start || this_pfn > pfn_end) + continue; + bmap_idx = (this_pfn - pfn_start) / + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; + bmap_pos = (this_pfn - pfn_start) % + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; + set_bit(bmap_pos, vb->page_bmap[bmap_idx]); + + found = true; + } + if (found) + tell_host_from_page_bmap(vb, vq, pfn_start, pfn_end, + page_bmap_num); + /* + * Start the next round when pfn_start and pfn_end couldn't + * cover the whole pfn range given by pfn_max and pfn_min. 
+ */ + pfn_start = pfn_end; + } + free_extended_page_bmap(vb, page_bmap_num); +} + static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) { struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; unsigned num_allocated_pages; + bool chunking = virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_PAGE_CHUNKS); + unsigned long pfn_max = 0, pfn_min = ULONG_MAX; /* We can only do one array worth at a time. */ - num = min(num, ARRAY_SIZE(vb->pfns)); + if (!chunking) + num = min(num, ARRAY_SIZE(vb->pfns)); mutex_lock(&vb->balloon_lock); for (vb->num_pfns = 0; vb->num_pfns < num; @@ -162,7 +418,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) msleep(200); break; } - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + if (chunking) + update_pfn_range(vb, page, &pfn_max, &pfn_min); + else + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) @@ -171,8 +430,14 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) num_allocated_pages = vb->num_pfns; /* Did we get any? */ - if (vb->num_pfns != 0) - tell_host(vb, vb->inflate_vq); + if (vb->num_pfns != 0) { + if (chunking) + tell_host_page_chunks(vb, &vb_dev_info->pages, + vb->inflate_vq, + pfn_max, pfn_min); + else + tell_host(vb, vb->inflate_vq); + } mutex_unlock(&vb->balloon_lock); return num_allocated_pages; @@ -198,9 +463,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) struct page *page; struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; LIST_HEAD(pages); + bool chunking = virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_PAGE_CHUNKS); + unsigned long pfn_max = 0, pfn_min = ULONG_MAX; - /* We can only do one array worth at a time. */ - num = min(num, ARRAY_SIZE(vb->pfns)); + /* Traditionally, we can only do one array worth at a time. 
*/ + if (!chunking) + num = min(num, ARRAY_SIZE(vb->pfns)); mutex_lock(&vb->balloon_lock); /* We can't release more pages than taken */ @@ -210,7 +479,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) page = balloon_page_dequeue(vb_dev_info); if (!page) break; - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); + if (chunking) + update_pfn_range(vb, page, &pfn_max, &pfn_min); + else + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); list_add(&page->lru, &pages); vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE; } @@ -221,8 +493,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - if (vb->num_pfns != 0) - tell_host(vb, vb->deflate_vq); + if (vb->num_pfns != 0) { + if (chunking) + tell_host_page_chunks(vb, &pages, vb->deflate_vq, + pfn_max, pfn_min); + else + tell_host(vb, vb->deflate_vq); + } release_pages_balloon(vb, &pages); mutex_unlock(&vb->balloon_lock); return num_freed_pages; @@ -442,6 +719,14 @@ static int init_vqs(struct virtio_balloon *vb) } #ifdef CONFIG_BALLOON_COMPACTION + +static void tell_host_one_page(struct virtio_balloon *vb, + struct virtqueue *vq, struct page *page) +{ + add_one_chunk(vb, vq, page_to_pfn(page), + VIRTIO_BALLOON_PAGES_PER_PAGE); +} + /* * virtballoon_migratepage - perform the balloon page migration on behalf of * a compation thread. 
(called under page lock) @@ -465,6 +750,8 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, { struct virtio_balloon *vb = container_of(vb_dev_info, struct virtio_balloon, vb_dev_info); + bool chunking = virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_PAGE_CHUNKS); unsigned long flags; /* @@ -486,16 +773,22 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, vb_dev_info->isolated_pages--; __count_vm_event(BALLOON_MIGRATE); spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags); - vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; - set_page_pfns(vb, vb->pfns, newpage); - tell_host(vb, vb->inflate_vq); - + if (chunking) { + tell_host_one_page(vb, vb->inflate_vq, newpage); + } else { + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; + set_page_pfns(vb, vb->pfns, newpage); + tell_host(vb, vb->inflate_vq); + } /* balloon's page migration 2nd step -- deflate "page" */ balloon_page_delete(page); - vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; - set_page_pfns(vb, vb->pfns, page); - tell_host(vb, vb->deflate_vq); - + if (chunking) { + tell_host_one_page(vb, vb->deflate_vq, page); + } else { + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; + set_page_pfns(vb, vb->pfns, page); + tell_host(vb, vb->deflate_vq); + } mutex_unlock(&vb->balloon_lock); put_page(page); /* balloon reference */ @@ -522,9 +815,75 @@ static struct file_system_type balloon_fs = { #endif /* CONFIG_BALLOON_COMPACTION */ +static void free_page_bmap(struct virtio_balloon *vb) +{ + int i; + + for (i = 0; i < VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i++) { + kfree(vb->page_bmap[i]); + vb->page_bmap[i] = NULL; + } +} + +static int balloon_page_chunk_init(struct virtio_balloon *vb) +{ + int i; + + vb->balloon_page_chunk = kmalloc(sizeof(__le64) + + sizeof(struct virtio_balloon_page_chunk_entry) * + VIRTIO_BALLOON_MAX_PAGE_CHUNKS, GFP_KERNEL); + if (!vb->balloon_page_chunk) + goto err_page_chunk; + + /* + * The default number of page_bmaps are allocated. 
More may be + * allocated on demand. + */ + for (i = 0; i < VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i++) { + vb->page_bmap[i] = kmalloc(VIRTIO_BALLOON_PAGE_BMAP_SIZE, + GFP_KERNEL); + if (!vb->page_bmap[i]) + goto err_page_bmap; + } + + return 0; +err_page_bmap: + free_page_bmap(vb); + kfree(vb->balloon_page_chunk); + vb->balloon_page_chunk = NULL; +err_page_chunk: + __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_PAGE_CHUNKS); + dev_warn(&vb->vdev->dev, "%s: failed\n", __func__); + return -ENOMEM; +} + +static int virtballoon_validate(struct virtio_device *vdev) +{ + struct virtio_balloon *vb = NULL; + int err; + + vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL); + if (!vb) { + err = -ENOMEM; + goto err_vb; + } + + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_CHUNKS)) { + err = balloon_page_chunk_init(vb); + if (err < 0) + goto err_page_chunk; + } + + return 0; +err_page_chunk: + kfree(vb); +err_vb: + return err; +} + static int virtballoon_probe(struct virtio_device *vdev) { - struct virtio_balloon *vb; + struct virtio_balloon *vb = vdev->priv; int err; if (!vdev->config->get) { @@ -533,17 +892,12 @@ static int virtballoon_probe(struct virtio_device *vdev) return -EINVAL; } - vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL); - if (!vb) { - err = -ENOMEM; - goto out; - } - INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func); INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func); spin_lock_init(&vb->stop_update_lock); vb->stop_update = false; vb->num_pages = 0; + mutex_init(&vb->balloon_lock); init_waitqueue_head(&vb->acked); vb->vdev = vdev; @@ -590,7 +944,6 @@ static int virtballoon_probe(struct virtio_device *vdev) vdev->config->del_vqs(vdev); out_free_vb: kfree(vb); -out: return err; } @@ -620,6 +973,8 @@ static void virtballoon_remove(struct virtio_device *vdev) cancel_work_sync(&vb->update_balloon_stats_work); remove_common(vb); + free_page_bmap(vb); + kfree(vb->balloon_page_chunk); #ifdef CONFIG_BALLOON_COMPACTION if 
(vb->vb_dev_info.inode) iput(vb->vb_dev_info.inode); @@ -664,6 +1019,7 @@ static unsigned int features[] = { VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, + VIRTIO_BALLOON_F_PAGE_CHUNKS, }; static struct virtio_driver virtio_balloon_driver = { @@ -674,6 +1030,7 @@ static struct virtio_driver virtio_balloon_driver = { .id_table = id_table, .probe = virtballoon_probe, .remove = virtballoon_remove, + .validate = virtballoon_validate, .config_changed = virtballoon_changed, #ifdef CONFIG_PM_SLEEP .freeze = virtballoon_freeze, diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index 343d7dd..d532ed16 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -34,6 +34,7 @@ #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ +#define VIRTIO_BALLOON_F_PAGE_CHUNKS 3 /* Inflate/Deflate pages in chunks */ /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -82,4 +83,17 @@ struct virtio_balloon_stat { __virtio64 val; } __attribute__((packed)); +#define VIRTIO_BALLOON_CHUNK_BASE_SHIFT 12 +#define VIRTIO_BALLOON_CHUNK_SIZE_SHIFT 12 +struct virtio_balloon_page_chunk_entry { + __le64 base; + __le64 size; +}; + +struct virtio_balloon_page_chunk { + /* Number of chunks in the payload */ + __le64 chunk_num; + struct virtio_balloon_page_chunk_entry entry[]; +}; + #endif /* _LINUX_VIRTIO_BALLOON_H */ -- 2.7.4
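To make the bitmap-to-chunk conversion in this patch easier to follow, here is a minimal userspace sketch of the idea: scan a bitmap for runs of consecutive set bits and emit one (base, size) entry per run, each shifted left by 12 as in the chunk format above. It is not the driver code. It assumes PAGE_SIZE is 4KB (so one bit equals one balloon page), uses a byte-array bitmap instead of unsigned long words, keeps values in host byte order rather than little-endian, and the names chunk_entry, test_bit_at and bmap_to_chunks are hypothetical. After each run, the sketch resumes scanning at the first bit past that run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror the shift values defined in the uapi header of this patch. */
#define CHUNK_BASE_SHIFT 12
#define CHUNK_SIZE_SHIFT 12

/* Hypothetical host-endian counterpart of the chunk entry layout. */
struct chunk_entry {
	uint64_t base;	/* start PFN in the upper 52 bits */
	uint64_t size;	/* number of 4KB pages in the upper 52 bits */
};

/* Return 1 if the given bit is set in a byte-array bitmap. */
static int test_bit_at(const uint8_t *bmap, size_t bit)
{
	return (bmap[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Scan nbits of bmap and write one entry per run of consecutive set
 * bits. Bit i stands for PFN pfn_start + i. Returns the number of
 * entries written, which never exceeds max_entries.
 */
static size_t bmap_to_chunks(const uint8_t *bmap, size_t nbits,
			     uint64_t pfn_start,
			     struct chunk_entry *out, size_t max_entries)
{
	size_t pos = 0, n = 0;

	assert(bmap != NULL && out != NULL);
	while (pos < nbits && n < max_entries) {
		size_t run_start;

		/* Skip clear bits until the next set bit. */
		while (pos < nbits && !test_bit_at(bmap, pos))
			pos++;
		if (pos == nbits)
			break;
		run_start = pos;

		/* Extend the run across consecutive set bits. */
		while (pos < nbits && test_bit_at(bmap, pos))
			pos++;

		out[n].base = (pfn_start + run_start) << CHUNK_BASE_SHIFT;
		out[n].size = (uint64_t)(pos - run_start) << CHUNK_SIZE_SHIFT;
		n++;
	}
	return n;
}
```

For example, a 16-bit bitmap with bits 1 to 3, 8 and 15 set and pfn_start 0x100 yields three entries: PFN 0x101 with 3 pages, PFN 0x108 with 1 page, and PFN 0x10f with 1 page. A real sender would flush the buffer to the host when it fills up instead of stopping at max_entries.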
Wei Wang
2017-May-04 08:50 UTC
[PATCH v10 4/6] mm: function to offer a page block on the free list
Add a function to find a page block on the free list specified by the
caller. Pages from the page block may be used immediately after the
function returns. The caller is responsible for detecting or preventing
the use of such pages.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Signed-off-by: Liang Li <liang.z.li at intel.com>
---
 include/linux/mm.h |  5 +++
 mm/page_alloc.c    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5d22e69..82361a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1841,6 +1841,11 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 
+#if IS_ENABLED(CONFIG_VIRTIO_BALLOON)
+extern int report_unused_page_block(struct zone *zone, unsigned int order,
+				    unsigned int migratetype,
+				    struct page **page);
+#endif
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
  * into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c25de4..e554ab8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4615,6 +4615,97 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	show_swap_cache_info();
 }
 
+#if IS_ENABLED(CONFIG_VIRTIO_BALLOON)
+
+/**
+ * Heuristically get a page block in the system that is unused.
+ * It is possible that pages from the page block are used immediately after
+ * report_unused_page_block() returns. It is the caller's responsibility
+ * to either detect or prevent the use of such pages.
+ *
+ * The free list to check: zone->free_area[order].free_list[migratetype].
+ *
+ * If the caller supplied page block (i.e. **page) is on the free list, offer
+ * the next page block on the list to the caller. Otherwise, offer the first
+ * page block on the list.
+ *
+ * Return 0 when a page block is found on the caller specified free list.
+ */
+int report_unused_page_block(struct zone *zone, unsigned int order,
+			     unsigned int migratetype, struct page **page)
+{
+	struct zone *this_zone;
+	struct list_head *this_list;
+	int ret = 0;
+	unsigned long flags;
+
+	/* Sanity check */
+	if (zone == NULL || page == NULL || order >= MAX_ORDER ||
+	    migratetype >= MIGRATE_TYPES)
+		return -EINVAL;
+
+	/* Zone validity check */
+	for_each_populated_zone(this_zone) {
+		if (zone == this_zone)
+			break;
+	}
+
+	/* Got a non-existent zone from the caller? */
+	if (zone != this_zone)
+		return -EINVAL;
+
+	spin_lock_irqsave(&this_zone->lock, flags);
+
+	this_list = &zone->free_area[order].free_list[migratetype];
+	if (list_empty(this_list)) {
+		*page = NULL;
+		ret = 1;
+		goto out;
+	}
+
+	/* The caller is asking for the first free page block on the list */
+	if ((*page) == NULL) {
+		*page = list_first_entry(this_list, struct page, lru);
+		ret = 0;
+		goto out;
+	}
+
+	/**
+	 * The page block passed from the caller is not on this free list
+	 * anymore (e.g. a 1MB free page block has been split). In this case,
+	 * offer the first page block on the free list that the caller is
+	 * asking for.
+	 */
+	if (PageBuddy(*page) && order != page_order(*page)) {
+		*page = list_first_entry(this_list, struct page, lru);
+		ret = 0;
+		goto out;
+	}
+
+	/**
+	 * The page block passed from the caller has been the last page block
+	 * on the list.
+	 */
+	if ((*page)->lru.next == this_list) {
+		*page = NULL;
+		ret = 1;
+		goto out;
+	}
+
+	/**
+	 * Finally, fall into the regular case: the page block passed from the
+	 * caller is still on the free list. Offer the next one.
+	 */
+	*page = list_next_entry((*page), lru);
+	ret = 0;
+out:
+	spin_unlock_irqrestore(&this_zone->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(report_unused_page_block);
+
+#endif
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
2.7.4
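The calling convention of report_unused_page_block() is a cursor protocol: pass NULL to get the first block, pass the previous block to get the next one, and stop when the return value is nonzero. The following userspace sketch models only that protocol on a singly linked list. It is an illustration under simplifying assumptions: there is no locking, the "block is no longer on this free list" case is not modelled, and the names block, next_unused_block and count_blocks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one free page block on a free list. */
struct block {
	struct block *next;	/* NULL terminates the list in this model */
	unsigned long pfn;
};

/*
 * Model of the cursor protocol: *cur == NULL asks for the first block,
 * otherwise for the block after *cur. Returns 0 when a block is
 * offered, or 1 when the list is empty or exhausted, in which case
 * *cur is reset to NULL.
 */
static int next_unused_block(struct block *first, struct block **cur)
{
	assert(cur != NULL);
	if (first == NULL) {
		*cur = NULL;
		return 1;
	}
	if (*cur == NULL) {
		*cur = first;
		return 0;
	}
	if ((*cur)->next == NULL) {
		*cur = NULL;
		return 1;
	}
	*cur = (*cur)->next;
	return 0;
}

/* Walk the list the way a caller might and count the offered blocks. */
static unsigned int count_blocks(struct block *first)
{
	struct block *cur = NULL;
	unsigned int n = 0;

	while (next_unused_block(first, &cur) == 0)
		n++;
	return n;
}
```

In the real function the list can change between two calls because the zone lock is dropped on return, which is why the commit message leaves it to the caller to detect or prevent use of the reported pages.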
Wei Wang
2017-May-04 08:50 UTC
[PATCH v10 5/6] mm: export symbol of next_zone and first_online_pgdat
This patch enables for_each_zone()/for_each_populated_zone() to be
invoked by a kernel module.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
---
 mm/mmzone.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mmzone.c b/mm/mmzone.c
index a51c0a6..08a2a3a 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -13,6 +13,7 @@ struct pglist_data *first_online_pgdat(void)
 {
 	return NODE_DATA(first_online_node);
 }
+EXPORT_SYMBOL_GPL(first_online_pgdat);
 
 struct pglist_data *next_online_pgdat(struct pglist_data *pgdat)
 {
@@ -41,6 +42,7 @@ struct zone *next_zone(struct zone *zone)
 	}
 	return zone;
 }
+EXPORT_SYMBOL_GPL(next_zone);
 
 static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
 {
-- 
2.7.4
Add a new vq, miscq, to handle miscellaneous requests between the device and the driver. Only one request is handled in-flight each time. This patch implements the VIRTIO_BALLOON_MISCQ_CMD_REPORT_UNUSED_PAGES request sent from the device. Upon receiving the request from the miscq, the driver offers to the device the guest unused pages. Tests have shown that skipping the transfer of unused pages of a 32G idle guest can get the live migration time reduced to 1/8. Signed-off-by: Wei Wang <wei.w.wang at intel.com> Signed-off-by: Liang Li <liang.z.li at intel.com> --- drivers/virtio/virtio_balloon.c | 299 +++++++++++++++++++++++++++++++----- include/uapi/linux/virtio_balloon.h | 12 ++ 2 files changed, 274 insertions(+), 37 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index df16912..4dcee2c 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -57,6 +57,10 @@ /* The maximum number of page_bmap that can be allocated. */ #define VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM 32 +/* Types of pages to chunk */ +#define PAGE_CHUNK_TYPE_BALLOON 0 /* Chunk of inflate/deflate pages */ +#define PAGE_CHUNK_TYPE_UNUSED 1 /* Chunk of unused pages */ + static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES; module_param(oom_pages, int, S_IRUSR | S_IWUSR); MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); @@ -67,16 +71,17 @@ static struct vfsmount *balloon_mnt; /* Maximum number of page chunks */ #define VIRTIO_BALLOON_MAX_PAGE_CHUNKS ((8 * PAGE_SIZE - \ - sizeof(struct virtio_balloon_page_chunk)) / \ - sizeof(struct virtio_balloon_page_chunk_entry)) + sizeof(struct virtio_balloon_miscq_msg)) / \ + sizeof(struct virtio_balloon_page_chunk_entry)) struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *miscq; /* The balloon servicing is delegated to a freezable workqueue. 
*/ struct work_struct update_balloon_stats_work; struct work_struct update_balloon_size_work; + struct work_struct miscq_handle_work; /* Prevent updating balloon when it is being canceled. */ spinlock_t stop_update_lock; @@ -98,6 +103,9 @@ struct virtio_balloon { /* Synchronize access/update to this struct virtio_balloon elements */ struct mutex balloon_lock; + /* Miscq msg buffer for the REPORT_UNUSED_PAGES cmd */ + struct virtio_balloon_miscq_msg *miscq_msg_rup; + /* Buffer for chunks of ballooned pages. */ struct virtio_balloon_page_chunk *balloon_page_chunk; @@ -200,38 +208,85 @@ static void clear_page_bmap(struct virtio_balloon *vb, memset(vb->page_bmap[i], 0, VIRTIO_BALLOON_PAGE_BMAP_SIZE); } -static void send_page_chunks(struct virtio_balloon *vb, struct virtqueue *vq) +static void send_page_chunks(struct virtio_balloon *vb, struct virtqueue *vq, + int type, bool busy_wait) { struct scatterlist sg; struct virtio_balloon_page_chunk *chunk; - unsigned int len; + void *msg_buf; + unsigned int msg_len; + uint64_t chunk_num = 0; + + switch (type) { + case PAGE_CHUNK_TYPE_BALLOON: + chunk = vb->balloon_page_chunk; + chunk_num = le64_to_cpu(chunk->chunk_num); + msg_buf = vb->balloon_page_chunk; + msg_len = sizeof(struct virtio_balloon_page_chunk) + + sizeof(struct virtio_balloon_page_chunk_entry) * + chunk_num; + break; + case PAGE_CHUNK_TYPE_UNUSED: + chunk = &vb->miscq_msg_rup->payload.chunk; + chunk_num = le64_to_cpu(chunk->chunk_num); + msg_buf = vb->miscq_msg_rup; + msg_len = sizeof(struct virtio_balloon_miscq_msg) + + sizeof(struct virtio_balloon_page_chunk_entry) * + chunk_num; + break; + default: + dev_warn(&vb->vdev->dev, "%s: chunk %d of unknown pages\n", + __func__, type); + return; + } - chunk = vb->balloon_page_chunk; - len = sizeof(__le64) + - le64_to_cpu(chunk->chunk_num) * - sizeof(struct virtio_balloon_page_chunk_entry); - sg_init_one(&sg, chunk, len); + sg_init_one(&sg, msg_buf, msg_len); if (!virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL)) { 
virtqueue_kick(vq); - wait_event(vb->acked, virtqueue_get_buf(vq, &len)); + if (busy_wait) + while (!virtqueue_get_buf(vq, &msg_len) && + !virtqueue_is_broken(vq)) + cpu_relax(); + else + wait_event(vb->acked, virtqueue_get_buf(vq, &msg_len)); + /* + * Now, the chunks have been delivered to the host. + * Reset the field in the structure that records the number of + * added chunks, so that new added chunks can be re-counted. + */ chunk->chunk_num = 0; } } /* Add a chunk entry to the buffer. */ static void add_one_chunk(struct virtio_balloon *vb, struct virtqueue *vq, - u64 base, u64 size) + int type, u64 base, u64 size) { - struct virtio_balloon_page_chunk *chunk = vb->balloon_page_chunk; + struct virtio_balloon_page_chunk *chunk; struct virtio_balloon_page_chunk_entry *entry; - uint64_t chunk_num = le64_to_cpu(chunk->chunk_num); - + uint64_t chunk_num; + + switch (type) { + case PAGE_CHUNK_TYPE_BALLOON: + chunk = vb->balloon_page_chunk; + chunk_num = le64_to_cpu(vb->balloon_page_chunk->chunk_num); + break; + case PAGE_CHUNK_TYPE_UNUSED: + chunk = &vb->miscq_msg_rup->payload.chunk; + chunk_num = + le64_to_cpu(vb->miscq_msg_rup->payload.chunk.chunk_num); + break; + default: + dev_warn(&vb->vdev->dev, "%s: chunk %d of unknown pages\n", + __func__, type); + return; + } entry = &chunk->entry[chunk_num]; entry->base = cpu_to_le64(base << VIRTIO_BALLOON_CHUNK_BASE_SHIFT); entry->size = cpu_to_le64(size << VIRTIO_BALLOON_CHUNK_SIZE_SHIFT); chunk->chunk_num = cpu_to_le64(++chunk_num); if (chunk_num == VIRTIO_BALLOON_MAX_PAGE_CHUNKS) - send_page_chunks(vb, vq); + send_page_chunks(vb, vq, type, 0); } static void convert_bmap_to_chunks(struct virtio_balloon *vb, @@ -259,8 +314,8 @@ static void convert_bmap_to_chunks(struct virtio_balloon *vb, chunk_size = (next_zero - next_one) * VIRTIO_BALLOON_PAGES_PER_PAGE; if (chunk_size) { - add_one_chunk(vb, vq, pfn_start + next_one, - chunk_size); + add_one_chunk(vb, vq, PAGE_CHUNK_TYPE_BALLOON, + pfn_start + next_one, chunk_size); pos += 
next_zero + 1; } } @@ -305,7 +360,7 @@ static void tell_host_from_page_bmap(struct virtio_balloon *vb, pfn_num); } if (le64_to_cpu(vb->balloon_page_chunk->chunk_num) > 0) - send_page_chunks(vb, vq); + send_page_chunks(vb, vq, PAGE_CHUNK_TYPE_BALLOON, 0); } static void set_page_pfns(struct virtio_balloon *vb, @@ -679,43 +734,186 @@ static void update_balloon_size_func(struct work_struct *work) queue_work(system_freezable_wq, work); } +/* Add a message buffer for the host to fill in a request */ +static void miscq_msg_inbuf_add(struct virtio_balloon *vb, + struct virtio_balloon_miscq_msg *req_buf) +{ + struct scatterlist sg_in; + + sg_init_one(&sg_in, req_buf, sizeof(struct virtio_balloon_miscq_msg)); + if (virtqueue_add_inbuf(vb->miscq, &sg_in, 1, req_buf, GFP_KERNEL) + < 0) { + __virtio_clear_bit(vb->vdev, + VIRTIO_BALLOON_F_MISC_VQ); + dev_warn(&vb->vdev->dev, "%s: add miscq msg buf err\n", + __func__); + return; + } + virtqueue_kick(vb->miscq); +} + +static void miscq_report_unused_pages(struct virtio_balloon *vb) +{ + struct virtio_balloon_miscq_msg *msg = vb->miscq_msg_rup; + struct virtqueue *vq = vb->miscq; + int ret = 0; + unsigned int order = 0, migratetype = 0; + struct zone *zone = NULL; + struct page *page = NULL; + u64 pfn; + + msg->cmd = cpu_to_le32(VIRTIO_BALLOON_MISCQ_CMD_REPORT_UNUSED_PAGES); + msg->flags = 0; + + for_each_populated_zone(zone) { + for (order = MAX_ORDER - 1; order > 0; order--) { + for (migratetype = 0; migratetype < MIGRATE_TYPES; + migratetype++) { + do { + ret = report_unused_page_block(zone, + order, migratetype, &page); + if (!ret) { + pfn = (u64)page_to_pfn(page); + add_one_chunk(vb, vq, + PAGE_CHUNK_TYPE_UNUSED, + pfn, + (u64)(1 << order) * + VIRTIO_BALLOON_PAGES_PER_PAGE); + } + } while (!ret); + } + } + } + /* Set the cmd completion flag */ + msg->flags |= cpu_to_le32(VIRTIO_BALLOON_MISCQ_F_COMPLETION); + send_page_chunks(vb, vq, PAGE_CHUNK_TYPE_UNUSED, true); +} + +static void miscq_handle_func(struct work_struct *work) +{ 
+ struct virtio_balloon *vb; + struct virtio_balloon_miscq_msg *msg; + unsigned int len; + + vb = container_of(work, struct virtio_balloon, + miscq_handle_work); + msg = virtqueue_get_buf(vb->miscq, &len); + if (!msg || len != sizeof(struct virtio_balloon_miscq_msg)) { + dev_warn(&vb->vdev->dev, "%s: invalid miscq msg len\n", + __func__); + miscq_msg_inbuf_add(vb, vb->miscq_msg_rup); + return; + } + switch (msg->cmd) { + case VIRTIO_BALLOON_MISCQ_CMD_REPORT_UNUSED_PAGES: + miscq_report_unused_pages(vb); + break; + default: + dev_warn(&vb->vdev->dev, "%s: miscq cmd %d not supported\n", + __func__, msg->cmd); + } + miscq_msg_inbuf_add(vb, vb->miscq_msg_rup); +} + +static void miscq_request(struct virtqueue *vq) +{ + struct virtio_balloon *vb = vq->vdev->priv; + + queue_work(system_freezable_wq, &vb->miscq_handle_work); +} + static int init_vqs(struct virtio_balloon *vb) { - struct virtqueue *vqs[3]; - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request }; - static const char * const names[] = { "inflate", "deflate", "stats" }; - int err, nvqs; + struct virtqueue **vqs; + vq_callback_t **callbacks; + const char **names; + int err = -ENOMEM; + int i, nvqs; + + /* Inflateq and deflateq are used unconditionally */ + nvqs = 2; + + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) + nvqs++; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) + nvqs++; + + /* Allocate space for find_vqs parameters */ + vqs = kcalloc(nvqs, sizeof(*vqs), GFP_KERNEL); + if (!vqs) + goto err_vq; + callbacks = kmalloc_array(nvqs, sizeof(*callbacks), GFP_KERNEL); + if (!callbacks) + goto err_callback; + names = kmalloc_array(nvqs, sizeof(*names), GFP_KERNEL); + if (!names) + goto err_names; + + callbacks[0] = balloon_ack; + names[0] = "inflate"; + callbacks[1] = balloon_ack; + names[1] = "deflate"; + + i = 2; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { + callbacks[i] = stats_request; + names[i] = "stats"; + i++; + } - /* - * We expect 
two virtqueues: inflate and deflate, and - * optionally stat. - */ - nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2; - err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names, - NULL); + if (virtio_has_feature(vb->vdev, + VIRTIO_BALLOON_F_MISC_VQ)) { + callbacks[i] = miscq_request; + names[i] = "miscq"; + } + + err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, + names, NULL); if (err) - return err; + goto err_find; vb->inflate_vq = vqs[0]; vb->deflate_vq = vqs[1]; + i = 2; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { struct scatterlist sg; - unsigned int num_stats; - vb->stats_vq = vqs[2]; + vb->stats_vq = vqs[i++]; /* * Prime this virtqueue with one buffer so the hypervisor can * use it to signal us later (it can't be broken yet!). */ - num_stats = update_balloon_stats(vb); - - sg_init_one(&sg, vb->stats, sizeof(vb->stats[0]) * num_stats); + sg_init_one(&sg, vb->stats, sizeof(vb->stats)); if (virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb, GFP_KERNEL) < 0) BUG(); virtqueue_kick(vb->stats_vq); } + + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) { + vb->miscq = vqs[i]; + /* + * Add the msg buf for the REPORT_UNUSED_PAGES request. + * The request is handled one in-flight each time. So, just + * use the response buffer, miscq_msg_rup, for the host to + * fill in a request. 
+ */ + miscq_msg_inbuf_add(vb, vb->miscq_msg_rup); + } + + kfree(names); + kfree(callbacks); + kfree(vqs); return 0; + +err_find: + kfree(names); +err_names: + kfree(callbacks); +err_callback: + kfree(vqs); +err_vq: + return err; } #ifdef CONFIG_BALLOON_COMPACTION @@ -723,7 +921,7 @@ static int init_vqs(struct virtio_balloon *vb) static void tell_host_one_page(struct virtio_balloon *vb, struct virtqueue *vq, struct page *page) { - add_one_chunk(vb, vq, page_to_pfn(page), + add_one_chunk(vb, vq, PAGE_CHUNK_TYPE_BALLOON, page_to_pfn(page), VIRTIO_BALLOON_PAGES_PER_PAGE); } @@ -857,6 +1055,22 @@ static int balloon_page_chunk_init(struct virtio_balloon *vb) return -ENOMEM; } +static int miscq_init(struct virtio_balloon *vb) +{ + vb->miscq_msg_rup = kmalloc(sizeof(struct virtio_balloon_miscq_msg) + + sizeof(struct virtio_balloon_page_chunk_entry) * + VIRTIO_BALLOON_MAX_PAGE_CHUNKS, GFP_KERNEL); + if (!vb->miscq_msg_rup) { + __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ); + dev_warn(&vb->vdev->dev, "%s: failed\n", __func__); + return -ENOMEM; + } + + INIT_WORK(&vb->miscq_handle_work, miscq_handle_func); + + return 0; +} + static int virtballoon_validate(struct virtio_device *vdev) { struct virtio_balloon *vb = NULL; @@ -874,7 +1088,16 @@ static int virtballoon_validate(struct virtio_device *vdev) goto err_page_chunk; } + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_MISC_VQ)) { + err = miscq_init(vb); + if (err < 0) + goto err_miscq_rup; + } + return 0; +err_miscq_rup: + free_page_bmap(vb); + kfree(vb->balloon_page_chunk); err_page_chunk: kfree(vb); err_vb: @@ -971,6 +1194,7 @@ static void virtballoon_remove(struct virtio_device *vdev) spin_unlock_irq(&vb->stop_update_lock); cancel_work_sync(&vb->update_balloon_size_work); cancel_work_sync(&vb->update_balloon_stats_work); + cancel_work_sync(&vb->miscq_handle_work); remove_common(vb); free_page_bmap(vb); @@ -1020,6 +1244,7 @@ static unsigned int features[] = { VIRTIO_BALLOON_F_STATS_VQ, 
VIRTIO_BALLOON_F_DEFLATE_ON_OOM, VIRTIO_BALLOON_F_PAGE_CHUNKS, + VIRTIO_BALLOON_F_MISC_VQ, }; static struct virtio_driver virtio_balloon_driver = { diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index d532ed16..ea83b74 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -35,6 +35,7 @@ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_PAGE_CHUNKS 3 /* Inflate/Deflate pages in chunks */ +#define VIRTIO_BALLOON_F_MISC_VQ 4 /* Virtqueue for misc. requests */ /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -96,4 +97,15 @@ struct virtio_balloon_page_chunk { struct virtio_balloon_page_chunk_entry entry[]; }; +struct virtio_balloon_miscq_msg { +#define VIRTIO_BALLOON_MISCQ_CMD_REPORT_UNUSED_PAGES 0 + __le32 cmd; +/* Flag to indicate the completion of handling a command */ +#define VIRTIO_BALLOON_MISCQ_F_COMPLETION 1 + __le32 flags; + union { + struct virtio_balloon_page_chunk chunk; + } payload; +}; + #endif /* _LINUX_VIRTIO_BALLOON_H */ -- 2.7.4
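As a rough illustration of the buffer arithmetic in this patch (message length in send_page_chunks(), the VIRTIO_BALLOON_MAX_PAGE_CHUNKS limit, and the shifts in add_one_chunk()), here is a small userspace model. It is a sketch under assumptions, not the driver code: all model_* and MODEL_* names are hypothetical, the page size is assumed to be 4KB, both shift constants are assumed to be 12 (inferred from the "52 bit + 12 bit reserved" chunk layout in patch 3/6), the miscq header is flattened to cmd, flags and chunk_num, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096u  /* assumption: 4KB guest pages */
#define MODEL_CHUNK_SHIFT 12     /* assumption: low 12 bits are reserved */

/* One chunk entry: two 64-bit fields. Byte order is ignored in this model. */
struct model_chunk_entry {
	uint64_t base;
	uint64_t size;
};

/* Fixed-size headers that precede the entry array in each buffer. */
#define MODEL_BALLOON_HDR sizeof(uint64_t)                           /* chunk_num */
#define MODEL_MISCQ_HDR   (2 * sizeof(uint32_t) + sizeof(uint64_t))  /* cmd, flags, chunk_num */

/* Largest entry count that fits in an 8-page buffer after the header. */
static size_t model_max_chunks(size_t hdr_len)
{
	return (8 * MODEL_PAGE_SIZE - hdr_len) / sizeof(struct model_chunk_entry);
}

/* Bytes handed to the virtqueue for a buffer holding chunk_num entries. */
static size_t model_msg_len(size_t hdr_len, uint64_t chunk_num)
{
	return hdr_len + chunk_num * sizeof(struct model_chunk_entry);
}

/* Pack a base PFN and a 4KB-page count with the same shifts add_one_chunk() applies. */
static struct model_chunk_entry model_make_entry(uint64_t base_pfn, uint64_t nr_pages)
{
	struct model_chunk_entry e;

	/* Both values must fit in the 52-bit fields. */
	assert((base_pfn >> 52) == 0 && (nr_pages >> 52) == 0);
	e.base = base_pfn << MODEL_CHUNK_SHIFT;
	e.size = nr_pages << MODEL_CHUNK_SHIFT;
	return e;
}
```

Under these assumptions the limit comes out the same for the balloon buffer and the miscq buffer, and a full miscq message occupies the whole 8-page allocation made in miscq_init().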
kbuild test robot
2017-May-05 00:21 UTC
[PATCH v10 4/6] mm: function to offer a page block on the free list
Hi Wei,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.11 next-20170504]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Wei-Wang/Extend-virtio-balloon-for-fast-de-inflating-fast-live-migration/20170505-052958
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   arch/x86/include/asm/uaccess_32.h:1: warning: no structured comments found
>> mm/page_alloc.c:4663: warning: No description found for parameter 'zone'
>> mm/page_alloc.c:4663: warning: No description found for parameter 'order'
>> mm/page_alloc.c:4663: warning: No description found for parameter 'migratetype'
>> mm/page_alloc.c:4663: warning: No description found for parameter 'page'
   include/net/cfg80211.h:1738: warning: No description found for parameter 'report_results'
   include/net/cfg80211.h:1738: warning: Excess struct/union/enum/typedef member 'results_wk' description in 'cfg80211_sched_scan_request'

vim +/zone +4663 mm/page_alloc.c

  4647   * Heuristically get a page block in the system that is unused.
  4648   * It is possible that pages from the page block are used immediately after
  4649   * report_unused_page_block() returns. It is the caller's responsibility
  4650   * to either detect or prevent the use of such pages.
  4651   *
  4652   * The free list to check: zone->free_area[order].free_list[migratetype].
  4653   *
  4654   * If the caller supplied page block (i.e. **page) is on the free list, offer
  4655   * the next page block on the list to the caller. Otherwise, offer the first
  4656   * page block on the list.
  4657   *
  4658   * Return 0 when a page block is found on the caller specified free list.
  4659   */
  4660  int report_unused_page_block(struct zone *zone, unsigned int order,
  4661                               unsigned int migratetype, struct page **page)
  4662  {
> 4663          struct zone *this_zone;
  4664          struct list_head *this_list;
  4665          int ret = 0;
  4666          unsigned long flags;
  4667
  4668          /* Sanity check */
  4669          if (zone == NULL || page == NULL || order >= MAX_ORDER ||
  4670              migratetype >= MIGRATE_TYPES)
  4671                  return -EINVAL;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 6564 bytes
Desc: not available
URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20170505/f1991d3a/attachment.bin>
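The four new warnings come from kernel-doc: the comment above report_unused_page_block() is presumably opened with "/**", so the htmldocs build expects one "@name:" line per parameter. A possible shape for the missing lines is sketched below. The parameter descriptions and the one-line summary are paraphrased from the existing comment and the sanity check in the listing and are only a suggestion; the patch author may word them differently.

```c
/**
 * report_unused_page_block - offer a page block from a zone free list
 * @zone: zone whose free list is checked
 * @order: order of the free list to check, must be below MAX_ORDER
 * @migratetype: migrate type of the free list to check
 * @page: in/out pointer; on entry the page block offered last time, on
 *        success the next (or the first) page block on the list
 *
 * (existing description of the heuristic stays here unchanged)
 *
 * Return: 0 when a page block is found on the caller specified free list,
 * -EINVAL when an argument fails the sanity check.
 */
```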
Michael S. Tsirkin
2017-May-05 22:29 UTC
[PATCH v10 3/6] virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS
On Thu, May 04, 2017 at 04:50:12PM +0800, Wei Wang wrote:
> Add a new feature, VIRTIO_BALLOON_F_PAGE_CHUNKS, which enables
> the transfer of the ballooned (i.e. inflated/deflated) pages in
> chunks to the host.
>
> The implementation of the previous virtio-balloon is not very
> efficient, because the ballooned pages are transferred to the
> host one by one. Here is the breakdown of the time in percentage
> spent on each step of the balloon inflating process (inflating
> 7GB of an 8GB idle guest).
>
> 1) allocating pages (6.5%)
> 2) sending PFNs to host (68.3%)
> 3) address translation (6.1%)
> 4) madvise (19%)
>
> It takes about 4126ms for the inflating process to complete.
> The above profiling shows that the bottlenecks are stage 2)
> and stage 4).
>
> This patch optimizes step 2) by transferring pages to the host in
> chunks. A chunk consists of guest physically continuous pages.
> When the pages are packed into a chunk, they are converted into
> balloon page size (4KB) pages. A chunk is offered to the host
> via a base PFN (i.e. the start PFN of those physically continuous
> pages) and the size (i.e. the total number of the 4KB balloon size
> pages). A chunk is formatted as below:
> --------------------------------------------------------
> |                 Base (52 bit)        | Rsvd (12 bit) |
> --------------------------------------------------------
> --------------------------------------------------------
> |                 Size (52 bit)        | Rsvd (12 bit) |
> --------------------------------------------------------
>
> By doing so, step 4) can also be optimized by doing address
> translation and madvise() in chunks rather than page by page.
>
> With this new feature, the above ballooning process takes ~590ms
> resulting in an improvement of ~85%.
>
> TODO: optimize stage 1) by allocating/freeing a chunk of pages
> instead of a single page each time.
>
> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> Suggested-by: Michael S. Tsirkin <mst at redhat.com>

This is much cleaner, thanks. It might be even better to have
wrappers that put array and its size in a struct and manage that
struct, but I won't require this for submission.

> ---
>  drivers/virtio/virtio_balloon.c     | 407 +++++++++++++++++++++++++++++++++---
>  include/uapi/linux/virtio_balloon.h |  14 ++
>  2 files changed, 396 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index ecb64e9..df16912 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -43,6 +43,20 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>
> +/* The size of one page_bmap used to record inflated/deflated pages. */
> +#define VIRTIO_BALLOON_PAGE_BMAP_SIZE (8 * PAGE_SIZE)
> +/*
> + * Callulates how many pfns can a page_bmap record. A bit corresponds to a
> + * page of PAGE_SIZE.
> + */
> +#define VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP \
> +	(VIRTIO_BALLOON_PAGE_BMAP_SIZE * BITS_PER_BYTE)
> +
> +/* The number of page_bmap to allocate by default. */
> +#define VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM 1
> +/* The maximum number of page_bmap that can be allocated.
*/ > +#define VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM 32 > + > static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES; > module_param(oom_pages, int, S_IRUSR | S_IWUSR); > MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); > @@ -51,6 +65,11 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); > static struct vfsmount *balloon_mnt; > #endif > > +/* Maximum number of page chunks */ > +#define VIRTIO_BALLOON_MAX_PAGE_CHUNKS ((8 * PAGE_SIZE - \ > + sizeof(struct virtio_balloon_page_chunk)) / \ > + sizeof(struct virtio_balloon_page_chunk_entry)) > + > struct virtio_balloon { > struct virtio_device *vdev; > struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; > @@ -79,6 +98,12 @@ struct virtio_balloon { > /* Synchronize access/update to this struct virtio_balloon elements */ > struct mutex balloon_lock; > > + /* Buffer for chunks of ballooned pages. */ > + struct virtio_balloon_page_chunk *balloon_page_chunk; > + > + /* Bitmap used to record pages. */ > + unsigned long *page_bmap[VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM]; > + > /* The array of pfns we tell the Host about. 
*/ > unsigned int num_pfns; > __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]; > @@ -111,6 +136,136 @@ static void balloon_ack(struct virtqueue *vq) > wake_up(&vb->acked); > } > > +/* Update pfn_max and pfn_min according to the pfn of page */ > +static inline void update_pfn_range(struct virtio_balloon *vb, > + struct page *page, > + unsigned long *pfn_min, > + unsigned long *pfn_max) > +{ > + unsigned long pfn = page_to_pfn(page); > + > + *pfn_min = min(pfn, *pfn_min); > + *pfn_max = max(pfn, *pfn_max); > +} > + > +static unsigned int extend_page_bmap_size(struct virtio_balloon *vb, > + unsigned long pfn_num) > +{ > + unsigned int i, bmap_num, allocated_bmap_num; > + unsigned long bmap_len; > + > + allocated_bmap_num = VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; > + bmap_len = ALIGN(pfn_num, BITS_PER_LONG) / BITS_PER_BYTE; > + bmap_len = roundup(bmap_len, VIRTIO_BALLOON_PAGE_BMAP_SIZE); > + /* > + * VIRTIO_BALLOON_PAGE_BMAP_SIZE is the size of one page_bmap, so > + * divide it to calculate how many page_bmap that we need. > + */ > + bmap_num = (unsigned int)(bmap_len / VIRTIO_BALLOON_PAGE_BMAP_SIZE); > + /* The number of page_bmap... 
arrays ...> to allocate should not exceed the max */ > + bmap_num = min_t(unsigned int, VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM, > + bmap_num); > + > + for (i = VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i < bmap_num; i++) { > + vb->page_bmap[i] = kmalloc(VIRTIO_BALLOON_PAGE_BMAP_SIZE, > + GFP_KERNEL); > + if (vb->page_bmap[i]) > + allocated_bmap_num++; > + else > + break; > + } > + > + return allocated_bmap_num; > +} > + > +static void free_extended_page_bmap(struct virtio_balloon *vb, > + unsigned int page_bmap_num) > +{ > + unsigned int i; > + > + for (i = VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i < page_bmap_num; > + i++) { > + kfree(vb->page_bmap[i]); > + vb->page_bmap[i] = NULL; > + page_bmap_num--; > + } > +} > + > +static void clear_page_bmap(struct virtio_balloon *vb, > + unsigned int page_bmap_num) > +{ > + int i; > + > + for (i = 0; i < page_bmap_num; i++) > + memset(vb->page_bmap[i], 0, VIRTIO_BALLOON_PAGE_BMAP_SIZE); > +} > + > +static void send_page_chunks(struct virtio_balloon *vb, struct virtqueue *vq) > +{ > + struct scatterlist sg; > + struct virtio_balloon_page_chunk *chunk; > + unsigned int len; > + > + chunk = vb->balloon_page_chunk; > + len = sizeof(__le64) + > + le64_to_cpu(chunk->chunk_num) * > + sizeof(struct virtio_balloon_page_chunk_entry); > + sg_init_one(&sg, chunk, len); > + if (!virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL)) { > + virtqueue_kick(vq); > + wait_event(vb->acked, virtqueue_get_buf(vq, &len)); > + chunk->chunk_num = 0; > + } > +} > + > +/* Add a chunk entry to the buffer. 
*/ > +static void add_one_chunk(struct virtio_balloon *vb, struct virtqueue *vq, > + u64 base, u64 size) > +{ > + struct virtio_balloon_page_chunk *chunk = vb->balloon_page_chunk; > + struct virtio_balloon_page_chunk_entry *entry; > + uint64_t chunk_num = le64_to_cpu(chunk->chunk_num); > + > + entry = &chunk->entry[chunk_num]; > + entry->base = cpu_to_le64(base << VIRTIO_BALLOON_CHUNK_BASE_SHIFT); > + entry->size = cpu_to_le64(size << VIRTIO_BALLOON_CHUNK_SIZE_SHIFT); > + chunk->chunk_num = cpu_to_le64(++chunk_num); > + if (chunk_num == VIRTIO_BALLOON_MAX_PAGE_CHUNKS) > + send_page_chunks(vb, vq); > +} > + > +static void convert_bmap_to_chunks(struct virtio_balloon *vb, > + struct virtqueue *vq, > + unsigned long *bmap, > + unsigned long pfn_start, > + unsigned long size) > +{ > + unsigned long next_one, next_zero, chunk_size, pos = 0; > + > + while (pos < size) { > + next_one = find_next_bit(bmap, size, pos); > + /* > + * No "1" bit found, which means that there is no pfn > + * recorded in the rest of this bmap. > + */ > + if (next_one == size) > + break; > + next_zero = find_next_zero_bit(bmap, size, next_one + 1); > + /* > + * A bit in page_bmap corresponds to a page of PAGE_SIZE. > + * Convert it to be pages of 4KB balloon page size when > + * adding it to a chunk. 
> + */ > + chunk_size = (next_zero - next_one) * > + VIRTIO_BALLOON_PAGES_PER_PAGE; > + if (chunk_size) { > + add_one_chunk(vb, vq, pfn_start + next_one, > + chunk_size); > + pos = next_zero + 1; > + } > + } > +} > + > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq) > { > struct scatterlist sg; > @@ -124,7 +279,33 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq) > > /* When host has read buffer, this completes via balloon_ack */ > wait_event(vb->acked, virtqueue_get_buf(vq, &len)); > +} > + > +static void tell_host_from_page_bmap(struct virtio_balloon *vb, > + struct virtqueue *vq, > + unsigned long pfn_start, > + unsigned long pfn_end, > + unsigned int page_bmap_num) > +{ > + unsigned long i, pfn_num; > > + for (i = 0; i < page_bmap_num; i++) { > + /* > + * For the last page_bmap, only the remaining number of pfns > + * need to be searched rather than the entire page_bmap. > + */ > + if (i + 1 == page_bmap_num) > + pfn_num = (pfn_end - pfn_start) % > + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; > + else > + pfn_num = VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; > + > + convert_bmap_to_chunks(vb, vq, vb->page_bmap[i], pfn_start + > + i * VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP, > + pfn_num); > + } > + if (le64_to_cpu(vb->balloon_page_chunk->chunk_num) > 0) > + send_page_chunks(vb, vq); > } > > static void set_page_pfns(struct virtio_balloon *vb, > @@ -141,13 +322,88 @@ static void set_page_pfns(struct virtio_balloon *vb, > page_to_balloon_pfn(page) + i); > } > > +/* > + * Send ballooned pages in chunks to host. > + * The ballooned pages are recorded in page bitmaps. Each bit in a bitmap > + * corresponds to a page of PAGE_SIZE. The page bitmaps are searched for > + * continuous "1" bits, which correspond to continuous pages, to chunk. > + * When packing those continuous pages into chunks, pages are converted into > + * 4KB balloon pages. > + * > + * pfn_max and pfn_min form the range of pfns that need to use page bitmaps to > + * record. 
If the range is too large to be recorded into the allocated page > + * bitmaps, the page bitmaps are used multiple times to record the entire > + * range of pfns. > + */ > +static void tell_host_page_chunks(struct virtio_balloon *vb, > + struct list_head *pages, > + struct virtqueue *vq, > + unsigned long pfn_max, > + unsigned long pfn_min) > +{ > + /* > + * The pfn_start and pfn_end form the range of pfns that the allocated > + * page_bmap can record in each round. > + */ > + unsigned long pfn_start, pfn_end; > + /* Total number of allocated page_bmaparrays> */ > + unsigned int page_bmap_num; > + struct page *page; > + bool found; > + > + /* > + * In the case that one page_bmap is not sufficient to record the pfn > + * range, page_bmap will be extended by allocating more numbers of > + * page_bmap. > + */ > + page_bmap_num = extend_page_bmap_size(vb, pfn_max - pfn_min + 1); > + > + /* Start from the beginning of the whole pfn range */ > + pfn_start = pfn_min; > + while (pfn_start < pfn_max) { > + pfn_end = pfn_start + > + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP * page_bmap_num; > + pfn_end = pfn_end < pfn_max ? pfn_end : pfn_max; > + clear_page_bmap(vb, page_bmap_num); > + found = false; > + > + list_for_each_entry(page, pages, lru) { > + unsigned long bmap_idx, bmap_pos, this_pfn; > + > + this_pfn = page_to_pfn(page); > + if (this_pfn < pfn_start || this_pfn > pfn_end) > + continue; > + bmap_idx = (this_pfn - pfn_start) / > + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; > + bmap_pos = (this_pfn - pfn_start) % > + VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP; > + set_bit(bmap_pos, vb->page_bmap[bmap_idx]); > + > + found = true; > + } > + if (found) > + tell_host_from_page_bmap(vb, vq, pfn_start, pfn_end, > + page_bmap_num); > + /* > + * Start the next round when pfn_start and pfn_end couldn't > + * cover the whole pfn range given by pfn_max and pfn_min. 
> + */ > + pfn_start = pfn_end; > + } > + free_extended_page_bmap(vb, page_bmap_num); > +} > + > static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > { > struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; > unsigned num_allocated_pages; > + bool chunking = virtio_has_feature(vb->vdev, > + VIRTIO_BALLOON_F_PAGE_CHUNKS); > + unsigned long pfn_max = 0, pfn_min = ULONG_MAX; > > /* We can only do one array worth at a time. */ > - num = min(num, ARRAY_SIZE(vb->pfns)); > + if (!chunking) > + num = min(num, ARRAY_SIZE(vb->pfns)); > > mutex_lock(&vb->balloon_lock); > for (vb->num_pfns = 0; vb->num_pfns < num; > @@ -162,7 +418,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > msleep(200); > break; > } > - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > + if (chunking) > + update_pfn_range(vb, page, &pfn_max, &pfn_min); > + else > + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > if (!virtio_has_feature(vb->vdev, > VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > @@ -171,8 +430,14 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > > num_allocated_pages = vb->num_pfns; > /* Did we get any? */ > - if (vb->num_pfns != 0) > - tell_host(vb, vb->inflate_vq); > + if (vb->num_pfns != 0) { > + if (chunking) > + tell_host_page_chunks(vb, &vb_dev_info->pages, > + vb->inflate_vq, > + pfn_max, pfn_min); > + else > + tell_host(vb, vb->inflate_vq); > + } > mutex_unlock(&vb->balloon_lock); > > return num_allocated_pages; > @@ -198,9 +463,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) > struct page *page; > struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; > LIST_HEAD(pages); > + bool chunking = virtio_has_feature(vb->vdev, > + VIRTIO_BALLOON_F_PAGE_CHUNKS); > + unsigned long pfn_max = 0, pfn_min = ULONG_MAX; > > - /* We can only do one array worth at a time. 
*/ > - num = min(num, ARRAY_SIZE(vb->pfns)); > + /* Traditionally, we can only do one array worth at a time. */ > + if (!chunking) > + num = min(num, ARRAY_SIZE(vb->pfns)); > > mutex_lock(&vb->balloon_lock); > /* We can't release more pages than taken */ > @@ -210,7 +479,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) > page = balloon_page_dequeue(vb_dev_info); > if (!page) > break; > - set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > + if (chunking) > + update_pfn_range(vb, page, &pfn_max, &pfn_min); > + else > + set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > list_add(&page->lru, &pages); > vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE; > } > @@ -221,8 +493,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) > * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); > * is true, we *have* to do it in this order > */ > - if (vb->num_pfns != 0) > - tell_host(vb, vb->deflate_vq); > + if (vb->num_pfns != 0) { > + if (chunking) > + tell_host_page_chunks(vb, &pages, vb->deflate_vq, > + pfn_max, pfn_min); > + else > + tell_host(vb, vb->deflate_vq); > + } > release_pages_balloon(vb, &pages); > mutex_unlock(&vb->balloon_lock); > return num_freed_pages; > @@ -442,6 +719,14 @@ static int init_vqs(struct virtio_balloon *vb) > } > > #ifdef CONFIG_BALLOON_COMPACTION > + > +static void tell_host_one_page(struct virtio_balloon *vb, > + struct virtqueue *vq, struct page *page) > +{ > + add_one_chunk(vb, vq, page_to_pfn(page), > + VIRTIO_BALLOON_PAGES_PER_PAGE); > +} > + > /* > * virtballoon_migratepage - perform the balloon page migration on behalf of > * a compation thread. 
(called under page lock) > @@ -465,6 +750,8 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, > { > struct virtio_balloon *vb = container_of(vb_dev_info, > struct virtio_balloon, vb_dev_info); > + bool chunking = virtio_has_feature(vb->vdev, > + VIRTIO_BALLOON_F_PAGE_CHUNKS); > unsigned long flags; > > /* > @@ -486,16 +773,22 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, > vb_dev_info->isolated_pages--; > __count_vm_event(BALLOON_MIGRATE); > spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags); > - vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; > - set_page_pfns(vb, vb->pfns, newpage); > - tell_host(vb, vb->inflate_vq); > - > + if (chunking) { > + tell_host_one_page(vb, vb->inflate_vq, newpage); > + } else { > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; > + set_page_pfns(vb, vb->pfns, newpage); > + tell_host(vb, vb->inflate_vq); > + } > /* balloon's page migration 2nd step -- deflate "page" */ > balloon_page_delete(page); > - vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; > - set_page_pfns(vb, vb->pfns, page); > - tell_host(vb, vb->deflate_vq); > - > + if (chunking) { > + tell_host_one_page(vb, vb->deflate_vq, page); > + } else { > + vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; > + set_page_pfns(vb, vb->pfns, page); > + tell_host(vb, vb->deflate_vq); > + } > mutex_unlock(&vb->balloon_lock); > > put_page(page); /* balloon reference */ > @@ -522,9 +815,75 @@ static struct file_system_type balloon_fs = { > > #endif /* CONFIG_BALLOON_COMPACTION */ > > +static void free_page_bmap(struct virtio_balloon *vb) > +{ > + int i; > + > + for (i = 0; i < VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i++) { > + kfree(vb->page_bmap[i]); > + vb->page_bmap[i] = NULL; > + } > +} > + > +static int balloon_page_chunk_init(struct virtio_balloon *vb) > +{ > + int i; > + > + vb->balloon_page_chunk = kmalloc(sizeof(__le64) + > + sizeof(struct virtio_balloon_page_chunk_entry) * > + VIRTIO_BALLOON_MAX_PAGE_CHUNKS, GFP_KERNEL); > + if 
(!vb->balloon_page_chunk) > + goto err_page_chunk; > + > + /* > + * The default number of page_bmaps are allocated. More may be > + * allocated on demand. > + */ > + for (i = 0; i < VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM; i++) { > + vb->page_bmap[i] = kmalloc(VIRTIO_BALLOON_PAGE_BMAP_SIZE, > + GFP_KERNEL); > + if (!vb->page_bmap[i]) > + goto err_page_bmap; > + } > + > + return 0; > +err_page_bmap: > + free_page_bmap(vb); > + kfree(vb->balloon_page_chunk); > + vb->balloon_page_chunk = NULL; > +err_page_chunk: > + __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_PAGE_CHUNKS); > + dev_warn(&vb->vdev->dev, "%s: failed\n", __func__); > + return -ENOMEM; > +} > + > +static int virtballoon_validate(struct virtio_device *vdev) > +{ > + struct virtio_balloon *vb = NULL; > + int err; > + > + vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL); > + if (!vb) { > + err = -ENOMEM; > + goto err_vb; > + } > + > + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_CHUNKS)) { > + err = balloon_page_chunk_init(vb); > + if (err < 0) > + goto err_page_chunk; > + } > + > + return 0; > +err_page_chunk: > + kfree(vb); > +err_vb: > + return err; > +} > + > static int virtballoon_probe(struct virtio_device *vdev) > { > - struct virtio_balloon *vb; > + struct virtio_balloon *vb = vdev->priv; > int err; > > if (!vdev->config->get) { > @@ -533,17 +892,12 @@ static int virtballoon_probe(struct virtio_device *vdev) > return -EINVAL; > } > > - vdev->priv = vb = kmalloc(sizeof(*vb), GFP_KERNEL); > - if (!vb) { > - err = -ENOMEM; > - goto out; > - } > - > INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func); > INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func); > spin_lock_init(&vb->stop_update_lock); > vb->stop_update = false; > vb->num_pages = 0; > + > mutex_init(&vb->balloon_lock); > init_waitqueue_head(&vb->acked); > vb->vdev = vdev; > @@ -590,7 +944,6 @@ static int virtballoon_probe(struct virtio_device *vdev) > vdev->config->del_vqs(vdev); > out_free_vb: > kfree(vb); 
> -out: > return err; > } > > @@ -620,6 +973,8 @@ static void virtballoon_remove(struct virtio_device *vdev) > cancel_work_sync(&vb->update_balloon_stats_work); > > remove_common(vb); > + free_page_bmap(vb); > + kfree(vb->balloon_page_chunk); > #ifdef CONFIG_BALLOON_COMPACTION > if (vb->vb_dev_info.inode) > iput(vb->vb_dev_info.inode); > @@ -664,6 +1019,7 @@ static unsigned int features[] = { > VIRTIO_BALLOON_F_MUST_TELL_HOST, > VIRTIO_BALLOON_F_STATS_VQ, > VIRTIO_BALLOON_F_DEFLATE_ON_OOM, > + VIRTIO_BALLOON_F_PAGE_CHUNKS, > }; > > static struct virtio_driver virtio_balloon_driver = { > @@ -674,6 +1030,7 @@ static struct virtio_driver virtio_balloon_driver = { > .id_table = id_table, > .probe = virtballoon_probe, > .remove = virtballoon_remove, > + .validate = virtballoon_validate, > .config_changed = virtballoon_changed, > #ifdef CONFIG_PM_SLEEP > .freeze = virtballoon_freeze, > diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h > index 343d7dd..d532ed16 100644 > --- a/include/uapi/linux/virtio_balloon.h > +++ b/include/uapi/linux/virtio_balloon.h > @@ -34,6 +34,7 @@ > #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */ > #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ > #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ > +#define VIRTIO_BALLOON_F_PAGE_CHUNKS 3 /* Inflate/Deflate pages in chunks */ > > /* Size of a PFN in the balloon interface. 
*/ > #define VIRTIO_BALLOON_PFN_SHIFT 12 > @@ -82,4 +83,17 @@ struct virtio_balloon_stat { > __virtio64 val; > } __attribute__((packed)); > > +#define VIRTIO_BALLOON_CHUNK_BASE_SHIFT 12 > +#define VIRTIO_BALLOON_CHUNK_SIZE_SHIFT 12 > +struct virtio_balloon_page_chunk_entry { > + __le64 base; > + __le64 size; > +}; > + > +struct virtio_balloon_page_chunk { > + /* Number of chunks in the payload */ > + __le64 chunk_num; > + struct virtio_balloon_page_chunk_entry entry[]; > +}; > + > #endif /* _LINUX_VIRTIO_BALLOON_H */ > -- > 2.7.4
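For readers following convert_bmap_to_chunks() in the quoted patch, here is a minimal user-space sketch of the walk that turns runs of set bits into (base, size) chunks. It is only an illustration under stated assumptions, not the kernel code: struct chunk, test_bit_at(), find_next() and bmap_to_chunks() are hypothetical names, the kernel's find_next_bit()/find_next_zero_bit() are replaced by a naive bit loop, and the multiplication by VIRTIO_BALLOON_PAGES_PER_PAGE and the 12-bit shifts applied in add_one_chunk() are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these names come from the patch. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct chunk {
	uint64_t base;	/* first pfn of the run */
	uint64_t size;	/* number of pages in the run */
};

/* Return 1 if the given bit is set in the bitmap, else 0. */
static int test_bit_at(const unsigned long *bmap, size_t bit)
{
	return (int)((bmap[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/* Index of the first bit equal to `want` in [pos, size), or size if none. */
static size_t find_next(const unsigned long *bmap, size_t size,
			size_t pos, int want)
{
	while (pos < size && test_bit_at(bmap, pos) != want)
		pos++;
	return pos;
}

/*
 * Walk the bitmap and emit one chunk per run of consecutive set bits.
 * Returns the number of chunks written, which is at most max_chunks.
 */
static size_t bmap_to_chunks(const unsigned long *bmap, size_t size,
			     uint64_t pfn_start,
			     struct chunk *out, size_t max_chunks)
{
	size_t pos = 0, n = 0;

	assert(bmap && out);
	while (pos < size && n < max_chunks) {
		size_t next_one = find_next(bmap, size, pos, 1);
		size_t next_zero;

		/* No set bit left: nothing more is recorded in this bitmap. */
		if (next_one == size)
			break;
		next_zero = find_next(bmap, size, next_one + 1, 0);
		out[n].base = pfn_start + next_one;
		out[n].size = next_zero - next_one;
		n++;
		/* Resume the search just past the zero bit that ended the run. */
		pos = next_zero + 1;
	}
	return n;
}
```

With bits 1-3 and 6-8 set in a 16-bit map and pfn_start = 100, the walk yields two chunks, (101, 3) and (106, 3). The position is assigned from next_zero rather than accumulated, so each search starts immediately after the previous run.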