thr3ads.net - Linux Virtualization - [virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK

If this information is useful, please help other people find it:
Share via:

Michael S. Tsirkin

2017-Mar-08 04:01 UTC

[PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

On Fri, Mar 03, 2017 at 01:40:28PM +0800, Wei Wang
wrote:> From: Liang Li <liang.z.li at intel.com>
> 
> The implementation of the current virtio-balloon is not very
> efficient, because the pages are transferred to the host one by one.
> Here is the breakdown of the time in percentage spent on each
> step of the balloon inflating process (inflating 7GB of an 8GB
> idle guest).
> 
> 1) allocating pages (6.5%)
> 2) sending PFNs to host (68.3%)
> 3) address translation (6.1%)
> 4) madvise (19%)
> 
> It takes about 4126ms for the inflating process to complete.
> The above profiling shows that the bottlenecks are stage 2)
> and stage 4).
> 
> This patch optimizes step 2) by transfering pages to the host in
> chunks. A chunk consists of guest physically continuous pages, and
> it is offered to the host via a base PFN (i.e. the start PFN of
> those physically continuous pages) and the size (i.e. the total
> number of the pages). A normal chunk is formated as below:
> -----------------------------------------------
> |  Base (52 bit)               | Size (12 bit)|
> -----------------------------------------------
> For large size chunks, an extended chunk format is used:
> -----------------------------------------------
> |                 Base (64 bit)               |
> -----------------------------------------------
> -----------------------------------------------
> |                 Size (64 bit)               |
> -----------------------------------------------
> 
> By doing so, step 4) can also be optimized by doing address
> translation and madvise() in chunks rather than page by page.
> 
> This optimization requires the negotation of a new feature bit,
> VIRTIO_BALLOON_F_CHUNK_TRANSFER.
> 
> With this new feature, the above ballooning process takes ~590ms
> resulting in an improvement of ~85%.
> 
> TODO: optimize stage 1) by allocating/freeing a chunk of pages
> instead of a single page each time.
> 
> Signed-off-by: Liang Li <liang.z.li at intel.com>
> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Suggested-by: Michael S. Tsirkin <mst at redhat.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Paolo Bonzini <pbonzini at redhat.com>
> Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
> Cc: Amit Shah <amit.shah at redhat.com>
> Cc: Dave Hansen <dave.hansen at intel.com>
> Cc: Andrea Arcangeli <aarcange at redhat.com>
> Cc: David Hildenbrand <david at redhat.com>
> Cc: Liang Li <liliang324 at gmail.com>
> Cc: Wei Wang <wei.w.wang at intel.com>
Does this pass sparse? I see some endian-ness issues here.
> ---
>  drivers/virtio/virtio_balloon.c | 351
++++++++++++++++++++++++++++++++++++----
>  1 file changed, 323 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c
b/drivers/virtio/virtio_balloon.c
> index f59cb4f..4416370 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -42,6 +42,10 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>  
> +#define PAGE_BMAP_SIZE	(8 * PAGE_SIZE)
> +#define PFNS_PER_PAGE_BMAP	(PAGE_BMAP_SIZE * BITS_PER_BYTE)
> +#define PAGE_BMAP_COUNT_MAX	32
> +
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> @@ -50,6 +54,16 @@ MODULE_PARM_DESC(oom_pages, "pages to free on
OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +struct balloon_page_chunk {
> +	__le64 base : 52;
> +	__le64 size : 12;
> +};
> +
> +struct balloon_page_chunk_ext {
> +	__le64 base;
> +	__le64 size;
> +};
> +
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
>  	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> @@ -67,6 +81,20 @@ struct virtio_balloon {
>  
>  	/* Number of balloon pages we've told the Host we're not using.
*/
>  	unsigned int num_pages;
> +	/* Pointer to the response header. */
> +	struct virtio_balloon_resp_hdr resp_hdr;
> +	/* Pointer to the start address of response data. */
> +	__le64 *resp_data;
> +	/* Size of response data buffer. */
> +	unsigned int resp_buf_size;
> +	/* Pointer offset of the response data. */
> +	unsigned int resp_pos;
> +	/* Bitmap used to save the pfns info */
> +	unsigned long *page_bitmap[PAGE_BMAP_COUNT_MAX];
> +	/* Number of split page bitmaps */
> +	unsigned int nr_page_bmap;
> +	/* Used to record the processed pfn range */
> +	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -110,20 +138,180 @@ static void balloon_ack(struct virtqueue *vq)
>  	wake_up(&vb->acked);
>  }
>  
> -static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> +static inline void init_bmap_pfn_range(struct virtio_balloon *vb)
>  {
> -	struct scatterlist sg;
> +	vb->min_pfn = ULONG_MAX;
> +	vb->max_pfn = 0;
> +}
> +
> +static inline void update_bmap_pfn_range(struct virtio_balloon *vb,
> +				 struct page *page)
> +{
> +	unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +	vb->min_pfn = min(balloon_pfn, vb->min_pfn);
> +	vb->max_pfn = max(balloon_pfn, vb->max_pfn);
> +}
> +
> +static void extend_page_bitmap(struct virtio_balloon *vb,
> +				unsigned long nr_pfn)

what exactly does this do?
> +{
> +	int i, bmap_count;
> +	unsigned long bmap_len;
> +
> +	bmap_len = ALIGN(nr_pfn, BITS_PER_LONG) / BITS_PER_BYTE;
> +	bmap_len = ALIGN(bmap_len, PAGE_BMAP_SIZE);
> +	bmap_count = min((int)(bmap_len / PAGE_BMAP_SIZE),
> +				 PAGE_BMAP_COUNT_MAX);
> +
> +	for (i = 1; i < bmap_count; i++) {
> +		vb->page_bitmap[i] = kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL);
> +		if (vb->page_bitmap[i])
> +			vb->nr_page_bmap++;
> +		else
> +			break;
seems to fail silently ...
> +	}
> +}
> +
> +static void free_extended_page_bitmap(struct virtio_balloon *vb)
> +{
> +	int i, bmap_count = vb->nr_page_bmap;
> +
> +	for (i = 1; i < bmap_count; i++) {
> +		kfree(vb->page_bitmap[i]);
> +		vb->page_bitmap[i] = NULL;
> +		vb->nr_page_bmap--;
> +	}
> +}
> +
> +static void kfree_page_bitmap(struct virtio_balloon *vb)
> +{
> +	int i;
> +
> +	for (i = 0; i < vb->nr_page_bmap; i++)
> +		kfree(vb->page_bitmap[i]);
> +}
> +
A very confusing name. pls don't start with kfree.
> +static void clear_page_bitmap(struct virtio_balloon *vb)
> +{
> +	int i;
> +
> +	for (i = 0; i < vb->nr_page_bmap; i++)
> +		memset(vb->page_bitmap[i], 0, PAGE_BMAP_SIZE);
> +}
> +
> +static void send_resp_data(struct virtio_balloon *vb, struct virtqueue
*vq,
> +			bool busy_wait)
> +{
> +	struct scatterlist sg[2];
> +	struct virtio_balloon_resp_hdr *hdr = &vb->resp_hdr;
>  	unsigned int len;
>  
> -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) *
vb->num_pfns);
> +	len = hdr->data_len = vb->resp_pos * sizeof(__le64);
> +	sg_init_table(sg, 2);
> +	sg_set_buf(&sg[0], hdr, sizeof(struct virtio_balloon_resp_hdr));
> +	sg_set_buf(&sg[1], vb->resp_data, len);
> +
> +	if (virtqueue_add_outbuf(vq, sg, 2, vb, GFP_KERNEL) == 0) {
!virtqueue_add_outbuf

Also how about making this header linear with pfns?
This way you can use 1 s/g and avoid mallocs.
> +		virtqueue_kick(vq);
> +		if (busy_wait)
> +			while (!virtqueue_get_buf(vq, &len)
> +				&& !virtqueue_is_broken(vq))
> +				cpu_relax();
> +		else
> +			wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> +		vb->resp_pos = 0;
> +		free_extended_page_bitmap(vb);
> +	}
> +}
>  
> -	/* We should always be able to add one buffer to an empty queue. */
> -	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> -	virtqueue_kick(vq);
> +static void do_set_resp_bitmap(struct virtio_balloon *vb, unsigned long
base,
> +			       int size)
> +{
> +	/* Use the extented chunk format if the size is too large */
> +	if (size > (1 << VIRTIO_BALLOON_CHUNK_SIZE_SHIFT)) {
> +		struct balloon_page_chunk_ext *chunk_ext > +			(struct
balloon_page_chunk_ext *)(vb->resp_data + vb->resp_pos);
> +		chunk_ext->base = cpu_to_le64(base);
> +		chunk_ext->size = cpu_to_le64(size);
> +		vb->resp_pos += sizeof(vb->resp_pos) / sizeof(*chunk_ext);
> +	} else {
> +		struct balloon_page_chunk *chunk > +			(struct balloon_page_chunk
*)(vb->resp_data + vb->resp_pos);
> +		chunk->base = cpu_to_le64(base);
> +		chunk->size = cpu_to_le16(size);
> +		vb->resp_pos += sizeof(vb->resp_pos) / sizeof(*chunk);
> +	}
> +}
>  
> -	/* When host has read buffer, this completes via balloon_ack */
> -	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> +static void chunking_pages_from_bmap(struct virtio_balloon *vb,
> +				     struct virtqueue *vq,
> +				     unsigned long start_pfn, unsigned long *bitmap,
> +				     unsigned long len, bool busy_wait)
> +{
> +	unsigned long pos = 0, end = len * BITS_PER_BYTE;
> +
> +	while (pos < end) {
> +		unsigned long one = find_next_bit(bitmap, end, pos);
> +
> +		if (one < end) {
> +			unsigned long chunk_size, zero;
> +
> +			zero = find_next_zero_bit(bitmap, end, one + 1);
> +			if (zero >= end)
> +				chunk_size = end - one;
> +			else
> +				chunk_size = zero - one;
> +			if (chunk_size) {
> +				if ((vb->resp_pos + 2) * sizeof(__le64) >
> +						vb->resp_buf_size)
why + 2? where does sizeof come from?
Any chance this can be cleaned up using structs sizeof etc?
or at least commented?
How am I supposed to figure out all this convoluted pointer math?
> +					send_resp_data(vb, vq, busy_wait);
> +				do_set_resp_bitmap(vb, start_pfn + one,	chunk_size);
> +			}
> +			pos = one + chunk_size;
> +		} else
> +			break;
> +	}
> +}
>  
> +static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> +{
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER)) {
> +		int nr_pfn, nr_used_bmap, i;
> +		unsigned long start_pfn, bmap_len;
> +
> +		start_pfn = vb->start_pfn;
> +		nr_pfn = vb->end_pfn - start_pfn + 1;
> +		nr_pfn = roundup(nr_pfn, BITS_PER_LONG);
> +		nr_used_bmap = nr_pfn / PFNS_PER_PAGE_BMAP;
better name here including "page"?
And bmaps instead of bmap?
> +		if (nr_pfn % PFNS_PER_PAGE_BMAP)
> +			nr_used_bmap++;
Pls just use macros like ALIGN or roundup?
> +		bmap_len = nr_pfn / BITS_PER_BYTE;
I guess we need to round up.
> +
> +		for (i = 0; i < nr_used_bmap; i++) {
> +			unsigned int bmap_size = PAGE_BMAP_SIZE;
> +
> +			if (i + 1 == nr_used_bmap)
> +				bmap_size = bmap_len - PAGE_BMAP_SIZE * i;
So last one is treated speacially? why?
> +			chunking_pages_from_bmap(vb, vq, start_pfn + i * PFNS_PER_PAGE_BMAP,
> +				 vb->page_bitmap[i], bmap_size, false);
> +		}
> +		if (vb->resp_pos > 0)
> +			send_resp_data(vb, vq, false);
> +	} else {
> +		struct scatterlist sg;
> +		unsigned int len;
> +
> +		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) *
vb->num_pfns);
> +
> +		/* We should always be able to add one buffer to an
> +		 * empty queue
> +		 */
/*
 * Multi-line comments
 * like this.
 */
> +		virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> +		virtqueue_kick(vq);
> +		/* When host has read buffer, this completes via balloon_ack */
> +		wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> +	}
>  }
>  
>  static void set_page_pfns(struct virtio_balloon *vb,
> @@ -138,13 +326,58 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  					  page_to_balloon_pfn(page) + i);
>  }
>  
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static void set_page_bitmap(struct virtio_balloon *vb,
> +			 struct list_head *pages, struct virtqueue *vq)
align on ( pls.
> +{
> +	unsigned long pfn, pfn_limit;
> +	struct page *page;
> +	bool found;
> +
> +	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
> +	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
> +	pfn_limit = PFNS_PER_PAGE_BMAP * vb->nr_page_bmap;
> +
> +	if (vb->nr_page_bmap == 1)
> +		extend_page_bitmap(vb, vb->max_pfn - vb->min_pfn + 1);
how is 1 special?
> +	for (pfn = vb->min_pfn; pfn < vb->max_pfn; pfn += pfn_limit) {
> +		unsigned long end_pfn;
> +
> +		clear_page_bitmap(vb);
> +		vb->start_pfn = pfn;
> +		end_pfn = pfn;
> +		found = false;
> +		list_for_each_entry(page, pages, lru) {
> +			unsigned long bmap_idx, bmap_pos, balloon_pfn;
> +
> +			balloon_pfn = page_to_balloon_pfn(page);
> +			if (balloon_pfn < pfn || balloon_pfn >= pfn + pfn_limit)
> +				continue;
> +			bmap_idx = (balloon_pfn - pfn) / PFNS_PER_PAGE_BMAP;
> +			bmap_pos = (balloon_pfn - pfn) % PFNS_PER_PAGE_BMAP;
> +			set_bit(bmap_pos, vb->page_bitmap[bmap_idx]);
> +			if (balloon_pfn > end_pfn)
> +				end_pfn = balloon_pfn;
> +			found = true;
> +		}
> +		if (found) {
> +			vb->end_pfn = end_pfn;
> +			tell_host(vb, vq);
> +		}
> +	}
> +}
> +
> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num)
>  {
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
> -	unsigned num_allocated_pages;
> +	unsigned int num_allocated_pages;
> +	bool chunking = virtio_has_feature(vb->vdev,
> +				 VIRTIO_BALLOON_F_CHUNK_TRANSFER);
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (chunking)
> +		init_bmap_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -159,7 +392,10 @@ static unsigned fill_balloon(struct virtio_balloon
*vb, size_t num)
>  			msleep(200);
>  			break;
>  		}
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (chunking)
> +			update_bmap_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>  		if (!virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> @@ -168,8 +404,13 @@ static unsigned fill_balloon(struct virtio_balloon
*vb, size_t num)
>  
>  	num_allocated_pages = vb->num_pfns;
>  	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (chunking)
> +			set_page_bitmap(vb, &vb_dev_info->pages,
> +					vb->inflate_vq);
> +		else
> +			tell_host(vb, vb->inflate_vq);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  
>  	return num_allocated_pages;
> @@ -189,15 +430,20 @@ static void release_pages_balloon(struct
virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num)
>  {
> -	unsigned num_freed_pages;
> +	unsigned int num_freed_pages;
>  	struct page *page;
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	LIST_HEAD(pages);
> +	bool chunking = virtio_has_feature(vb->vdev,
> +			 VIRTIO_BALLOON_F_CHUNK_TRANSFER);
Align on ( pls.
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (chunking)
> +		init_bmap_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	/* We can't release more pages than taken */
> @@ -207,7 +453,10 @@ static unsigned leak_balloon(struct virtio_balloon
*vb, size_t num)
>  		page = balloon_page_dequeue(vb_dev_info);
>  		if (!page)
>  			break;
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (chunking)
> +			update_bmap_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		list_add(&page->lru, &pages);
>  		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	}
> @@ -218,8 +467,12 @@ static unsigned leak_balloon(struct virtio_balloon
*vb, size_t num)
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
>  	 * is true, we *have* to do it in this order
>  	 */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->deflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (chunking)
> +			set_page_bitmap(vb, &pages, vb->deflate_vq);
> +		else
> +			tell_host(vb, vb->deflate_vq);
> +	}
>  	release_pages_balloon(vb, &pages);
>  	mutex_unlock(&vb->balloon_lock);
>  	return num_freed_pages;
> @@ -431,6 +684,18 @@ static int init_vqs(struct virtio_balloon *vb)
>  }
>  
>  #ifdef CONFIG_BALLOON_COMPACTION
> +static void tell_host_one_page(struct virtio_balloon *vb,
> +	struct virtqueue *vq, struct page *page)
> +{
> +	__le64 *chunk;
> +
> +	chunk = vb->resp_data + vb->resp_pos;
> +	*chunk = cpu_to_le64((page_to_pfn(page) <<
> +				VIRTIO_BALLOON_CHUNK_SIZE_SHIFT) | 1);
> +	vb->resp_pos++;
> +	send_resp_data(vb, vq, false);
> +}
> +
>  /*
>   * virtballoon_migratepage - perform the balloon page migration on behalf
of
>   *			     a compation thread.     (called under page lock)
> @@ -455,6 +720,8 @@ static int virtballoon_migratepage(struct
balloon_dev_info *vb_dev_info,
>  	struct virtio_balloon *vb = container_of(vb_dev_info,
>  			struct virtio_balloon, vb_dev_info);
>  	unsigned long flags;
> +	bool chunking = virtio_has_feature(vb->vdev,
> +				 VIRTIO_BALLOON_F_CHUNK_TRANSFER);
>  
>  	/*
>  	 * In order to avoid lock contention while migrating pages concurrently
> @@ -475,15 +742,23 @@ static int virtballoon_migratepage(struct
balloon_dev_info *vb_dev_info,
>  	vb_dev_info->isolated_pages--;
>  	__count_vm_event(BALLOON_MIGRATE);
>  	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
> -	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> -	set_page_pfns(vb, vb->pfns, newpage);
> -	tell_host(vb, vb->inflate_vq);
> +	if (chunking)
> +		tell_host_one_page(vb, vb->inflate_vq, newpage);
> +	else {
Ad {} around 1st block too pls.
> +		vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> +		set_page_pfns(vb, vb->pfns, newpage);
> +		tell_host(vb, vb->inflate_vq);
> +	}
>  
>  	/* balloon's page migration 2nd step -- deflate "page" */
>  	balloon_page_delete(page);
> -	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> -	set_page_pfns(vb, vb->pfns, page);
> -	tell_host(vb, vb->deflate_vq);
> +	if (chunking)
> +		tell_host_one_page(vb, vb->deflate_vq, page);

and here
> +	else {
> +		vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
> +		set_page_pfns(vb, vb->pfns, page);
> +		tell_host(vb, vb->deflate_vq);
> +	}
>  
>  	mutex_unlock(&vb->balloon_lock);
>  
> @@ -533,6 +808,21 @@ static int virtballoon_probe(struct virtio_device
*vdev)
>  	spin_lock_init(&vb->stop_update_lock);
>  	vb->stop_update = false;
>  	vb->num_pages = 0;
> +
> +	vb->page_bitmap[0] = kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL);
> +	if (!vb->page_bitmap[0]) {
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER);
> +	} else {
> +		vb->nr_page_bmap = 1;
> +		vb->resp_data = kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL);
> +		if (!vb->resp_data) {
> +			__virtio_clear_bit(vdev,
> +					VIRTIO_BALLOON_F_CHUNK_TRANSFER);
align on ( pls
> +			kfree(vb->page_bitmap[0]);
> +		}
> +	}
> +	vb->resp_pos = 0;
> +	vb->resp_buf_size = PAGE_BMAP_SIZE;
>  	mutex_init(&vb->balloon_lock);
>  	init_waitqueue_head(&vb->acked);
>  	vb->vdev = vdev;
> @@ -578,6 +868,8 @@ static int virtballoon_probe(struct virtio_device
*vdev)
>  out_del_vqs:
>  	vdev->config->del_vqs(vdev);
>  out_free_vb:
> +	kfree(vb->resp_data);
> +	kfree_page_bitmap(vb);
>  	kfree(vb);
>  out:
>  	return err;
> @@ -611,6 +903,8 @@ static void virtballoon_remove(struct virtio_device
*vdev)
>  	remove_common(vb);
>  	if (vb->vb_dev_info.inode)
>  		iput(vb->vb_dev_info.inode);
> +	kfree_page_bitmap(vb);
> +	kfree(vb->resp_data);
>  	kfree(vb);
>  }
>  
> @@ -649,6 +943,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_MUST_TELL_HOST,
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
> +	VIRTIO_BALLOON_F_CHUNK_TRANSFER,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 2.7.4

Wei Wang

2017-Mar-10 10:02 UTC

head link

[virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

On 03/08/2017 12:01 PM, Michael S. Tsirkin wrote:> On Fri, Mar 03, 2017 at 01:40:28PM +0800, Wei Wang wrote:
>> From: Liang Li <liang.z.li at intel.com>
>>
>> The implementation of the current virtio-balloon is not very
>> efficient, because the pages are transferred to the host one by one.
>> Here is the breakdown of the time in percentage spent on each
>> step of the balloon inflating process (inflating 7GB of an 8GB
>> idle guest).
>>
>> 1) allocating pages (6.5%)
>> 2) sending PFNs to host (68.3%)
>> 3) address translation (6.1%)
>> 4) madvise (19%)
>>
>> It takes about 4126ms for the inflating process to complete.
>> The above profiling shows that the bottlenecks are stage 2)
>> and stage 4).
>>
>> This patch optimizes step 2) by transfering pages to the host in
>> chunks. A chunk consists of guest physically continuous pages, and
>> it is offered to the host via a base PFN (i.e. the start PFN of
>> those physically continuous pages) and the size (i.e. the total
>> number of the pages). A normal chunk is formated as below:
>> -----------------------------------------------
>> |  Base (52 bit)               | Size (12 bit)|
>> -----------------------------------------------
>> For large size chunks, an extended chunk format is used:
>> -----------------------------------------------
>> |                 Base (64 bit)               |
>> -----------------------------------------------
>> -----------------------------------------------
>> |                 Size (64 bit)               |
>> -----------------------------------------------
>>
>> By doing so, step 4) can also be optimized by doing address
>> translation and madvise() in chunks rather than page by page.
>>
>> This optimization requires the negotation of a new feature bit,
>> VIRTIO_BALLOON_F_CHUNK_TRANSFER.
>>
>> With this new feature, the above ballooning process takes ~590ms
>> resulting in an improvement of ~85%.
>>
>> TODO: optimize stage 1) by allocating/freeing a chunk of pages
>> instead of a single page each time.
>>
>> Signed-off-by: Liang Li <liang.z.li at intel.com>
>> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
>> Suggested-by: Michael S. Tsirkin <mst at redhat.com>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> Cc: Paolo Bonzini <pbonzini at redhat.com>
>> Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
>> Cc: Amit Shah <amit.shah at redhat.com>
>> Cc: Dave Hansen <dave.hansen at intel.com>
>> Cc: Andrea Arcangeli <aarcange at redhat.com>
>> Cc: David Hildenbrand <david at redhat.com>
>> Cc: Liang Li <liliang324 at gmail.com>
>> Cc: Wei Wang <wei.w.wang at intel.com>
> Does this pass sparse? I see some endian-ness issues here.
"pass sparse"- what does that mean?
I didn't see any complaints from "make" on my machine.
>> ---
>>   drivers/virtio/virtio_balloon.c | 351
++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 323 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_balloon.c
b/drivers/virtio/virtio_balloon.c
>> index f59cb4f..4416370 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -42,6 +42,10 @@
>>   #define OOM_VBALLOON_DEFAULT_PAGES 256
>>   #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>>   
>> +#define PAGE_BMAP_SIZE	(8 * PAGE_SIZE)
>> +#define PFNS_PER_PAGE_BMAP	(PAGE_BMAP_SIZE * BITS_PER_BYTE)
>> +#define PAGE_BMAP_COUNT_MAX	32
>> +
>>   static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>>   module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>>   MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>> @@ -50,6 +54,16 @@ MODULE_PARM_DESC(oom_pages, "pages to free on
OOM");
>>   static struct vfsmount *balloon_mnt;
>>   #endif
>>   
>> +struct balloon_page_chunk {
>> +	__le64 base : 52;
>> +	__le64 size : 12;
>> +};
>> +
>> +struct balloon_page_chunk_ext {
>> +	__le64 base;
>> +	__le64 size;
>> +};
>> +
>>   struct virtio_balloon {
>>   	struct virtio_device *vdev;
>>   	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
>> @@ -67,6 +81,20 @@ struct virtio_balloon {
>>   
>>   	/* Number of balloon pages we've told the Host we're not
using. */
>>   	unsigned int num_pages;
>> +	/* Pointer to the response header. */
>> +	struct virtio_balloon_resp_hdr resp_hdr;
>> +	/* Pointer to the start address of response data. */
>> +	__le64 *resp_data;
>> +	/* Size of response data buffer. */
>> +	unsigned int resp_buf_size;
>> +	/* Pointer offset of the response data. */
>> +	unsigned int resp_pos;
>> +	/* Bitmap used to save the pfns info */
>> +	unsigned long *page_bitmap[PAGE_BMAP_COUNT_MAX];
>> +	/* Number of split page bitmaps */
>> +	unsigned int nr_page_bmap;
>> +	/* Used to record the processed pfn range */
>> +	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
>>   	/*
>>   	 * The pages we've told the Host we're not using are
enqueued
>>   	 * at vb_dev_info->pages list.
>> @@ -110,20 +138,180 @@ static void balloon_ack(struct virtqueue *vq)
>>   	wake_up(&vb->acked);
>>   }
>>   
>> -static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>> +static inline void init_bmap_pfn_range(struct virtio_balloon *vb)
>>   {
>> -	struct scatterlist sg;
>> +	vb->min_pfn = ULONG_MAX;
>> +	vb->max_pfn = 0;
>> +}
>> +
>> +static inline void update_bmap_pfn_range(struct virtio_balloon *vb,
>> +				 struct page *page)
>> +{
>> +	unsigned long balloon_pfn = page_to_balloon_pfn(page);
>> +
>> +	vb->min_pfn = min(balloon_pfn, vb->min_pfn);
>> +	vb->max_pfn = max(balloon_pfn, vb->max_pfn);
>> +}
>> +
>> +static void extend_page_bitmap(struct virtio_balloon *vb,
>> +				unsigned long nr_pfn)
>
> what exactly does this do?
By default, only one page_bmap is allocated.
When there are too many pfns to fit into one page_bmap,
it tries to allocate more page_bmap using this extend_page_bmap() function.
>
>> +{
>> +	int i, bmap_count;
>> +	unsigned long bmap_len;
>> +
>> +	bmap_len = ALIGN(nr_pfn, BITS_PER_LONG) / BITS_PER_BYTE;
>> +	bmap_len = ALIGN(bmap_len, PAGE_BMAP_SIZE);
>> +	bmap_count = min((int)(bmap_len / PAGE_BMAP_SIZE),
>> +				 PAGE_BMAP_COUNT_MAX);
>> +
>> +	for (i = 1; i < bmap_count; i++) {
>> +		vb->page_bitmap[i] = kmalloc(PAGE_BMAP_SIZE, GFP_KERNEL);
>> +		if (vb->page_bitmap[i])
>> +			vb->nr_page_bmap++;
>> +		else
>> +			break;
> seems to fail silently ...
I think here it is fine whether the page_bmap extension succeeds or 
silently fails.
It doesn't affect the correctness of the implementation - the page_bmap 
is always
used through vb->nr_page_bmap. Maybe the code logic is still a little 
fuzzy. I plan
to re-write most of the implementation to make it clearer.
>> +	}
>> +}
>> +
>> +static void free_extended_page_bitmap(struct virtio_balloon *vb)
>> +{
>> +	int i, bmap_count = vb->nr_page_bmap;
>> +
>> +	for (i = 1; i < bmap_count; i++) {
>> +		kfree(vb->page_bitmap[i]);
>> +		vb->page_bitmap[i] = NULL;
>> +		vb->nr_page_bmap--;
>> +	}
>> +}
>> +
>> +static void kfree_page_bitmap(struct virtio_balloon *vb)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < vb->nr_page_bmap; i++)
>> +		kfree(vb->page_bitmap[i]);
>> +}
>> +
> A very confusing name. pls don't start with kfree.
Ok. I'll simply call it free_page_bmap()
>
>> +static void clear_page_bitmap(struct virtio_balloon *vb)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < vb->nr_page_bmap; i++)
>> +		memset(vb->page_bitmap[i], 0, PAGE_BMAP_SIZE);
>> +}
>> +
>> +static void send_resp_data(struct virtio_balloon *vb, struct virtqueue
*vq,
>> +			bool busy_wait)
>> +{
>> +	struct scatterlist sg[2];
>> +	struct virtio_balloon_resp_hdr *hdr = &vb->resp_hdr;
>>   	unsigned int len;
>>   
>> -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) *
vb->num_pfns);
>> +	len = hdr->data_len = vb->resp_pos * sizeof(__le64);
>> +	sg_init_table(sg, 2);
>> +	sg_set_buf(&sg[0], hdr, sizeof(struct virtio_balloon_resp_hdr));
>> +	sg_set_buf(&sg[1], vb->resp_data, len);
>> +
>> +	if (virtqueue_add_outbuf(vq, sg, 2, vb, GFP_KERNEL) == 0) {
> !virtqueue_add_outbuf
>
> Also how about making this header linear with pfns?
> This way you can use 1 s/g and avoid mallocs.
Yes, I think it'll be better, thanks.

>> +		virtqueue_kick(vq);
>> +		if (busy_wait)
>> +			while (!virtqueue_get_buf(vq, &len)
>> +				&& !virtqueue_is_broken(vq))
>> +				cpu_relax();
>> +		else
>> +			wait_event(vb->acked, virtqueue_get_buf(vq, &len));
>> +		vb->resp_pos = 0;
>> +		free_extended_page_bitmap(vb);
>> +	}
>> +}
>>   
>> -	/* We should always be able to add one buffer to an empty queue. */
>> -	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
>> -	virtqueue_kick(vq);
>> +static void do_set_resp_bitmap(struct virtio_balloon *vb, unsigned
long base,
>> +			       int size)
>> +{
>> +	/* Use the extented chunk format if the size is too large */
>> +	if (size > (1 << VIRTIO_BALLOON_CHUNK_SIZE_SHIFT)) {
>> +		struct balloon_page_chunk_ext *chunk_ext >> +			(struct
balloon_page_chunk_ext *)(vb->resp_data + vb->resp_pos);
>> +		chunk_ext->base = cpu_to_le64(base);
>> +		chunk_ext->size = cpu_to_le64(size);
>> +		vb->resp_pos += sizeof(vb->resp_pos) / sizeof(*chunk_ext);
>> +	} else {
>> +		struct balloon_page_chunk *chunk >> +			(struct
balloon_page_chunk *)(vb->resp_data + vb->resp_pos);
>> +		chunk->base = cpu_to_le64(base);
>> +		chunk->size = cpu_to_le16(size);
>> +		vb->resp_pos += sizeof(vb->resp_pos) / sizeof(*chunk);
>> +	}
>> +}
>>   
>> -	/* When host has read buffer, this completes via balloon_ack */
>> -	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
>> +static void chunking_pages_from_bmap(struct virtio_balloon *vb,
>> +				     struct virtqueue *vq,
>> +				     unsigned long start_pfn, unsigned long *bitmap,
>> +				     unsigned long len, bool busy_wait)
>> +{
>> +	unsigned long pos = 0, end = len * BITS_PER_BYTE;
>> +
>> +	while (pos < end) {
>> +		unsigned long one = find_next_bit(bitmap, end, pos);
>> +
>> +		if (one < end) {
>> +			unsigned long chunk_size, zero;
>> +
>> +			zero = find_next_zero_bit(bitmap, end, one + 1);
>> +			if (zero >= end)
>> +				chunk_size = end - one;
>> +			else
>> +				chunk_size = zero - one;
>> +			if (chunk_size) {
>> +				if ((vb->resp_pos + 2) * sizeof(__le64) >
>> +						vb->resp_buf_size)
> why + 2? where does sizeof come from?
> Any chance this can be cleaned up using structs sizeof etc?
> or at least commented?
> How am I supposed to figure out all this convoluted pointer math?
Right. I'll take of it in the new version.

>> +					send_resp_data(vb, vq, busy_wait);
>> +				do_set_resp_bitmap(vb, start_pfn + one,	chunk_size);
>> +			}
>> +			pos = one + chunk_size;
>> +		} else
>> +			break;
>> +	}
>> +}
>>   
>> +static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>> +{
>> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CHUNK_TRANSFER))
{
>> +		int nr_pfn, nr_used_bmap, i;
>> +		unsigned long start_pfn, bmap_len;
>> +
>> +		start_pfn = vb->start_pfn;
>> +		nr_pfn = vb->end_pfn - start_pfn + 1;
>> +		nr_pfn = roundup(nr_pfn, BITS_PER_LONG);
>> +		nr_used_bmap = nr_pfn / PFNS_PER_PAGE_BMAP;
> better name here including "page"?
> And bmaps instead of bmap?Agree. I'll call it page_bmaps.

>> +		if (nr_pfn % PFNS_PER_PAGE_BMAP)
>> +			nr_used_bmap++;
> Pls just use macros like ALIGN or roundup?
>
>> +		bmap_len = nr_pfn / BITS_PER_BYTE;
> I guess we need to round up.
>
>> +
>> +		for (i = 0; i < nr_used_bmap; i++) {
>> +			unsigned int bmap_size = PAGE_BMAP_SIZE;
>> +
>> +			if (i + 1 == nr_used_bmap)
>> +				bmap_size = bmap_len - PAGE_BMAP_SIZE * i;
> So last one is treated speacially? why?
The last one takes the leftover only, so that chunking_pages_from_bmap()
only searches the necessary part of (rather than the entire) the page_bmap.
>> +			chunking_pages_from_bmap(vb, vq, start_pfn + i *
PFNS_PER_PAGE_BMAP,
>> +				 vb->page_bitmap[i], bmap_size, false);
>> +		}
>> +		if (vb->resp_pos > 0)
>> +			send_resp_data(vb, vq, false);
>> +	} else {
>> +		struct scatterlist sg;
>> +		unsigned int len;
>> +
>> +		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) *
vb->num_pfns);
>> +
>> +		/* We should always be able to add one buffer to an
>> +		 * empty queue
>> +		 */
> /*
>   * Multi-line comments
>   * like this.
>   */
OK.
>> +		virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
>> +		virtqueue_kick(vq);
>> +		/* When host has read buffer, this completes via balloon_ack */
>> +		wait_event(vb->acked, virtqueue_get_buf(vq, &len));
>> +	}
>>   }
>>   
>>   static void set_page_pfns(struct virtio_balloon *vb,
>> @@ -138,13 +326,58 @@ static void set_page_pfns(struct virtio_balloon
*vb,
>>   					  page_to_balloon_pfn(page) + i);
>>   }
>>   
>> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>> +static void set_page_bitmap(struct virtio_balloon *vb,
>> +			 struct list_head *pages, struct virtqueue *vq)
> align on ( pls.
>
>> +{
>> +	unsigned long pfn, pfn_limit;
>> +	struct page *page;
>> +	bool found;
>> +
>> +	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
>> +	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
>> +	pfn_limit = PFNS_PER_PAGE_BMAP * vb->nr_page_bmap;
>> +
>> +	if (vb->nr_page_bmap == 1)
>> +		extend_page_bitmap(vb, vb->max_pfn - vb->min_pfn + 1);
> how is 1 special?
Thanks. I think "if (vb->nr_page_bmap == 1)" should be removed.
Whenever the current amount of page_bmap is not enough to hold
the amount of pfns, it should try to entend page_bmap.

Best,
Wei

David Hildenbrand

2017-Mar-10 13:26 UTC

head link

[virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

Am 10.03.2017 um 11:02 schrieb Wei Wang:> On 03/08/2017 12:01 PM, Michael S. Tsirkin wrote:
>> On Fri, Mar 03, 2017 at 01:40:28PM +0800, Wei Wang wrote:
>>> From: Liang Li <liang.z.li at intel.com>
>>>
>>> The implementation of the current virtio-balloon is not very
>>> efficient, because the pages are transferred to the host one by
one.
>>> Here is the breakdown of the time in percentage spent on each
>>> step of the balloon inflating process (inflating 7GB of an 8GB
>>> idle guest).
>>>
>>> 1) allocating pages (6.5%)
>>> 2) sending PFNs to host (68.3%)
>>> 3) address translation (6.1%)
>>> 4) madvise (19%)
>>>
>>> It takes about 4126ms for the inflating process to complete.
>>> The above profiling shows that the bottlenecks are stage 2)
>>> and stage 4).
>>>
>>> This patch optimizes step 2) by transfering pages to the host in
>>> chunks. A chunk consists of guest physically continuous pages, and
>>> it is offered to the host via a base PFN (i.e. the start PFN of
>>> those physically continuous pages) and the size (i.e. the total
>>> number of the pages). A normal chunk is formated as below:
>>> -----------------------------------------------
>>> |  Base (52 bit)               | Size (12 bit)|
>>> -----------------------------------------------
>>> For large size chunks, an extended chunk format is used:
>>> -----------------------------------------------
>>> |                 Base (64 bit)               |
>>> -----------------------------------------------
>>> -----------------------------------------------
>>> |                 Size (64 bit)               |
>>> -----------------------------------------------
>>>
>>> By doing so, step 4) can also be optimized by doing address
>>> translation and madvise() in chunks rather than page by page.
>>>
>>> This optimization requires the negotation of a new feature bit,
>>> VIRTIO_BALLOON_F_CHUNK_TRANSFER.
>>>
>>> With this new feature, the above ballooning process takes ~590ms
>>> resulting in an improvement of ~85%.
>>>
>>> TODO: optimize stage 1) by allocating/freeing a chunk of pages
>>> instead of a single page each time.
>>>
>>> Signed-off-by: Liang Li <liang.z.li at intel.com>
>>> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
>>> Suggested-by: Michael S. Tsirkin <mst at redhat.com>
>>> Cc: Michael S. Tsirkin <mst at redhat.com>
>>> Cc: Paolo Bonzini <pbonzini at redhat.com>
>>> Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
>>> Cc: Amit Shah <amit.shah at redhat.com>
>>> Cc: Dave Hansen <dave.hansen at intel.com>
>>> Cc: Andrea Arcangeli <aarcange at redhat.com>
>>> Cc: David Hildenbrand <david at redhat.com>
>>> Cc: Liang Li <liliang324 at gmail.com>
>>> Cc: Wei Wang <wei.w.wang at intel.com>
>> Does this pass sparse? I see some endian-ness issues here.
> 
> "pass sparse"- what does that mean?
> I didn't see any complaints from "make" on my machine.
https://kernel.org/doc/html/latest/dev-tools/sparse.html

Static code analysis. You have to run it explicitly.

-- 
Thanks,

David

Michael S. Tsirkin

2017-Mar-10 15:37 UTC

head link

[virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

On Fri, Mar 10, 2017 at 06:02:31PM +0800, Wei Wang
wrote:> On 03/08/2017 12:01 PM, Michael S. Tsirkin wrote:
> > On Fri, Mar 03, 2017 at 01:40:28PM +0800, Wei Wang wrote:
> > > From: Liang Li <liang.z.li at intel.com>
> > > 
> > > The implementation of the current virtio-balloon is not very
> > > efficient, because the pages are transferred to the host one by
one.
> > > Here is the breakdown of the time in percentage spent on each
> > > step of the balloon inflating process (inflating 7GB of an 8GB
> > > idle guest).
> > > 
> > > 1) allocating pages (6.5%)
> > > 2) sending PFNs to host (68.3%)
> > > 3) address translation (6.1%)
> > > 4) madvise (19%)
> > > 
> > > It takes about 4126ms for the inflating process to complete.
> > > The above profiling shows that the bottlenecks are stage 2)
> > > and stage 4).
> > > 
> > > This patch optimizes step 2) by transfering pages to the host in
> > > chunks. A chunk consists of guest physically continuous pages,
and
> > > it is offered to the host via a base PFN (i.e. the start PFN of
> > > those physically continuous pages) and the size (i.e. the total
> > > number of the pages). A normal chunk is formated as below:
> > > -----------------------------------------------
> > > |  Base (52 bit)               | Size (12 bit)|
> > > -----------------------------------------------
> > > For large size chunks, an extended chunk format is used:
> > > -----------------------------------------------
> > > |                 Base (64 bit)               |
> > > -----------------------------------------------
> > > -----------------------------------------------
> > > |                 Size (64 bit)               |
> > > -----------------------------------------------
> > > 
> > > By doing so, step 4) can also be optimized by doing address
> > > translation and madvise() in chunks rather than page by page.
> > > 
> > > This optimization requires the negotation of a new feature bit,
> > > VIRTIO_BALLOON_F_CHUNK_TRANSFER.
> > > 
> > > With this new feature, the above ballooning process takes ~590ms
> > > resulting in an improvement of ~85%.
> > > 
> > > TODO: optimize stage 1) by allocating/freeing a chunk of pages
> > > instead of a single page each time.
> > > 
> > > Signed-off-by: Liang Li <liang.z.li at intel.com>
> > > Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> > > Suggested-by: Michael S. Tsirkin <mst at redhat.com>
> > > Cc: Michael S. Tsirkin <mst at redhat.com>
> > > Cc: Paolo Bonzini <pbonzini at redhat.com>
> > > Cc: Cornelia Huck <cornelia.huck at de.ibm.com>
> > > Cc: Amit Shah <amit.shah at redhat.com>
> > > Cc: Dave Hansen <dave.hansen at intel.com>
> > > Cc: Andrea Arcangeli <aarcange at redhat.com>
> > > Cc: David Hildenbrand <david at redhat.com>
> > > Cc: Liang Li <liliang324 at gmail.com>
> > > Cc: Wei Wang <wei.w.wang at intel.com>
> > Does this pass sparse? I see some endian-ness issues here.
> 
> "pass sparse"- what does that mean?
> I didn't see any complaints from "make" on my machine.

Run with make C=1 (or C=2 to check all source).

Generally there's a ton of useful info you will find
if you run make help.

-- 
MST

Apparently Analagous Threads

Search for more possibly parallel threads

Linux Virtualization - Mar 2017 - [virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

[PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

[virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

[virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

[virtio-dev] Re: [PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER

Apparently Analagous Threads