Wei Wang
2018-Jul-20  08:33 UTC
[PATCH v36 0/5] Virtio-balloon: support free page reporting
This patch series is split out from the previous "Virtio-balloon
Enhancement" series. The new feature implemented by this series,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, enables the virtio-balloon driver to
report hints of guest free pages to the host. It can be used to
accelerate live migration of VMs. Here is an introduction to this usage:
Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. A method
commonly used by the hypervisor to track which parts of memory are
written is to write-protect all of the guest memory.
This feature enables an optimization: the transfer of guest free pages
is skipped during VM live migration. It does not matter if those pages
are used after they have been given to the hypervisor as free page
hints, because the hypervisor tracks them and transfers them in a
subsequent round if they are written.
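To make the round-by-round argument concrete, here is a toy userspace sketch. It is not QEMU or kernel code, and every name in it (struct toy_page, first_round, guest_write, next_round) is hypothetical. Round 1 skips pages hinted as free, write protection marks later guest writes as dirty, and later rounds send only dirty pages, so a hinted page that the guest reuses is still transferred.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state kept by a toy migration loop. */
struct toy_page {
	bool hinted_free; /* the guest reported this page as free */
	bool dirty;       /* written since the last round (write-protect fault) */
};

/* Round 1: send every page except those hinted as free. */
static size_t first_round(struct toy_page *pages, size_t n)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].hinted_free)
			sent++;
		pages[i].dirty = false; /* write-protect: start tracking writes */
	}
	return sent;
}

/* A guest write traps on the write-protected page and marks it dirty. */
static void guest_write(struct toy_page *pages, size_t idx)
{
	pages[idx].dirty = true;
}

/*
 * Later rounds: send only dirty pages. A hinted page that the guest has
 * since reused is dirty, so it is sent here; a stale hint is harmless.
 */
static size_t next_round(struct toy_page *pages, size_t n)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty) {
			sent++;
			pages[i].dirty = false;
		}
	}
	return sent;
}
```

With 8 pages, 2 of them hinted free, round 1 sends 6 pages. If the guest then writes one of the hinted pages, round 2 sends exactly that page.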
* Tests
- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds
- Test Results
    - Idle Guest Live Migration Time (results are averaged over 10 runs):
        - Optimization vs. Legacy = 409ms vs. 1757ms --> ~77% reduction
          (setting the page poisoning value to zero and enabling KSM do
          not affect the comparison result)
    - Guest with Linux Compilation Workload (make bzImage -j4):
        - Live Migration Time (average)
          Optimization vs. Legacy = 1407ms vs. 2528ms --> ~44% reduction
        - Linux Compilation Time
          Optimization vs. Legacy = 5min4s vs. 5min12s
          --> no obvious difference
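The reduction percentages above follow from reduction = (legacy - optimized) / legacy. A minimal C helper (the name reduction_pct is only illustrative) reproduces them from the reported numbers.

```c
/* Percentage reduction of the optimized time relative to the legacy time. */
static double reduction_pct(double optimized, double legacy)
{
	return 100.0 * (legacy - optimized) / legacy;
}
```

For example, reduction_pct(409, 1757) is about 76.7% and reduction_pct(1407, 2528) is about 44.3%. The compile times, 304 s vs. 312 s, differ by only about 2.6%, which matches "no obvious difference".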
ChangeLog:
v35->v36:
    - remove the mm patch, as Linus suggested getting free page
      addresses via allocation instead of reading them from the free
      page list.
    - virtio-balloon:
        - replace the OOM notifier with a shrinker;
        - the guest-to-host communication interface remains the same as
          in v32;
        - allocate free page blocks and send them to the host one by
          one, and free them after all the pages have been sent.
For ChangeLogs from v22 to v35, please reference
https://lwn.net/Articles/759413/
For ChangeLogs before v21, please reference
https://lwn.net/Articles/743660/
Wei Wang (5):
  virtio-balloon: remove BUG() in init_vqs
  virtio_balloon: replace oom notifier with shrinker
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
 drivers/virtio/virtio_balloon.c     | 456 ++++++++++++++++++++++++++++++------
 include/uapi/linux/virtio_balloon.h |   7 +
 mm/page_poison.c                    |   6 +
 3 files changed, 394 insertions(+), 75 deletions(-)
-- 
2.7.4
Wei Wang
[PATCH v36 1/5] virtio-balloon: remove BUG() in init_vqs
It is overkill to call BUG() when adding an entry to the stats_vq fails
in init_vqs. Remove it and return the error to the caller so that it can
bail out cleanly.
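The following userspace sketch illustrates the error-propagation pattern this patch adopts. toy_add_outbuf() is a hypothetical stand-in for virtqueue_add_outbuf(), which can fail, for example with -ENOSPC when the ring is full. Instead of crashing, the init function logs the failure and returns the error, and the caller unwinds.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for virtqueue_add_outbuf(): fails on a full ring. */
static int toy_add_outbuf(int ring_free)
{
	return ring_free > 0 ? 0 : -ENOSPC;
}

/* Mirrors the patched init_vqs(): warn and hand the error back. */
static int toy_init_vqs(int ring_free)
{
	int err = toy_add_outbuf(ring_free);

	if (err) {
		fprintf(stderr, "%s: add stat_vq failed\n", __func__);
		return err;
	}
	return 0;
}

/* The caller bails out instead of the whole kernel stopping in BUG(). */
static int toy_probe(int ring_free)
{
	int err = toy_init_vqs(ring_free);

	if (err)
		return err; /* the real probe jumps to its cleanup label here */
	return 0;
}
```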
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/virtio/virtio_balloon.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3..9356a1a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -455,9 +455,13 @@ static int init_vqs(struct virtio_balloon *vb)
 		num_stats = update_balloon_stats(vb);
 
 		sg_init_one(&sg, vb->stats, sizeof(vb->stats[0]) * num_stats);
-		if (virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb, GFP_KERNEL)
-		    < 0)
-			BUG();
+		err = virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb,
+					   GFP_KERNEL);
+		if (err) {
+			dev_warn(&vb->vdev->dev, "%s: add stat_vq failed\n",
+				 __func__);
+			return err;
+		}
 		virtqueue_kick(vb->stats_vq);
 	}
 	return 0;
-- 
2.7.4
Wei Wang
2018-Jul-20  08:33 UTC
[PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker
The OOM notifier is being deprecated, for the reasons Michal Hocko
describes here: https://lkml.org/lkml/2018/7/12/314
This patch replaces the virtio-balloon OOM notifier with a shrinker
that releases balloon pages under memory pressure.
In addition, this fixes a bug in the replaced virtballoon_oom_notify():
it could free at most VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e. 256) balloon
pages, even when the user had specified a larger number. The new
shrinker_scan function calls leak_balloon() repeatedly until the
requested number of pages has been freed or the balloon is empty.
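A simplified userspace model of the fix follows. All names are hypothetical. toy_leak_balloon() imitates leak_balloon() by freeing at most 256 pages per call. A single call, as in the old notifier, cannot satisfy a larger request. The shrinker-style loop keeps calling it until the target is met or the balloon is empty. The loop bookkeeping is a sketch and is not copied from the patch.

```c
#define TOY_ARRAY_PFNS_MAX 256 /* mirrors VIRTIO_BALLOON_ARRAY_PFNS_MAX */

/* Hypothetical balloon state: number of pages currently inflated. */
struct toy_balloon {
	unsigned long num_pages;
};

/* Stand-in for leak_balloon(): frees at most TOY_ARRAY_PFNS_MAX pages. */
static unsigned long toy_leak_balloon(struct toy_balloon *b, unsigned long num)
{
	if (num > TOY_ARRAY_PFNS_MAX)
		num = TOY_ARRAY_PFNS_MAX;
	if (num > b->num_pages)
		num = b->num_pages;
	b->num_pages -= num;
	return num;
}

/* Old notifier behaviour: one call, so never more than 256 pages. */
static unsigned long toy_oom_notify(struct toy_balloon *b, unsigned long target)
{
	return toy_leak_balloon(b, target);
}

/* Shrinker-style loop: repeat until the target is met or the balloon is empty. */
static unsigned long toy_shrinker_scan(struct toy_balloon *b,
				       unsigned long target)
{
	unsigned long freed = 0;

	while (b->num_pages && freed < target)
		freed += toy_leak_balloon(b, target - freed);
	return freed;
}
```

Asking for 1000 pages from a 2000-page balloon frees only 256 via toy_oom_notify() but the full 1000 via toy_shrinker_scan().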
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Michal Hocko <mhocko at kernel.org>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
---
 drivers/virtio/virtio_balloon.c | 113 +++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 48 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9356a1a..c6fd406 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,7 +27,6 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
-#include <linux/oom.h>
 #include <linux/wait.h>
 #include <linux/mm.h>
 #include <linux/mount.h>
@@ -40,12 +39,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
-#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
-MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
+static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
+module_param(balloon_pages_to_shrink, ulong, 0600);
+MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory pressure");
 
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
@@ -86,8 +85,8 @@ struct virtio_balloon {
 	/* Memory statistics */
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 
-	/* To register callback in oom notifier call chain */
-	struct notifier_block nb;
+	/* To register a shrinker to shrink memory upon memory pressure */
+	struct shrinker shrinker;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -365,38 +364,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
 		      &actual);
 }
 
-/*
- * virtballoon_oom_notify - release pages when system is under severe
- *			    memory pressure (called from out_of_memory())
- * @self : notifier block struct
- * @dummy: not used
- * @parm : returned - number of freed pages
- *
- * The balancing of memory by use of the virtio balloon should not cause
- * the termination of processes while there are pages in the balloon.
- * If virtio balloon manages to release some memory, it will make the
- * system return and retry the allocation that forced the OOM killer
- * to run.
- */
-static int virtballoon_oom_notify(struct notifier_block *self,
-				  unsigned long dummy, void *parm)
-{
-	struct virtio_balloon *vb;
-	unsigned long *freed;
-	unsigned num_freed_pages;
-
-	vb = container_of(self, struct virtio_balloon, nb);
-	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-		return NOTIFY_OK;
-
-	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
-	update_balloon_size(vb);
-	*freed += num_freed_pages;
-
-	return NOTIFY_OK;
-}
-
 static void update_balloon_stats_func(struct work_struct *work)
 {
 	struct virtio_balloon *vb;
@@ -548,6 +515,61 @@ static struct file_system_type balloon_fs = {
 
 #endif /* CONFIG_BALLOON_COMPACTION */
 
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+						  struct shrink_control *sc)
+{
+	unsigned long pages_to_free = balloon_pages_to_shrink,
+		      pages_freed = 0;
+	struct virtio_balloon *vb = container_of(shrinker,
+					struct virtio_balloon, shrinker);
+
+	/*
+	 * One invocation of leak_balloon can deflate at most
+	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+	 * multiple times to deflate pages till reaching
+	 * balloon_pages_to_shrink pages.
+	 */
+	while (vb->num_pages && pages_to_free) {
+		pages_to_free = balloon_pages_to_shrink - pages_freed;
+		pages_freed += leak_balloon(vb, pages_to_free);
+	}
+	update_balloon_size(vb);
+
+	return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
+						   struct shrink_control *sc)
+{
+	struct virtio_balloon *vb = container_of(shrinker,
+					struct virtio_balloon, shrinker);
+
+	/*
+	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
+	 * case when shrinker needs to be invoked to relieve memory pressure.
+	 */
+	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+		return 0;
+
+	return min_t(unsigned long, vb->num_pages, balloon_pages_to_shrink) /
+	       VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
+{
+	unregister_shrinker(&vb->shrinker);
+}
+
+static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
+{
+	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
+	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
+	vb->shrinker.batch = 0;
+	vb->shrinker.seeks = DEFAULT_SEEKS;
+
+	return register_shrinker(&vb->shrinker);
+}
+
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
@@ -580,17 +602,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb;
 
-	vb->nb.notifier_call = virtballoon_oom_notify;
-	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
-	err = register_oom_notifier(&vb->nb);
-	if (err < 0)
-		goto out_del_vqs;
-
 #ifdef CONFIG_BALLOON_COMPACTION
 	balloon_mnt = kern_mount(&balloon_fs);
 	if (IS_ERR(balloon_mnt)) {
 		err = PTR_ERR(balloon_mnt);
-		unregister_oom_notifier(&vb->nb);
 		goto out_del_vqs;
 	}
 
@@ -599,12 +614,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (IS_ERR(vb->vb_dev_info.inode)) {
 		err = PTR_ERR(vb->vb_dev_info.inode);
 		kern_unmount(balloon_mnt);
-		unregister_oom_notifier(&vb->nb);
 		vb->vb_dev_info.inode = NULL;
 		goto out_del_vqs;
 	}
 	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
 #endif
+	err = virtio_balloon_register_shrinker(vb);
+	if (err)
+		goto out_del_vqs;
 
 	virtio_device_ready(vdev);
 
@@ -637,7 +654,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
-	unregister_oom_notifier(&vb->nb);
+	virtio_balloon_unregister_shrinker(vb);
 
 	spin_lock_irq(&vb->stop_update_lock);
 	vb->stop_update = true;
-- 
2.7.4
Wei Wang
2018-Jul-20  08:33 UTC
[PATCH v36 3/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currently, only free page blocks of order MAX_ORDER - 1 are reported.
They are obtained one by one from the mm free list via the regular
allocation function. The allocated pages are given back to mm after all
of them have been put onto the vq.
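For a sense of scale, this sketch computes the block size the same way VIRTIO_BALLOON_FREE_PAGE_SIZE does. It assumes x86-64 defaults of that era (MAX_ORDER = 11, PAGE_SHIFT = 12), which are hardcoded here and not read from kernel headers. Under that assumption each reported block is 4 MiB, so the 8 GB test guest holds at most 2048 such blocks.

```c
/* Assumed x86-64 values of the time; not taken from kernel headers. */
#define TOY_PAGE_SHIFT		12
#define TOY_MAX_ORDER		11
#define TOY_FREE_PAGE_ORDER	(TOY_MAX_ORDER - 1)
#define TOY_FREE_PAGE_SIZE	(1UL << (TOY_FREE_PAGE_ORDER + TOY_PAGE_SHIFT))

/* Upper bound on the number of such blocks in a guest of mem_bytes. */
static unsigned long toy_max_blocks(unsigned long long mem_bytes)
{
	return (unsigned long)(mem_bytes / TOY_FREE_PAGE_SIZE);
}
```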
The host requests the guest to report free page hints by writing a new
cmd id to the free_page_report_cmd_id configuration register. When the
guest starts reporting, it first sends a start cmd to the host via the
free page vq, which acknowledges the received cmd id. When the guest
finishes reporting free pages, it sends a stop cmd via the same vq.
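This guest-side sketch models the cmd id handshake in userspace. The names are hypothetical, and it ignores the virtio byte-order conversion that the real driver applies to cmd_id_active. A config change starts a report only for a non-stop id that differs from the active one. The start cmd makes the received id active. Reporting continues only while the active id still matches the one most recently read from the host.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CMD_ID_STOP 0 /* mirrors VIRTIO_BALLOON_CMD_ID_STOP */

struct toy_guest {
	uint32_t cmd_id_received; /* last id read from free_page_report_cmd_id */
	uint32_t cmd_id_active;   /* id acked by the most recent start cmd */
};

/* Config-change path: record the host's id and decide whether to start. */
static bool toy_config_changed(struct toy_guest *g, uint32_t host_cmd_id)
{
	g->cmd_id_received = host_cmd_id;
	return host_cmd_id != TOY_CMD_ID_STOP &&
	       host_cmd_id != g->cmd_id_active;
}

/* Start cmd: the guest acks the received id by making it the active one. */
static void toy_send_start(struct toy_guest *g)
{
	g->cmd_id_active = g->cmd_id_received;
}

/* Checked before each block: stop once the host has written another id. */
static bool toy_keep_reporting(const struct toy_guest *g)
{
	return g->cmd_id_active == g->cmd_id_received;
}
```

If the host writes id 7, a report starts. Rewriting 7 does not start a second one. Writing the stop id (0) makes the in-progress loop stop at its next check.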
TODO:
- Add a batch page allocation API to amortize the allocation overhead.
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Signed-off-by: Liang Li <liang.z.li at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Michal Hocko <mhocko at kernel.org>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
---
 drivers/virtio/virtio_balloon.c     | 331 +++++++++++++++++++++++++++++++++---
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 307 insertions(+), 28 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index c6fd406..82cd497 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,14 @@
 #define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
+					     __GFP_NOMEMALLOC)
+/* The order of free page blocks to report to host */
+#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1)
+/* The size of a free page block in bytes */
+#define VIRTIO_BALLOON_FREE_PAGE_SIZE \
+	(1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+
 static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
 module_param(balloon_pages_to_shrink, ulong, 0600);
 MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory pressure");
@@ -50,9 +58,22 @@ MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory pressure");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+	VIRTIO_BALLOON_VQ_INFLATE,
+	VIRTIO_BALLOON_VQ_DEFLATE,
+	VIRTIO_BALLOON_VQ_STATS,
+	VIRTIO_BALLOON_VQ_FREE_PAGE,
+	VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+	/* Balloon's own wq for cpu-intensive work items */
+	struct workqueue_struct *balloon_wq;
+	/* The free page reporting work item submitted to the balloon wq */
+	struct work_struct report_free_page_work;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -62,6 +83,16 @@ struct virtio_balloon {
 	spinlock_t stop_update_lock;
 	bool stop_update;
 
+	/* The list of allocated free pages, waiting to be given back to mm */
+	struct list_head free_page_list;
+	spinlock_t free_page_list_lock;
+	/* The cmd id received from host */
+	u32 cmd_id_received;
+	/* The cmd id that is actively in use */
+	__virtio32 cmd_id_active;
+	/* Buffer to store the stop sign */
+	__virtio32 cmd_id_stop;
+
 	/* Waiting for host to ack the pages we released. */
 	wait_queue_head_t acked;
 
@@ -325,17 +356,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-	struct virtio_balloon *vb = vdev->priv;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vb->stop_update_lock, flags);
-	if (!vb->stop_update)
-		queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
 	s64 target;
@@ -352,6 +372,52 @@ static inline s64 towards_target(struct virtio_balloon *vb)
 	return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+	unsigned long flags;
+	s64 diff = towards_target(vb);
+
+	if (diff) {
+		spin_lock_irqsave(&vb->stop_update_lock, flags);
+		if (!vb->stop_update)
+			queue_work(system_freezable_wq,
+				   &vb->update_balloon_size_work);
+		spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+	}
+
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		virtio_cread(vdev, struct virtio_balloon_config,
+			     free_page_report_cmd_id, &vb->cmd_id_received);
+		if (vb->cmd_id_received != VIRTIO_BALLOON_CMD_ID_STOP &&
+		    vb->cmd_id_received !=
+				virtio32_to_cpu(vdev, vb->cmd_id_active)) {
+			spin_lock_irqsave(&vb->stop_update_lock, flags);
+			if (!vb->stop_update) {
+				queue_work(vb->balloon_wq,
+					   &vb->report_free_page_work);
+			}
+			spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+		}
+	}
+}
+
+static unsigned long return_free_pages_to_mm(struct virtio_balloon *vb)
+{
+	struct page *page;
+	unsigned long num = 0;
+
+	spin_lock_irq(&vb->free_page_list_lock);
+	while ((page = balloon_page_pop(&vb->free_page_list))) {
+		free_pages((unsigned long)page_address(page),
+			   VIRTIO_BALLOON_FREE_PAGE_ORDER);
+		num++;
+	}
+	spin_unlock_irq(&vb->free_page_list_lock);
+
+	return num;
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
 	u32 actual = vb->num_pages;
@@ -394,26 +460,44 @@ static void update_balloon_size_func(struct work_struct *work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	static const char * const names[] = { "inflate", "deflate", "stats" };
-	int err, nvqs;
+	struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+	vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
+	const char *names[VIRTIO_BALLOON_VQ_MAX];
+	int err;
 
 	/*
-	 * We expect two virtqueues: inflate and deflate, and
-	 * optionally stat.
+	 * Inflateq and deflateq are used unconditionally. The names[]
+	 * will be NULL if the related feature is not enabled, which will
+	 * cause no allocation for the corresponding virtqueue in find_vqs.
 	 */
-	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
-	err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
+	callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
+	names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
+	callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
+	names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
+	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
+	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
+		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
+		callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
+	}
+
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
+		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+	}
+
+	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
+					 vqs, callbacks, names, NULL, NULL);
 	if (err)
 		return err;
 
-	vb->inflate_vq = vqs[0];
-	vb->deflate_vq = vqs[1];
+	vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
+	vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
 		struct scatterlist sg;
 		unsigned int num_stats;
-		vb->stats_vq = vqs[2];
+		vb->stats_vq = vqs[VIRTIO_BALLOON_VQ_STATS];
 
 		/*
 		 * Prime this virtqueue with one buffer so the hypervisor can
@@ -431,9 +515,145 @@ static int init_vqs(struct virtio_balloon *vb)
 		}
 		virtqueue_kick(vb->stats_vq);
 	}
+
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
+
+	return 0;
+}
+
+static int send_cmd_id_start(struct virtio_balloon *vb)
+{
+	struct scatterlist sg;
+	struct virtqueue *vq = vb->free_page_vq;
+	int err, unused;
+
+	/* Detach all the used buffers from the vq */
+	while (virtqueue_get_buf(vq, &unused))
+		;
+
+	vb->cmd_id_active = cpu_to_virtio32(vb->vdev, vb->cmd_id_received);
+	sg_init_one(&sg, &vb->cmd_id_active, sizeof(vb->cmd_id_active));
+	err = virtqueue_add_outbuf(vq, &sg, 1, &vb->cmd_id_active, GFP_KERNEL);
+	if (!err)
+		virtqueue_kick(vq);
+	return err;
+}
+
+static int send_cmd_id_stop(struct virtio_balloon *vb)
+{
+	struct scatterlist sg;
+	struct virtqueue *vq = vb->free_page_vq;
+	int err, unused;
+
+	/* Detach all the used buffers from the vq */
+	while (virtqueue_get_buf(vq, &unused))
+		;
+
+	sg_init_one(&sg, &vb->cmd_id_stop, sizeof(vb->cmd_id_stop));
+	err = virtqueue_add_outbuf(vq, &sg, 1, &vb->cmd_id_stop, GFP_KERNEL);
+	if (!err)
+		virtqueue_kick(vq);
+	return err;
+}
+
+static int get_free_page_and_send(struct virtio_balloon *vb)
+{
+	struct virtqueue *vq = vb->free_page_vq;
+	struct page *page;
+	struct scatterlist sg;
+	int err, unused;
+	void *p;
+
+	/* Detach all the used buffers from the vq */
+	while (virtqueue_get_buf(vq, &unused))
+		;
+
+	page = alloc_pages(VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG,
+			   VIRTIO_BALLOON_FREE_PAGE_ORDER);
+	/*
+	 * When the allocation returns NULL, it indicates that we have got all
+	 * the possible free pages, so return -EINTR to stop.
+	 */
+	if (!page)
+		return -EINTR;
+
+	p = page_address(page);
+	sg_init_one(&sg, p, VIRTIO_BALLOON_FREE_PAGE_SIZE);
+	/* There is always one entry reserved for the cmd id to use. */
+	if (vq->num_free > 1) {
+		err = virtqueue_add_inbuf(vq, &sg, 1, p, GFP_KERNEL);
+		if (unlikely(err)) {
+			free_pages((unsigned long)p,
+				   VIRTIO_BALLOON_FREE_PAGE_ORDER);
+			return err;
+		}
+		virtqueue_kick(vq);
+		spin_lock_irq(&vb->free_page_list_lock);
+		balloon_page_push(&vb->free_page_list, page);
+		spin_unlock_irq(&vb->free_page_list_lock);
+	} else {
+		/*
+		 * The vq has no available entry to add this page block, so
+		 * just free it.
+		 */
+		free_pages((unsigned long)p, VIRTIO_BALLOON_FREE_PAGE_ORDER);
+	}
+
 	return 0;
 }
 
+static int send_free_pages(struct virtio_balloon *vb)
+{
+	int err;
+	u32 cmd_id_active;
+
+	while (1) {
+		/*
+		 * If a stop id or a new cmd id was just received from host,
+		 * stop the reporting.
+		 */
+		cmd_id_active = virtio32_to_cpu(vb->vdev, vb->cmd_id_active);
+		if (cmd_id_active != vb->cmd_id_received)
+			break;
+
+		/*
+		 * The free page blocks are allocated and sent to host one by
+		 * one.
+		 */
+		err = get_free_page_and_send(vb);
+		if (err == -EINTR)
+			break;
+		else if (unlikely(err))
+			return err;
+	}
+
+	return 0;
+}
+
+static void report_free_page_func(struct work_struct *work)
+{
+	int err;
+	struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
+						 report_free_page_work);
+
+	/* Start by sending the received cmd id to host with an outbuf. */
+	err = send_cmd_id_start(vb);
+	if (unlikely(err))
+		goto out_err;
+
+	err = send_free_pages(vb);
+	return_free_pages_to_mm(vb);
+	if (unlikely(err))
+		goto out_err;
+
+	/* End by sending a stop id to host with an outbuf. */
+	err = send_cmd_id_stop(vb);
+out_err:
+	if (err)
+		dev_err(&vb->vdev->dev, "%s: err = %d\n", __func__, err);
+}
+
 #ifdef CONFIG_BALLOON_COMPACTION
 /*
  * virtballoon_migratepage - perform the balloon page migration on behalf of
@@ -523,6 +743,16 @@ static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
 
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		pages_freed = return_free_pages_to_mm(vb) <<
+			      VIRTIO_BALLOON_FREE_PAGE_ORDER;
+	}
+
+	if (pages_freed >= pages_to_free)
+		return pages_freed;
+
+	pages_to_free -= pages_freed;
+
 	/*
 	 * One invocation of leak_balloon can deflate at most
 	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
@@ -530,8 +760,8 @@ static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
 	 * balloon_pages_to_shrink pages.
 	 */
 	while (vb->num_pages && pages_to_free) {
-		pages_to_free = balloon_pages_to_shrink - pages_freed;
 		pages_freed += leak_balloon(vb, pages_to_free);
+		pages_to_free = balloon_pages_to_shrink - pages_freed;
 	}
 	update_balloon_size(vb);
 
@@ -541,6 +771,7 @@ static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
 static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
 						   struct shrink_control *sc)
 {
+	unsigned long count;
 	struct virtio_balloon *vb = container_of(shrinker,
 					struct virtio_balloon, shrinker);
 
@@ -551,8 +782,18 @@ static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
 	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return 0;
 
-	return min_t(unsigned long, vb->num_pages, balloon_pages_to_shrink) /
-	       VIRTIO_BALLOON_PAGES_PER_PAGE;
+	count = min_t(unsigned long, vb->num_pages, balloon_pages_to_shrink) /
+		VIRTIO_BALLOON_PAGES_PER_PAGE;
+
+	/*
+	 * Just add one block of free pages for the count estimation. We will
+	 * release all of them in shrinker_scan regardless of the count
+	 * returned here.
+	 */
+	if (!list_empty(&vb->free_page_list))
+		count += 1 << VIRTIO_BALLOON_FREE_PAGE_ORDER;
+
+	return count;
 }
 
 static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
@@ -615,13 +856,38 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		err = PTR_ERR(vb->vb_dev_info.inode);
 		kern_unmount(balloon_mnt);
 		vb->vb_dev_info.inode = NULL;
-		goto out_del_vqs;
+		goto out_del_balloon_wq;
 	}
 	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
 #endif
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		/*
+		 * There is always one entry reserved for cmd id, so the ring
+		 * size needs to be at least two to report free page hints.
+		 */
+		if (virtqueue_get_vring_size(vb->free_page_vq) < 2) {
+			err = -ENOSPC;
+			goto out_del_vqs;
+		}
+		vb->balloon_wq = alloc_workqueue("balloon-wq",
+					WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
+		if (!vb->balloon_wq) {
+			err = -ENOMEM;
+			goto out_del_vqs;
+		}
+		INIT_WORK(&vb->report_free_page_work, report_free_page_func);
+		vb->cmd_id_received = VIRTIO_BALLOON_CMD_ID_STOP;
+		vb->cmd_id_active = cpu_to_virtio32(vb->vdev,
+						  VIRTIO_BALLOON_CMD_ID_STOP);
+		vb->cmd_id_stop = cpu_to_virtio32(vb->vdev,
+						  VIRTIO_BALLOON_CMD_ID_STOP);
+		spin_lock_init(&vb->free_page_list_lock);
+		INIT_LIST_HEAD(&vb->free_page_list);
+	}
+
 	err = virtio_balloon_register_shrinker(vb);
 	if (err)
-		goto out_del_vqs;
+		goto out_del_balloon_wq;
 
 	virtio_device_ready(vdev);
 
@@ -629,6 +895,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		virtballoon_changed(vdev);
 	return 0;
 
+out_del_balloon_wq:
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+		destroy_workqueue(vb->balloon_wq);
 out_del_vqs:
 	vdev->config->del_vqs(vdev);
 out_free_vb:
@@ -662,6 +931,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vb->update_balloon_size_work);
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		cancel_work_sync(&vb->report_free_page_work);
+		destroy_workqueue(vb->balloon_wq);
+	}
+
 	remove_common(vb);
 #ifdef CONFIG_BALLOON_COMPACTION
 	if (vb->vb_dev_info.inode)
@@ -713,6 +987,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 13b8cb5..18ee430 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,15 +34,19 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_CMD_ID_STOP	0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	__u32 num_pages;
 	/* Number of pages we've actually got in balloon. */
 	__u32 actual;
+	/* Free page report command id, readonly by guest */
+	__u32 free_page_report_cmd_id;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4
Wei Wang
2018-Jul-20  08:33 UTC
[PATCH v36 4/5] mm/page_poison: expose page_poisoning_enabled to kernel modules
In some usages, e.g. virtio-balloon, a kernel module needs to know if
page poisoning is in use. This patch exposes the page_poisoning_enabled
function to kernel modules.
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Michal Hocko <mhocko at kernel.org>
Cc: Michael S. Tsirkin <mst at redhat.com>
Acked-by: Andrew Morton <akpm at linux-foundation.org>
---
 mm/page_poison.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/mm/page_poison.c b/mm/page_poison.c
index aa2b3d3..830f604 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -17,6 +17,11 @@ static int __init early_page_poison_param(char *buf)
 }
 early_param("page_poison", early_page_poison_param);
 
+/**
+ * page_poisoning_enabled - check if page poisoning is enabled
+ *
+ * Return true if page poisoning is enabled, or false if not.
+ */
 bool page_poisoning_enabled(void)
 {
 	/*
@@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
 		(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
 		debug_pagealloc_enabled()));
 }
+EXPORT_SYMBOL_GPL(page_poisoning_enabled);
 
 static void poison_page(struct page *page)
 {
-- 
2.7.4
Wei Wang
2018-Jul-20  08:33 UTC
[PATCH v36 5/5] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
The VIRTIO_BALLOON_F_PAGE_POISON feature bit indicates whether the guest
is using page poisoning. The guest writes the page poisoning value in use
to the poison_val config field to tell the host about it.
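This userspace sketch shows what the memset() idiom in the probe hunk writes into the 32-bit poison_val field. It assumes PAGE_POISON is 0xaa, its value in include/linux/poison.h at the time unless CONFIG_PAGE_POISONING_ZERO selects 0x00. The macro is redefined locally and not included from the kernel.

```c
#include <stdint.h>
#include <string.h>

/* Assumed value of PAGE_POISON (include/linux/poison.h, non-zero variant). */
#define TOY_PAGE_POISON 0xaa

static uint32_t toy_poison_val(void)
{
	uint32_t poison_val;

	/* Same idiom as the patch: every byte of the u32 gets the pattern. */
	memset(&poison_val, TOY_PAGE_POISON, sizeof(poison_val));
	return poison_val;
}
```

Because every byte holds the same pattern, the resulting 0xaaaaaaaa reads the same regardless of guest or host byte order.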
Suggested-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Michal Hocko <mhocko at suse.com>
Cc: Andrew Morton <akpm at linux-foundation.org>
---
 drivers/virtio/virtio_balloon.c     | 10 ++++++++++
 include/uapi/linux/virtio_balloon.h |  3 +++
 2 files changed, 13 insertions(+)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 82cd497..6340cc1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -814,6 +814,7 @@ static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
+	__u32 poison_val;
 	int err;
 
 	if (!vdev->config->get) {
@@ -883,6 +884,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
 						  VIRTIO_BALLOON_CMD_ID_STOP);
 		spin_lock_init(&vb->free_page_list_lock);
 		INIT_LIST_HEAD(&vb->free_page_list);
+		if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+			memset(&poison_val, PAGE_POISON, sizeof(poison_val));
+			virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+				      poison_val, &poison_val);
+		}
 	}
 
 	err = virtio_balloon_register_shrinker(vb);
@@ -979,6 +985,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
 
 static int virtballoon_validate(struct virtio_device *vdev)
 {
+	if (!page_poisoning_enabled())
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
 	__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
 	return 0;
 }
@@ -988,6 +997,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
 	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+	VIRTIO_BALLOON_F_PAGE_POISON,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 18ee430..80a7b7e 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -47,6 +48,8 @@ struct virtio_balloon_config {
 	__u32 actual;
 	/* Free page report command id, readonly by guest */
 	__u32 free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning is in use */
+	__u32 poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4
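The PAGE_POISON handshake in the patch above depends on two small details. First, memset() copies the one-byte PAGE_POISON pattern into every byte of the __u32 that is written to poison_val, so the value reads the same regardless of endianness. Second, virtballoon_validate() clears feature bit 4 when the guest does not poison freed pages. The user-space sketch below models both, along with the resulting config layout. It assumes PAGE_POISON is 0xaa, the default in include/linux/poison.h (kernels built with CONFIG_PAGE_POISONING_ZERO use 0x00). The names balloon_config_mirror, make_poison_val() and validate_features() are hypothetical and serve only as illustration; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed default value of PAGE_POISON (include/linux/poison.h). */
#define SKETCH_PAGE_POISON 0xaa

/* Feature bit numbers as defined in include/uapi/linux/virtio_balloon.h. */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3
#define VIRTIO_BALLOON_F_PAGE_POISON    4

/* Hypothetical user-space mirror of struct virtio_balloon_config. */
struct balloon_config_mirror {
	uint32_t num_pages;
	uint32_t actual;
	uint32_t free_page_report_cmd_id;
	uint32_t poison_val;
};

/* poison_val is appended after three 32-bit fields. */
static_assert(offsetof(struct balloon_config_mirror, poison_val) == 12,
	      "poison_val follows three 32-bit fields");

/* Replicate a one-byte poison pattern across a 32-bit value, like
 * memset(&poison_val, PAGE_POISON, sizeof(poison_val)) in the patch. */
static uint32_t make_poison_val(unsigned char pattern)
{
	uint32_t val;

	memset(&val, pattern, sizeof(val));
	return val;
}

/* Model of virtballoon_validate(): drop the PAGE_POISON feature when the
 * guest is not poisoning freed pages, so the device does not rely on it. */
static uint64_t validate_features(uint64_t features, int poisoning_enabled)
{
	if (!poisoning_enabled)
		features &= ~(UINT64_C(1) << VIRTIO_BALLOON_F_PAGE_POISON);
	return features;
}
```

For example, make_poison_val(0xaa) yields 0xaaaaaaaa. A feature mask that contains both FREE_PAGE_HINT and PAGE_POISON keeps only FREE_PAGE_HINT once it has been validated with poisoning disabled.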
Michael S. Tsirkin
2018-Jul-20  12:51 UTC
[PATCH v36 0/5] Virtio-balloon: support free page reporting
On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
> This patch series is separated from the previous "Virtio-balloon
> Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
> implemented by this series enables the virtio-balloon driver to report
> hints of guest free pages to the host. It can be used to accelerate live
> migration of VMs. Here is an introduction of this usage:
>
> Live migration needs to transfer the VM's memory from the source machine
> to the destination round by round. For the 1st round, all the VM's memory
> is transferred. From the 2nd round, only the pieces of memory that were
> written by the guest (after the 1st round) are transferred. One method
> that is popularly used by the hypervisor to track which part of memory is
> written is to write-protect all the guest memory.
>
> This feature enables the optimization by skipping the transfer of guest
> free pages during VM live migration. It is not concerned that the memory
> pages are used after they are given to the hypervisor as a hint of the
> free pages, because they will be tracked by the hypervisor and transferred
> in the subsequent round if they are used and written.
>
> * Tests
> - Test Environment
>     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>     Guest: 8G RAM, 4 vCPU
>     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second

Can we split out patches 1 and 2? They seem appropriate for this release ...

> - Test Results
>     - Idle Guest Live Migration Time (results are averaged over 10 runs):
>         - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
>           (setting page poisoning zero and enabling ksm don't affect the
>           comparison result)
>     - Guest with Linux Compilation Workload (make bzImage -j4):
>         - Live Migration Time (average)
>           Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
>         - Linux Compilation Time
>           Optimization v.s. Legacy = 5min4s v.s. 5min12s
>           --> no obvious difference
>
> ChangeLog:
> v35->v36:
>     - remove the mm patch, as Linus has a suggestion to get free page
>       addresses via allocation, instead of reading from the free page
>       list.
>     - virtio-balloon:
>         - replace oom notifier with shrinker;
>         - the guest to host communication interface remains the same as
>           v32.
>         - allocate free page blocks and send to host one by one, and free
>           them after sending all the pages.
>
> For ChangeLogs from v22 to v35, please reference
> https://lwn.net/Articles/759413/
>
> For ChangeLogs before v21, please reference
> https://lwn.net/Articles/743660/
>
> Wei Wang (5):
>   virtio-balloon: remove BUG() in init_vqs
>   virtio_balloon: replace oom notifier with shrinker
>   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>   mm/page_poison: expose page_poisoning_enabled to kernel modules
>   virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
>  drivers/virtio/virtio_balloon.c     | 456 ++++++++++++++++++++++++++++++------
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_poison.c                    |   6 +
>  3 files changed, 394 insertions(+), 75 deletions(-)
>
> --
> 2.7.4
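The percentages in the quoted test results are computed as reduction = (legacy - optimized) / legacy. The sketch below checks that arithmetic using only the figures reported in the cover letter. percent_reduction() is a hypothetical helper written for this check.

```c
#include <assert.h>

/* Percentage reduction of `optimized_ms` relative to `legacy_ms`,
 * e.g. for the migration timings quoted in the cover letter. */
static double percent_reduction(double legacy_ms, double optimized_ms)
{
	assert(legacy_ms > 0.0);
	return 100.0 * (legacy_ms - optimized_ms) / legacy_ms;
}
```

The results are as follows:

- percent_reduction(1757, 409) evaluates to about 76.7 (the idle case).
- percent_reduction(2528, 1407) evaluates to about 44.3 (the compilation workload).
- Applied to the compilation times (312 s vs 304 s), the same formula gives about 2.6%.

These values are consistent with the quoted "~77%", "~44%" and "no obvious difference".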
Wang, Wei W
2018-Jul-22  11:11 UTC
[PATCH v36 0/5] Virtio-balloon: support free page reporting
On Friday, July 20, 2018 8:52 PM, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
> > This patch series is separated from the previous "Virtio-balloon
> > Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
> > implemented by this series enables the virtio-balloon driver to report
> > hints of guest free pages to the host. It can be used to accelerate
> > live migration of VMs. Here is an introduction of this usage:
> >
> > Live migration needs to transfer the VM's memory from the source
> > machine to the destination round by round. For the 1st round, all the
> > VM's memory is transferred. From the 2nd round, only the pieces of
> > memory that were written by the guest (after the 1st round) are
> > transferred. One method that is popularly used by the hypervisor to
> > track which part of memory is written is to write-protect all the guest memory.
> >
> > This feature enables the optimization by skipping the transfer of
> > guest free pages during VM live migration. It is not concerned that
> > the memory pages are used after they are given to the hypervisor as a
> > hint of the free pages, because they will be tracked by the hypervisor
> > and transferred in the subsequent round if they are used and written.
> >
> > * Tests
> > - Test Environment
> >     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> >     Guest: 8G RAM, 4 vCPU
> >     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>
> Can we split out patches 1 and 2? They seem appropriate for this release ...

Sounds good to me. I'm not sure if there would be comments on the first 2
patches. If no, can you just take them here? Or you need me to repost them
separately?

Best,
Wei
Michael S. Tsirkin
2018-Jul-22  14:48 UTC
[PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker
On Fri, Jul 20, 2018 at 04:33:02PM +0800, Wei Wang wrote:
> The OOM notifier is getting deprecated to use for the reasons mentioned
> here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
>
> This patch replaces the virtio-balloon oom notifier with a shrinker
> to release balloon pages on memory pressure.
>
> In addition, the bug in the replaced virtballoon_oom_notify that only
> VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e 256) balloon pages can be freed
> though the user has specified more than that number is fixed in the
> shrinker_scan function.
>
> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Linus Torvalds <torvalds at linux-foundation.org>
> ---
>  drivers/virtio/virtio_balloon.c | 113 +++++++++++++++++++++++-----------------
>  1 file changed, 65 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 9356a1a..c6fd406 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -27,7 +27,6 @@
>  #include <linux/slab.h>
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
> -#include <linux/oom.h>
>  #include <linux/wait.h>
>  #include <linux/mm.h>
>  #include <linux/mount.h>
> @@ -40,12 +39,12 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> -#define OOM_VBALLOON_DEFAULT_PAGES 256
> +#define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>
> -static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> -module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> -MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> +static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
> +module_param(balloon_pages_to_shrink, ulong, 0600);
> +MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory presure");
>
>  #ifdef CONFIG_BALLOON_COMPACTION
>  static struct vfsmount *balloon_mnt;
> @@ -86,8 +85,8 @@ struct virtio_balloon {
>  	/* Memory statistics */
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
>
> -	/* To register callback in oom notifier call chain */
> -	struct notifier_block nb;
> +	/* To register a shrinker to shrink memory upon memory pressure */
> +	struct shrinker shrinker;
>  };
>
>  static struct virtio_device_id id_table[] = {
> @@ -365,38 +364,6 @@ static void update_balloon_size(struct virtio_balloon *vb)
>  		      &actual);
>  }
>
> -/*
> - * virtballoon_oom_notify - release pages when system is under severe
> - *			    memory pressure (called from out_of_memory())
> - * @self : notifier block struct
> - * @dummy: not used
> - * @parm : returned - number of freed pages
> - *
> - * The balancing of memory by use of the virtio balloon should not cause
> - * the termination of processes while there are pages in the balloon.
> - * If virtio balloon manages to release some memory, it will make the
> - * system return and retry the allocation that forced the OOM killer
> - * to run.
> - */
> -static int virtballoon_oom_notify(struct notifier_block *self,
> -				  unsigned long dummy, void *parm)
> -{
> -	struct virtio_balloon *vb;
> -	unsigned long *freed;
> -	unsigned num_freed_pages;
> -
> -	vb = container_of(self, struct virtio_balloon, nb);
> -	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -		return NOTIFY_OK;
> -
> -	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> -	update_balloon_size(vb);
> -	*freed += num_freed_pages;
> -
> -	return NOTIFY_OK;
> -}
> -
>  static void update_balloon_stats_func(struct work_struct *work)
>  {
>  	struct virtio_balloon *vb;
> @@ -548,6 +515,61 @@ static struct file_system_type balloon_fs = {
>
>  #endif /* CONFIG_BALLOON_COMPACTION */
>
> +static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
> +						  struct shrink_control *sc)
> +{
> +	unsigned long pages_to_free = balloon_pages_to_shrink,
> +		      pages_freed = 0;
> +	struct virtio_balloon *vb = container_of(shrinker,
> +					struct virtio_balloon, shrinker);
> +
> +	/*
> +	 * One invocation of leak_balloon can deflate at most
> +	 * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> +	 * multiple times to deflate pages till reaching
> +	 * balloon_pages_to_shrink pages.
> +	 */
> +	while (vb->num_pages && pages_to_free) {
> +		pages_to_free = balloon_pages_to_shrink - pages_freed;
> +		pages_freed += leak_balloon(vb, pages_to_free);
> +	}
> +	update_balloon_size(vb);

Are you sure that this is never called if count returned 0?

> +
> +	return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
> +}
> +
> +static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
> +						   struct shrink_control *sc)
> +{
> +	struct virtio_balloon *vb = container_of(shrinker,
> +					struct virtio_balloon, shrinker);
> +
> +	/*
> +	 * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
> +	 * case when shrinker needs to be invoked to relieve memory pressure.
> +	 */
> +	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +		return 0;

So why not skip notifier registration when deflate on oom is clear?

> +
> +	return min_t(unsigned long, vb->num_pages, balloon_pages_to_shrink) /
> +	       VIRTIO_BALLOON_PAGES_PER_PAGE;
> +}
> +
> +static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
> +{
> +	unregister_shrinker(&vb->shrinker);
> +}
> +
> +static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
> +{
> +	vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
> +	vb->shrinker.count_objects = virtio_balloon_shrinker_count;
> +	vb->shrinker.batch = 0;
> +	vb->shrinker.seeks = DEFAULT_SEEKS;
> +
> +	return register_shrinker(&vb->shrinker);
> +}
> +
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> @@ -580,17 +602,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (err)
>  		goto out_free_vb;
>
> -	vb->nb.notifier_call = virtballoon_oom_notify;
> -	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
> -	err = register_oom_notifier(&vb->nb);
> -	if (err < 0)
> -		goto out_del_vqs;
> -
>  #ifdef CONFIG_BALLOON_COMPACTION
>  	balloon_mnt = kern_mount(&balloon_fs);
>  	if (IS_ERR(balloon_mnt)) {
>  		err = PTR_ERR(balloon_mnt);
> -		unregister_oom_notifier(&vb->nb);
>  		goto out_del_vqs;
>  	}
>
> @@ -599,12 +614,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (IS_ERR(vb->vb_dev_info.inode)) {
>  		err = PTR_ERR(vb->vb_dev_info.inode);
>  		kern_unmount(balloon_mnt);
> -		unregister_oom_notifier(&vb->nb);
>  		vb->vb_dev_info.inode = NULL;
>  		goto out_del_vqs;
>  	}
>  	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
>  #endif
> +	err = virtio_balloon_register_shrinker(vb);
> +	if (err)
> +		goto out_del_vqs;
>

So we can get scans before device is ready. Leak will fail then.
Why not register later after device is ready?

>  	virtio_device_ready(vdev);
>
> @@ -637,7 +654,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb = vdev->priv;
>
> -	unregister_oom_notifier(&vb->nb);
> +	virtio_balloon_unregister_shrinker(vb);
>
>  	spin_lock_irq(&vb->stop_update_lock);
>  	vb->stop_update = true;
> --
> 2.7.4
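The shrinker under review works in two steps. count_objects reports how many pages could be reclaimed, and a count of 0 signals that there is nothing to reclaim. scan_objects then releases those pages by calling leak_balloon() repeatedly, because each call deflates at most VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) 4 KiB balloon pages.

The user-space model below mimics that arithmetic, including the division by VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> 12, which is 1 with 4 KiB pages and 16 with 64 KiB pages). Its names (struct balloon_model, model_leak(), model_count(), model_scan()) are hypothetical stand-ins, not kernel functions. The model makes two assumptions of its own:

- It reports 0 when DEFLATE_ON_OOM is not negotiated, as the quoted count function does.
- It stops scanning when a leak call makes no progress. The quoted loop has no such check.

```c
#include <assert.h>

#define MODEL_ARRAY_PFNS_MAX 256UL /* mirrors VIRTIO_BALLOON_ARRAY_PFNS_MAX */

/* Hypothetical user-space model of the balloon state the shrinker sees. */
struct balloon_model {
	unsigned long num_pages;      /* 4 KiB balloon pages currently inflated */
	unsigned long pages_per_page; /* PAGE_SIZE >> 12 */
	int deflate_on_oom;           /* DEFLATE_ON_OOM feature negotiated? */
};

/* Stand-in for leak_balloon(): deflates at most MODEL_ARRAY_PFNS_MAX
 * balloon pages per call and returns how many it released. */
static unsigned long model_leak(struct balloon_model *vb, unsigned long num)
{
	if (num > MODEL_ARRAY_PFNS_MAX)
		num = MODEL_ARRAY_PFNS_MAX;
	if (num > vb->num_pages)
		num = vb->num_pages;
	vb->num_pages -= num;
	return num;
}

/* count_objects: guest pages a scan could release (0 = nothing to do). */
static unsigned long model_count(const struct balloon_model *vb,
				 unsigned long pages_to_shrink)
{
	unsigned long n;

	if (!vb->deflate_on_oom)
		return 0;
	n = vb->num_pages < pages_to_shrink ? vb->num_pages : pages_to_shrink;
	return n / vb->pages_per_page;
}

/* scan_objects: call the leak helper until pages_to_shrink balloon pages
 * are released, the balloon is empty, or a call frees nothing. */
static unsigned long model_scan(struct balloon_model *vb,
				unsigned long pages_to_shrink)
{
	unsigned long freed = 0;

	assert(vb->pages_per_page > 0);
	while (vb->num_pages && freed < pages_to_shrink) {
		unsigned long got = model_leak(vb, pages_to_shrink - freed);

		if (got == 0) /* defensive: avoid spinning without progress */
			break;
		freed += got;
	}
	return freed / vb->pages_per_page;
}
```

Consider a balloon of 1000 pages with a target of 600. The scan takes three leak calls (256 + 256 + 88) and leaves 400 pages inflated. With 64 KiB pages, a 256-page deflation is reported to the shrinker core as 16 guest pages.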
Michael S. Tsirkin
2018-Jul-23  14:07 UTC
[PATCH v36 0/5] Virtio-balloon: support free page reporting
On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
> This patch series is separated from the previous "Virtio-balloon
> Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
> implemented by this series enables the virtio-balloon driver to report
> hints of guest free pages to the host. It can be used to accelerate live
> migration of VMs. Here is an introduction of this usage:
>
> Live migration needs to transfer the VM's memory from the source machine
> to the destination round by round. For the 1st round, all the VM's memory
> is transferred. From the 2nd round, only the pieces of memory that were
> written by the guest (after the 1st round) are transferred. One method
> that is popularly used by the hypervisor to track which part of memory is
> written is to write-protect all the guest memory.
>
> This feature enables the optimization by skipping the transfer of guest
> free pages during VM live migration. It is not concerned that the memory
> pages are used after they are given to the hypervisor as a hint of the
> free pages, because they will be tracked by the hypervisor and transferred
> in the subsequent round if they are used and written.
>
> * Tests
> - Test Environment
>     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>     Guest: 8G RAM, 4 vCPU
>     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>
> - Test Results
>     - Idle Guest Live Migration Time (results are averaged over 10 runs):
>         - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
>           (setting page poisoning zero and enabling ksm don't affect the
>           comparison result)
>     - Guest with Linux Compilation Workload (make bzImage -j4):
>         - Live Migration Time (average)
>           Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
>         - Linux Compilation Time
>           Optimization v.s. Legacy = 5min4s v.s. 5min12s
>           --> no obvious difference

I'd like to see dgilbert's take on whether this kind of gain
justifies adding a PV interfaces, and what kind of guest workload
is appropriate.

Cc'd.

> ChangeLog:
> v35->v36:
>     - remove the mm patch, as Linus has a suggestion to get free page
>       addresses via allocation, instead of reading from the free page
>       list.
>     - virtio-balloon:
>         - replace oom notifier with shrinker;
>         - the guest to host communication interface remains the same as
>           v32.
>         - allocate free page blocks and send to host one by one, and free
>           them after sending all the pages.
>
> For ChangeLogs from v22 to v35, please reference
> https://lwn.net/Articles/759413/
>
> For ChangeLogs before v21, please reference
> https://lwn.net/Articles/743660/
>
> Wei Wang (5):
>   virtio-balloon: remove BUG() in init_vqs
>   virtio_balloon: replace oom notifier with shrinker
>   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>   mm/page_poison: expose page_poisoning_enabled to kernel modules
>   virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
>  drivers/virtio/virtio_balloon.c     | 456 ++++++++++++++++++++++++++++++------
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_poison.c                    |   6 +
>  3 files changed, 394 insertions(+), 75 deletions(-)
>
> --
> 2.7.4
Dr. David Alan Gilbert
2018-Jul-23  14:36 UTC
[PATCH v36 0/5] Virtio-balloon: support free page reporting
* Michael S. Tsirkin (mst at redhat.com) wrote:
> On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
> > This patch series is separated from the previous "Virtio-balloon
> > Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
> > implemented by this series enables the virtio-balloon driver to report
> > hints of guest free pages to the host. It can be used to accelerate live
> > migration of VMs. Here is an introduction of this usage:
> >
> > Live migration needs to transfer the VM's memory from the source machine
> > to the destination round by round. For the 1st round, all the VM's memory
> > is transferred. From the 2nd round, only the pieces of memory that were
> > written by the guest (after the 1st round) are transferred. One method
> > that is popularly used by the hypervisor to track which part of memory is
> > written is to write-protect all the guest memory.
> >
> > This feature enables the optimization by skipping the transfer of guest
> > free pages during VM live migration. It is not concerned that the memory
> > pages are used after they are given to the hypervisor as a hint of the
> > free pages, because they will be tracked by the hypervisor and transferred
> > in the subsequent round if they are used and written.
> >
> > * Tests
> > - Test Environment
> >     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> >     Guest: 8G RAM, 4 vCPU
> >     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> >
> > - Test Results
> >     - Idle Guest Live Migration Time (results are averaged over 10 runs):
> >         - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
> >           (setting page poisoning zero and enabling ksm don't affect the
> >           comparison result)
> >     - Guest with Linux Compilation Workload (make bzImage -j4):
> >         - Live Migration Time (average)
> >           Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
> >         - Linux Compilation Time
> >           Optimization v.s. Legacy = 5min4s v.s. 5min12s
> >           --> no obvious difference
>
> I'd like to see dgilbert's take on whether this kind of gain
> justifies adding a PV interfaces, and what kind of guest workload
> is appropriate.
>
> Cc'd.

Well, 44% is great ... although the measurement is a bit weird.
  a) A 2 second downtime is very large; 300-500ms is more normal
  b) I'm not sure what the 'average' is - is that just between a bunch of
     repeated migrations?
  c) What load was running in the guest during the live migration?

An interesting measurement to add would be to do the same test but with
a VM with a lot more RAM but the same load; you'd hope the gain would
be even better.
It would be interesting, especially because the users who are interested
are people creating VMs allocated with lots of extra memory (for the worst
case) but most of the time migrating when it's fairly idle.

Dave

> >
> > ChangeLog:
> > v35->v36:
> >     - remove the mm patch, as Linus has a suggestion to get free page
> >       addresses via allocation, instead of reading from the free page
> >       list.
> >     - virtio-balloon:
> >         - replace oom notifier with shrinker;
> >         - the guest to host communication interface remains the same as
> >           v32.
> >         - allocate free page blocks and send to host one by one, and free
> >           them after sending all the pages.
> >
> > For ChangeLogs from v22 to v35, please reference
> > https://lwn.net/Articles/759413/
> >
> > For ChangeLogs before v21, please reference
> > https://lwn.net/Articles/743660/
> >
> > Wei Wang (5):
> >   virtio-balloon: remove BUG() in init_vqs
> >   virtio_balloon: replace oom notifier with shrinker
> >   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >   mm/page_poison: expose page_poisoning_enabled to kernel modules
> >   virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> >
> >  drivers/virtio/virtio_balloon.c     | 456 ++++++++++++++++++++++++++++++------
> >  include/uapi/linux/virtio_balloon.h |   7 +
> >  mm/page_poison.c                    |   6 +
> >  3 files changed, 394 insertions(+), 75 deletions(-)
> >
> > --
> > 2.7.4
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
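Dave expects a larger gain from a VM with much more RAM under the same load. A deliberately crude first-round model shows why. It assumes the following:

- Legacy pre-copy sends every guest page in round 1.
- With hints, pages the guest reports as free are skipped. The cover letter notes that reused pages are dirty-tracked and sent in a later round.

The model ignores dirty rate, bandwidth, downtime limits and hint latency. The struct, the functions and the page counts used with it are all hypothetical and are not measurements from this thread.

```c
#include <assert.h>

/* Hypothetical first-round model of pre-copy live migration. */
struct precopy_model {
	unsigned long total_pages;  /* guest RAM, in pages */
	unsigned long hinted_free;  /* pages reported free via hints */
};

/* Legacy first round: every guest page is transferred. */
static unsigned long first_round_legacy(const struct precopy_model *m)
{
	return m->total_pages;
}

/* With free page hints: hinted pages are skipped in the first round. */
static unsigned long first_round_hinted(const struct precopy_model *m)
{
	assert(m->hinted_free <= m->total_pages);
	return m->total_pages - m->hinted_free;
}

/* Percentage of first-round pages skipped thanks to the hints. */
static double first_round_saving(const struct precopy_model *m)
{
	return 100.0 *
	       (double)(first_round_legacy(m) - first_round_hinted(m)) /
	       (double)first_round_legacy(m);
}
```

Assume 4 KiB pages and 2 GiB (524288 pages) in use, with everything else hinted free. An 8 GiB guest (2097152 pages) then skips 75% of its first-round pages. A 64 GiB guest (16777216 pages) skips about 96.9%, while the pages actually sent stay the same.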