This patch series intends to summarize the recent contributions made by
Michael S. Tsirkin, Tetsuo Handa, Michal Hocko etc. via reporting and
discussing the related deadlock issues on the mailing list. Please check
each patch for details.

From a high-level point of view, this patch series achieves:
1) eliminate the deadlock issue fundamentally caused by the inability
   to run leak_balloon and fill_balloon concurrently;
2) enable OOM to release more than 256 inflated pages; and
3) stop inflating when the guest is under severe memory pressure
   (i.e. OOM).

Here is an example of the benefit brought by this patch series:
The guest sets virtio_balloon.oom_pages=100000. When the host requests
to inflate 7.9G of an 8G idle guest, the guest can still run normally
since OOM can guarantee at least 100000 pages (400MB) for the guest.
Without the above patches, the guest will kill all the killable
processes and finally fall into kernel panic.

Wei Wang (3):
  virtio-balloon: replace the coarse-grained balloon_lock
  virtio-balloon: deflate up to oom_pages on OOM
  virtio-balloon: stop inflating when OOM occurs

 drivers/virtio/virtio_balloon.c | 149 ++++++++++++++++++++++++----------------
 1 file changed, 91 insertions(+), 58 deletions(-)

-- 
2.7.4
Wei Wang
2017-Oct-20 11:54 UTC
[PATCH v1 1/3] virtio-balloon: replace the coarse-grained balloon_lock
The balloon_lock was used to synchronize the access demand to elements
of struct virtio_balloon and its queue operations (please see commit
e22504296d). This prevents the concurrent run of the leak_balloon and
fill_balloon functions, thereby resulting in a deadlock issue on OOM:

fill_balloon: take balloon_lock and wait for OOM to get some memory;
oom_notify: release some inflated memory via leak_balloon();
leak_balloon: wait for balloon_lock to be released by fill_balloon.

This patch breaks the lock into two fine-grained inflate_lock and
deflate_lock, and eliminates the unnecessary use of the shared data
(i.e. vb->pfns, vb->num_pfns). This enables leak_balloon and
fill_balloon to run concurrently and solves the deadlock issue.

Reported-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko at kernel.org>
---
 drivers/virtio/virtio_balloon.c | 102 +++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 49 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..1ecd15a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -67,7 +67,7 @@ struct virtio_balloon {
 	wait_queue_head_t acked;
 
 	/* Number of balloon pages we've told the Host we're not using. */
-	unsigned int num_pages;
+	atomic64_t num_pages;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -76,12 +76,9 @@ struct virtio_balloon {
 	 */
 	struct balloon_dev_info vb_dev_info;
 
-	/* Synchronize access/update to this struct virtio_balloon elements */
-	struct mutex balloon_lock;
-
-	/* The array of pfns we tell the Host about. */
-	unsigned int num_pfns;
-	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
+	/* Synchronize access to inflate_vq and deflate_vq respectively */
+	struct mutex inflate_lock;
+	struct mutex deflate_lock;
 
 	/* Memory statistics */
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
@@ -111,12 +108,13 @@ static void balloon_ack(struct virtqueue *vq)
 	wake_up(&vb->acked);
 }
 
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq,
+		      __virtio32 pfns[], unsigned int num_pfns)
 {
 	struct scatterlist sg;
 	unsigned int len;
 
-	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	sg_init_one(&sg, pfns, sizeof(pfns[0]) * num_pfns);
 
 	/* We should always be able to add one buffer to an empty queue. */
 	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
@@ -144,14 +142,14 @@ static void set_page_pfns(struct virtio_balloon *vb,
 static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
-	unsigned num_allocated_pages;
+	unsigned int num_pfns;
+	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
 	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
 
-	mutex_lock(&vb->balloon_lock);
-	for (vb->num_pfns = 0; vb->num_pfns < num;
-	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+	for (num_pfns = 0; num_pfns < num;
+	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		struct page *page = balloon_page_enqueue(vb_dev_info);
 
 		if (!page) {
@@ -162,20 +160,20 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 			msleep(200);
 			break;
 		}
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
-		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
+		set_page_pfns(vb, pfns + num_pfns, page);
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, -1);
 	}
 
-	num_allocated_pages = vb->num_pfns;
+	mutex_lock(&vb->inflate_lock);
 	/* Did we get any? */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->inflate_vq);
-	mutex_unlock(&vb->balloon_lock);
+	if (num_pfns != 0)
+		tell_host(vb, vb->inflate_vq, pfns, num_pfns);
+	mutex_unlock(&vb->inflate_lock);
+	atomic64_add(num_pfns, &vb->num_pages);
 
-	return num_allocated_pages;
+	return num_pfns;
 }
 
 static void release_pages_balloon(struct virtio_balloon *vb,
@@ -194,38 +192,39 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 
 static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
-	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	LIST_HEAD(pages);
+	unsigned int num_pfns;
+	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
 	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
 
-	mutex_lock(&vb->balloon_lock);
 	/* We can't release more pages than taken */
-	num = min(num, (size_t)vb->num_pages);
-	for (vb->num_pfns = 0; vb->num_pfns < num;
-	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+	num = min_t(size_t, num, atomic64_read(&vb->num_pages));
+	for (num_pfns = 0; num_pfns < num;
+	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		page = balloon_page_dequeue(vb_dev_info);
 		if (!page)
 			break;
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		set_page_pfns(vb, pfns + num_pfns, page);
 		list_add(&page->lru, &pages);
-		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
-	num_freed_pages = vb->num_pfns;
 	/*
 	 * Note that if
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->deflate_vq);
+	mutex_lock(&vb->deflate_lock);
+	if (num_pfns != 0)
+		tell_host(vb, vb->deflate_vq, pfns, num_pfns);
+	mutex_unlock(&vb->deflate_lock);
 	release_pages_balloon(vb, &pages);
-	mutex_unlock(&vb->balloon_lock);
-	return num_freed_pages;
+	atomic64_sub(num_pfns, &vb->num_pages);
+
+	return num_pfns;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
@@ -327,12 +326,12 @@ static inline s64 towards_target(struct virtio_balloon *vb)
 	num_pages = le32_to_cpu((__force __le32)num_pages);
 
 	target = num_pages;
-	return target - vb->num_pages;
+	return target - atomic64_read(&vb->num_pages);
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
 {
-	u32 actual = vb->num_pages;
+	u32 actual = atomic64_read(&vb->num_pages);
 
 	/* Legacy balloon config space is LE, unlike all other devices. */
 	if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
@@ -465,6 +464,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	struct virtio_balloon *vb = container_of(vb_dev_info,
 			struct virtio_balloon, vb_dev_info);
 	unsigned long flags;
+	__virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];
 
 	/*
 	 * In order to avoid lock contention while migrating pages concurrently
@@ -474,8 +474,12 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	 * recursion in the case it ends up triggering memory compaction
 	 * while it is attempting to inflate the ballon.
 	 */
-	if (!mutex_trylock(&vb->balloon_lock))
+	if (!mutex_trylock(&vb->inflate_lock))
+		return -EAGAIN;
+	if (!mutex_trylock(&vb->deflate_lock)) {
+		mutex_unlock(&vb->inflate_lock);
 		return -EAGAIN;
+	}
 
 	get_page(newpage); /* balloon reference */
 
@@ -485,17 +489,16 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	vb_dev_info->isolated_pages--;
 	__count_vm_event(BALLOON_MIGRATE);
 	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, newpage);
-	tell_host(vb, vb->inflate_vq);
+
+	set_page_pfns(vb, pfns, newpage);
+	tell_host(vb, vb->inflate_vq, pfns, VIRTIO_BALLOON_PAGES_PER_PAGE);
+	mutex_unlock(&vb->inflate_lock);
 
 	/* balloon's page migration 2nd step -- deflate "page" */
 	balloon_page_delete(page);
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, page);
-	tell_host(vb, vb->deflate_vq);
-
-	mutex_unlock(&vb->balloon_lock);
+	set_page_pfns(vb, pfns, page);
+	tell_host(vb, vb->deflate_vq, pfns, VIRTIO_BALLOON_PAGES_PER_PAGE);
+	mutex_unlock(&vb->deflate_lock);
 
 	put_page(page); /* balloon reference */
 
@@ -542,8 +545,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
 	spin_lock_init(&vb->stop_update_lock);
 	vb->stop_update = false;
-	vb->num_pages = 0;
-	mutex_init(&vb->balloon_lock);
+	atomic64_set(&vb->num_pages, 0);
+	mutex_init(&vb->inflate_lock);
+	mutex_init(&vb->deflate_lock);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
 
@@ -596,8 +600,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 static void remove_common(struct virtio_balloon *vb)
 {
 	/* There might be pages left in the balloon: free them. */
-	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+	while (atomic64_read(&vb->num_pages))
+		leak_balloon(vb, atomic64_read(&vb->num_pages));
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
-- 
2.7.4
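[Editor's note: the locking scheme of this patch can be modeled in userspace. Below is a minimal sketch, not kernel code: the virtqueue plumbing is stubbed out, C11 atomics stand in for atomic64_t, and pthread mutexes stand in for struct mutex; all names mirror the patch but the bodies are illustrative.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

struct balloon {
	atomic_long num_pages;		/* was: unsigned int under balloon_lock */
	pthread_mutex_t inflate_lock;	/* guards the inflate vq only */
	pthread_mutex_t deflate_lock;	/* guards the deflate vq only */
};

/* Stub for tell_host(): the PFN array is now a caller-owned argument,
 * so there is no shared vb->pfns/vb->num_pfns left to protect. */
static void tell_host(pthread_mutex_t *vq_lock,
		      const unsigned int *pfns, size_t num_pfns)
{
	pthread_mutex_lock(vq_lock);
	(void)pfns; (void)num_pfns;	/* queue the PFNs to the host here */
	pthread_mutex_unlock(vq_lock);
}

static size_t fill_balloon(struct balloon *b, size_t num)
{
	unsigned int pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
	size_t i;

	if (num > VIRTIO_BALLOON_ARRAY_PFNS_MAX)	/* one array per call */
		num = VIRTIO_BALLOON_ARRAY_PFNS_MAX;
	for (i = 0; i < num; i++)
		pfns[i] = (unsigned int)i;		/* stand-in PFNs */
	tell_host(&b->inflate_lock, pfns, num);
	atomic_fetch_add(&b->num_pages, (long)num);
	return num;
}

static size_t leak_balloon(struct balloon *b, size_t num)
{
	unsigned int pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
	long avail = atomic_load(&b->num_pages);
	size_t i;

	if (num > VIRTIO_BALLOON_ARRAY_PFNS_MAX)
		num = VIRTIO_BALLOON_ARRAY_PFNS_MAX;
	if ((long)num > avail)		/* can't release more than taken */
		num = (size_t)avail;
	for (i = 0; i < num; i++)
		pfns[i] = (unsigned int)i;
	tell_host(&b->deflate_lock, pfns, num);
	atomic_fetch_sub(&b->num_pages, (long)num);
	return num;
}
```

Because fill_balloon() only ever takes inflate_lock and leak_balloon() only takes deflate_lock, the OOM notifier can deflate while an inflate is still in flight, which is exactly what breaks the wait chain described in the commit message.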
Wei Wang
2017-Oct-20 11:54 UTC
[PATCH v1 2/3] virtio-balloon: deflate up to oom_pages on OOM
The current implementation only deflates 256 pages even when a user
specifies more than that via the oom_pages module param. This patch
enables the deflating of up to oom_pages pages if there are enough
inflated pages.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Michal Hocko <mhocko at kernel.org>
Cc: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
---
 drivers/virtio/virtio_balloon.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 1ecd15a..ab55cf8 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -43,8 +43,8 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
-static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
-module_param(oom_pages, int, S_IRUSR | S_IWUSR);
+static unsigned int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, uint, 0600);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
 #ifdef CONFIG_BALLOON_COMPACTION
@@ -359,16 +359,20 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 {
 	struct virtio_balloon *vb;
 	unsigned long *freed;
-	unsigned num_freed_pages;
+	unsigned int npages = oom_pages;
 
 	vb = container_of(self, struct virtio_balloon, nb);
 	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return NOTIFY_OK;
 
 	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
+
+	/* Don't deflate more than the number of inflated pages */
+	while (npages && atomic64_read(&vb->num_pages))
+		npages -= leak_balloon(vb, npages);
+
 	update_balloon_size(vb);
-	*freed += num_freed_pages;
+	*freed += oom_pages - npages;
 
 	return NOTIFY_OK;
 }
-- 
2.7.4
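[Editor's note: since leak_balloon() still caps each call at VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) pfns, the notifier now loops until either the oom_pages budget or the balloon itself is exhausted. A userspace sketch of just that loop, with the balloon modeled as a bare atomic counter (illustrative only; not the kernel code):]

```c
#include <assert.h>
#include <stdatomic.h>

#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

static atomic_long num_pages;	/* pages currently inflated */

/* Model of leak_balloon(): frees min(num, 256, inflated) pages per call. */
static unsigned int leak_balloon(unsigned int num)
{
	long avail = atomic_load(&num_pages);

	if (num > VIRTIO_BALLOON_ARRAY_PFNS_MAX)
		num = VIRTIO_BALLOON_ARRAY_PFNS_MAX;
	if ((long)num > avail)
		num = (unsigned int)avail;
	atomic_fetch_sub(&num_pages, (long)num);
	return num;
}

/* The loop added by this patch: don't stop after one 256-page batch. */
static unsigned long oom_deflate(unsigned int oom_pages)
{
	unsigned int npages = oom_pages;

	while (npages && atomic_load(&num_pages))
		npages -= leak_balloon(npages);
	return oom_pages - npages;	/* what *freed is incremented by */
}
```

With oom_pages=100000 and, say, 2M pages inflated, the old single leak_balloon() call would free only 256 pages; the loop frees the full 100000.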
Wei Wang
2017-Oct-20 11:54 UTC
[PATCH v1 3/3] virtio-balloon: stop inflating when OOM occurs
This patch forces the cease of the inflating work when OOM occurs.
The fundamental idea of memory ballooning is to take out some guest
pages when the guest has low memory utilization, so it is sensible to
inflate nothing when the guest is already under memory pressure.

On the other hand, the policy is determined by the admin or the
orchestration layer from the host. That is, the host is expected to
re-start the memory inflating request at a proper time later when
the guest has enough memory to inflate, for example, by checking
the memory stats reported by the balloon. If another inflating
request is sent to the guest while it is still under memory
pressure, still no pages will be inflated.

Signed-off-by: Wei Wang <wei.w.wang at intel.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko at kernel.org>
---
 drivers/virtio/virtio_balloon.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index ab55cf8..cf29663 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -63,6 +63,15 @@ struct virtio_balloon {
 	spinlock_t stop_update_lock;
 	bool stop_update;
 
+	/*
+	 * The balloon driver enters the oom mode if the oom notifier is
+	 * invoked. Entering the oom mode will force the exit of current
+	 * inflating work. When a later inflating request is received from
+	 * the host, the success of memory allocation via balloon_page_enqueue
+	 * will turn off the mode.
+	 */
+	bool oom_mode;
+
 	/* Waiting for host to ack the pages we released. */
 	wait_queue_head_t acked;
 
@@ -142,22 +151,22 @@ static void set_page_pfns(struct virtio_balloon *vb,
 static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	struct page *page;
+	size_t orig_num;
 	unsigned int num_pfns;
 	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
 
+	orig_num = num;
 	/* We can only do one array worth at a time. */
 	num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
 
 	for (num_pfns = 0; num_pfns < num;
 	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = balloon_page_enqueue(vb_dev_info);
-
+		page = balloon_page_enqueue(vb_dev_info);
 		if (!page) {
 			dev_info_ratelimited(&vb->vdev->dev,
 					     "Out of puff! Can't get %u pages\n",
 					     VIRTIO_BALLOON_PAGES_PER_PAGE);
-			/* Sleep for at least 1/5 of a second before retry. */
-			msleep(200);
 			break;
 		}
 		set_page_pfns(vb, pfns + num_pfns, page);
@@ -166,6 +175,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 			adjust_managed_page_count(page, -1);
 	}
 
+	/*
+	 * The oom_mode is set, but we've already been able to get some
+	 * pages, so it is time to turn it off here.
+	 */
+	if (unlikely(READ_ONCE(vb->oom_mode) && page))
+		WRITE_ONCE(vb->oom_mode, false);
+
 	mutex_lock(&vb->inflate_lock);
 	/* Did we get any? */
 	if (num_pfns != 0)
@@ -173,6 +189,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	mutex_unlock(&vb->inflate_lock);
 	atomic64_add(num_pfns, &vb->num_pages);
 
+	/*
+	 * If oom_mode is on, return the original @num passed by
+	 * update_balloon_size_func to stop the inflating.
+	 */
+	if (READ_ONCE(vb->oom_mode))
+		return orig_num;
+
 	return num_pfns;
 }
 
@@ -365,6 +388,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return NOTIFY_OK;
 
+	WRITE_ONCE(vb->oom_mode, true);
 	freed = parm;
 
 	/* Don't deflate more than the number of inflated pages */
@@ -549,6 +573,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
 	spin_lock_init(&vb->stop_update_lock);
 	vb->stop_update = false;
+	vb->oom_mode = false;
 	atomic64_set(&vb->num_pages, 0);
 	mutex_init(&vb->inflate_lock);
 	mutex_init(&vb->deflate_lock);
-- 
2.7.4
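[Editor's note: the control flow this patch adds to fill_balloon() can be modeled in userspace as below. This is an editorial sketch under stated assumptions: balloon_page_enqueue() is replaced by an alloc_ok flag, and the "return orig_num" trick targets the caller (update_balloon_size_func), which requeues the inflate work while the remaining diff is nonzero.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256

static bool oom_mode;	/* set by the OOM notifier */
static bool alloc_ok;	/* stand-in for balloon_page_enqueue() success */

static size_t fill_balloon(size_t num)
{
	size_t orig_num = num, num_pfns = 0;

	if (num > VIRTIO_BALLOON_ARRAY_PFNS_MAX)
		num = VIRTIO_BALLOON_ARRAY_PFNS_MAX;
	while (num_pfns < num) {
		if (!alloc_ok)
			break;	/* "Out of puff!" -- and no msleep(200) retry now */
		num_pfns++;
	}

	/* Allocation succeeded again: leave oom mode. */
	if (oom_mode && num_pfns)
		oom_mode = false;

	/*
	 * Claim the whole request is done so that the caller sees
	 * diff == 0 and stops requeueing the inflate work.
	 */
	if (oom_mode)
		return orig_num;
	return num_pfns;
}
```

The key design point: under OOM the function lies about its progress (returns orig_num instead of 0), which is what stops the retry loop without touching the work-queue machinery.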
On Fri, Oct 20, 2017 at 07:54:23PM +0800, Wei Wang wrote:
> This patch series intends to summarize the recent contributions made by
> Michael S. Tsirkin, Tetsuo Handa, Michal Hocko etc. via reporting and
> discussing the related deadlock issues on the mailing list. Please check
> each patch for details.
> 
> From a high-level point of view, this patch series achieves:
> 1) eliminate the deadlock issue fundamentally caused by the inability
>    to run leak_balloon and fill_balloon concurrently;

We need to think about this carefully. Is it an issue that
leak can now bypass fill? It seems that we can now
try to leak a page before fill was seen by host,
but I did not look into it deeply.

I really like my patch for this better at least for
current kernel. I agree we need to work more on 2+3.

> 2) enable OOM to release more than 256 inflated pages; and

Does just this help enough? How about my patch + 2?
Tetsuo, what do you think?

> 3) stop inflating when the guest is under severe memory pressure
>    (i.e. OOM).

But when do we finally inflate? Question is how does host know it needs
to resend an interrupt, and when should it do it?

> Here is an example of the benefit brought by this patch series:
> The guest sets virtio_balloon.oom_pages=100000. When the host requests
> to inflate 7.9G of an 8G idle guest, the guest can still run normally
> since OOM can guarantee at least 100000 pages (400MB) for the guest.
> Without the above patches, the guest will kill all the killable
> processes and finally fall into kernel panic.
> 
> Wei Wang (3):
>   virtio-balloon: replace the coarse-grained balloon_lock
>   virtio-balloon: deflate up to oom_pages on OOM
>   virtio-balloon: stop inflating when OOM occurs
> 
>  drivers/virtio/virtio_balloon.c | 149 ++++++++++++++++++++++++----------------
>  1 file changed, 91 insertions(+), 58 deletions(-)
> 
> -- 
> 2.7.4
Michael S. Tsirkin
2017-Oct-22 03:21 UTC
[PATCH v1 2/3] virtio-balloon: deflate up to oom_pages on OOM
On Fri, Oct 20, 2017 at 07:54:25PM +0800, Wei Wang wrote:
> The current implementation only deflates 256 pages even when a user
> specifies more than that via the oom_pages module param. This patch
> enables the deflating of up to oom_pages pages if there are enough
> inflated pages.
> 
> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Michal Hocko <mhocko at kernel.org>
> Cc: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>

This seems reasonable. Does this by itself help?

> ---
>  drivers/virtio/virtio_balloon.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 1ecd15a..ab55cf8 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -43,8 +43,8 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> 
> -static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> -module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> +static unsigned int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> +module_param(oom_pages, uint, 0600);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> 
>  #ifdef CONFIG_BALLOON_COMPACTION
> @@ -359,16 +359,20 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  {
>  	struct virtio_balloon *vb;
>  	unsigned long *freed;
> -	unsigned num_freed_pages;
> +	unsigned int npages = oom_pages;
> 
>  	vb = container_of(self, struct virtio_balloon, nb);
>  	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
> 
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +
> +	/* Don't deflate more than the number of inflated pages */
> +	while (npages && atomic64_read(&vb->num_pages))
> +		npages -= leak_balloon(vb, npages);
> +
>  	update_balloon_size(vb);
> -	*freed += num_freed_pages;
> +	*freed += oom_pages - npages;
> 
>  	return NOTIFY_OK;
>  }
> -- 
> 2.7.4
Tetsuo Handa
2017-Oct-22 05:20 UTC
[PATCH v1 1/3] virtio-balloon: replace the coarse-grained balloon_lock
Wei Wang wrote:
> The balloon_lock was used to synchronize the access demand to elements
> of struct virtio_balloon and its queue operations (please see commit
> e22504296d). This prevents the concurrent run of the leak_balloon and
> fill_balloon functions, thereby resulting in a deadlock issue on OOM:
> 
> fill_balloon: take balloon_lock and wait for OOM to get some memory;
> oom_notify: release some inflated memory via leak_balloon();
> leak_balloon: wait for balloon_lock to be released by fill_balloon.
> 
> This patch breaks the lock into two fine-grained inflate_lock and
> deflate_lock, and eliminates the unnecessary use of the shared data
> (i.e. vb->pfns, vb->num_pfns). This enables leak_balloon and
> fill_balloon to run concurrently and solves the deadlock issue.

> @@ -162,20 +160,20 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  			msleep(200);
>  			break;
>  		}
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> -		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> +		set_page_pfns(vb, pfns + num_pfns, page);
>  		if (!virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  			adjust_managed_page_count(page, -1);
>  	}
> 
> -	num_allocated_pages = vb->num_pfns;
> +	mutex_lock(&vb->inflate_lock);
>  	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> -	mutex_unlock(&vb->balloon_lock);
> +	if (num_pfns != 0)
> +		tell_host(vb, vb->inflate_vq, pfns, num_pfns);
> +	mutex_unlock(&vb->inflate_lock);
> +	atomic64_add(num_pfns, &vb->num_pages);

Isn't this addition too late? If leak_balloon() is called due to
out_of_memory(), it will fail to see an up-to-date vb->num_pages value.

> 
> -	return num_allocated_pages;
> +	return num_pfns;
>  }
> 
>  static void release_pages_balloon(struct virtio_balloon *vb,
> @@ -194,38 +192,39 @@ static void release_pages_balloon(struct virtio_balloon *vb,
> 
>  static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  {
> -	unsigned num_freed_pages;
>  	struct page *page;
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	LIST_HEAD(pages);
> +	unsigned int num_pfns;
> +	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];

This array consumes 1024 bytes of kernel stack, doesn't it?
leak_balloon() might be called from out_of_memory() where kernel stack
is already largely consumed before entering __alloc_pages_nodemask().
For reducing the possibility of stack overflow, since out_of_memory()
is serialized by oom_lock, I suggest using a static (maybe kmalloc()ed
as vb->oom_pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]) buffer when called from
out_of_memory().

> 
>  	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
> 
> -	mutex_lock(&vb->balloon_lock);
>  	/* We can't release more pages than taken */
> -	num = min(num, (size_t)vb->num_pages);
> -	for (vb->num_pfns = 0; vb->num_pfns < num;
> -	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> +	num = min_t(size_t, num, atomic64_read(&vb->num_pages));
> +	for (num_pfns = 0; num_pfns < num;
> +	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
>  		page = balloon_page_dequeue(vb_dev_info);

If balloon_page_dequeue() can be concurrently called by both the host's
request and the guest's OOM event, is the (!dequeued_page) test in
balloon_page_dequeue() safe? Is such concurrency needed?

>  		if (!page)
>  			break;
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		set_page_pfns(vb, pfns + num_pfns, page);
>  		list_add(&page->lru, &pages);
> -		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	}
> 
> -	num_freed_pages = vb->num_pfns;
>  	/*
>  	 * Note that if
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
>  	 * is true, we *have* to do it in this order
>  	 */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->deflate_vq);
> +	mutex_lock(&vb->deflate_lock);
> +	if (num_pfns != 0)
> +		tell_host(vb, vb->deflate_vq, pfns, num_pfns);
> +	mutex_unlock(&vb->deflate_lock);
>  	release_pages_balloon(vb, &pages);
> -	mutex_unlock(&vb->balloon_lock);
> -	return num_freed_pages;
> +	atomic64_sub(num_pfns, &vb->num_pages);

Isn't this subtraction too late?

> +
> +	return num_pfns;
>  }
> 
>  static inline void update_stat(struct virtio_balloon *vb, int idx,

> @@ -465,6 +464,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
>  	struct virtio_balloon *vb = container_of(vb_dev_info,
>  			struct virtio_balloon, vb_dev_info);
>  	unsigned long flags;
> +	__virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];

If this is called from the memory allocation path, maybe a kmalloc()ed
buffer is safer.
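[Editor's note: the 1024-byte figure above can be checked directly: the on-stack array introduced by the patch is 256 entries of a 4-byte __virtio32. A quick editorial check, with __virtio32 approximated by a plain uint32_t:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
typedef uint32_t __virtio32;	/* a 32-bit value in the kernel's virtio types */

/* Size of the array leak_balloon()/fill_balloon() now place on the stack. */
static size_t pfns_stack_cost(void)
{
	return sizeof(__virtio32[VIRTIO_BALLOON_ARRAY_PFNS_MAX]);
}
```

256 * 4 = 1024 bytes, which is significant on an already deep OOM call stack; this is what motivates the suggestion of a preallocated per-device buffer serialized by oom_lock.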
On 10/22/2017 11:19 AM, Michael S. Tsirkin wrote:
> On Fri, Oct 20, 2017 at 07:54:23PM +0800, Wei Wang wrote:
>> This patch series intends to summarize the recent contributions made by
>> Michael S. Tsirkin, Tetsuo Handa, Michal Hocko etc. via reporting and
>> discussing the related deadlock issues on the mailing list. Please check
>> each patch for details.
>>
>> From a high-level point of view, this patch series achieves:
>> 1) eliminate the deadlock issue fundamentally caused by the inability
>>    to run leak_balloon and fill_balloon concurrently;
> We need to think about this carefully. Is it an issue that
> leak can now bypass fill? It seems that we can now
> try to leak a page before fill was seen by host,
> but I did not look into it deeply.
>
> I really like my patch for this better at least for
> current kernel. I agree we need to work more on 2+3.

Yes, we can check more. But from the original intention (copied from
commit e22504296d):

    balloon_lock (mutex): synchronizes the access demand to elements of
    struct virtio_balloon and its queue operations;

this implementation has covered what balloon_lock achieves. We have
inflating and deflating decoupled, and use a small lock for each vq
respectively. I also tested inflating 20G and, before that was done,
requested to deflate 20G; all worked fine.

>> 2) enable OOM to release more than 256 inflated pages; and
> Does just this help enough? How about my patch + 2?
> Tetsuo, what do you think?
>
>> 3) stop inflating when the guest is under severe memory pressure
>>    (i.e. OOM).
> But when do we finally inflate? Question is how does host know it needs
> to resend an interrupt, and when should it do it?

I think "when to inflate again" should be a policy defined by the
orchestration layer software on the host. A reasonable inflating
request should be sent to a guest only on the condition that the guest
has enough free memory to inflate (the virtio-balloon memory stats
already support reporting that info). If the policy chooses to inflate
guest memory without considering whether the guest is even under memory
pressure, the mechanism we provide here simply makes the guest offer no
pages to the host in that case. I think this should be reasonable.

Best,
Wei
Michael S. Tsirkin
2017-Oct-22 17:13 UTC
[PATCH v1 3/3] virtio-balloon: stop inflating when OOM occurs
On Fri, Oct 20, 2017 at 07:54:26PM +0800, Wei Wang wrote:
> This patch forces the cease of the inflating work when OOM occurs.
> The fundamental idea of memory ballooning is to take out some guest
> pages when the guest has low memory utilization, so it is sensible to
> inflate nothing when the guest is already under memory pressure.
> 
> On the other hand, the policy is determined by the admin or the
> orchestration layer from the host. That is, the host is expected to
> re-start the memory inflating request at a proper time later when
> the guest has enough memory to inflate, for example, by checking
> the memory stats reported by the balloon.

Is there any other way to do it? And if so, can't we just have the
guest do it automatically?

Maybe the issue is really that fill attempts to allocate memory
aggressively instead of checking availability. Maybe with deflate on
oom it should check availability?

> If another inflating
> request is sent to the guest when the guest is still under memory
> pressure, still no pages will be inflated.

Any such changes are hypervisor-visible and need a new feature bit.

> Signed-off-by: Wei Wang <wei.w.wang at intel.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko at kernel.org>
> ---
>  drivers/virtio/virtio_balloon.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index ab55cf8..cf29663 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -63,6 +63,15 @@ struct virtio_balloon {
>  	spinlock_t stop_update_lock;
>  	bool stop_update;
> 
> +	/*
> +	 * The balloon driver enters the oom mode if the oom notifier is
> +	 * invoked. Entering the oom mode will force the exit of current
> +	 * inflating work. When a later inflating request is received from
> +	 * the host, the success of memory allocation via balloon_page_enqueue
> +	 * will turn off the mode.
> +	 */
> +	bool oom_mode;
> +
>  	/* Waiting for host to ack the pages we released. */
>  	wait_queue_head_t acked;
> 
> @@ -142,22 +151,22 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  {
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
> +	struct page *page;
> +	size_t orig_num;
>  	unsigned int num_pfns;
>  	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
> 
> +	orig_num = num;
>  	/* We can only do one array worth at a time. */
>  	num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
> 
>  	for (num_pfns = 0; num_pfns < num;
>  	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> -		struct page *page = balloon_page_enqueue(vb_dev_info);
> -
> +		page = balloon_page_enqueue(vb_dev_info);
>  		if (!page) {
>  			dev_info_ratelimited(&vb->vdev->dev,
>  					     "Out of puff! Can't get %u pages\n",
>  					     VIRTIO_BALLOON_PAGES_PER_PAGE);
> -			/* Sleep for at least 1/5 of a second before retry. */
> -			msleep(200);
>  			break;
>  		}
>  		set_page_pfns(vb, pfns + num_pfns, page);
> @@ -166,6 +175,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  			adjust_managed_page_count(page, -1);
>  	}
> 
> +	/*
> +	 * The oom_mode is set, but we've already been able to get some
> +	 * pages, so it is time to turn it off here.
> +	 */
> +	if (unlikely(READ_ONCE(vb->oom_mode) && page))
> +		WRITE_ONCE(vb->oom_mode, false);
> +
>  	mutex_lock(&vb->inflate_lock);
>  	/* Did we get any? */
>  	if (num_pfns != 0)
> @@ -173,6 +189,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  	mutex_unlock(&vb->inflate_lock);
>  	atomic64_add(num_pfns, &vb->num_pages);
> 
> +	/*
> +	 * If oom_mode is on, return the original @num passed by
> +	 * update_balloon_size_func to stop the inflating.
> +	 */
> +	if (READ_ONCE(vb->oom_mode))
> +		return orig_num;
> +
>  	return num_pfns;
>  }
> 
> @@ -365,6 +388,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
> 
> +	WRITE_ONCE(vb->oom_mode, true);
>  	freed = parm;
> 
>  	/* Don't deflate more than the number of inflated pages */
> @@ -549,6 +573,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func);
>  	spin_lock_init(&vb->stop_update_lock);
>  	vb->stop_update = false;
> +	vb->oom_mode = false;
>  	atomic64_set(&vb->num_pages, 0);
>  	mutex_init(&vb->inflate_lock);
>  	mutex_init(&vb->deflate_lock);
> -- 
> 2.7.4
On 10/22/2017 11:19 AM, Michael S. Tsirkin wrote:
> On Fri, Oct 20, 2017 at 07:54:23PM +0800, Wei Wang wrote:
>> This patch series intends to summarize the recent contributions made by
>> Michael S. Tsirkin, Tetsuo Handa, Michal Hocko etc. via reporting and
>> discussing the related deadlock issues on the mailing list. Please check
>> each patch for details.
>>
>> From a high-level point of view, this patch series achieves:
>> 1) eliminate the deadlock issue fundamentally caused by the inability
>>    to run leak_balloon and fill_balloon concurrently;
> We need to think about this carefully. Is it an issue that
> leak can now bypass fill? It seems that we can now
> try to leak a page before fill was seen by host,
> but I did not look into it deeply.
>
> I really like my patch for this better at least for
> current kernel. I agree we need to work more on 2+3.

Since we have many customers interested in the "Virtio-balloon
Enhancement" series, please review the v17 patches first (it has a
dependency on your patch for that deadlock fix, so I included it there
too), and we can get back to 2+3 here after that series is done.
Thanks.

Best,
Wei