Excessive virtio_balloon inflation can cause the OOM killer to be invoked
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless, it
is often the case that these control tools do not have enough time to react
to a fast-changing memory load. As a result, the OS runs out of memory and
invokes the OOM killer. Balancing memory by means of the virtio balloon
should not cause processes to be terminated while there are pages in the
balloon. Currently there is no way for the virtio balloon driver to free
memory at the last moment before a process gets killed by the OOM killer.

This does not introduce a security breach, as the balloon itself runs
inside the guest OS and works in cooperation with the host. Thus some
improvements on the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is expected
to be called from the OOM notifier call chain in the out_of_memory()
function. If the virtio balloon can release some memory, the system will
return and retry the allocation that forced the OOM killer to run.

Patch 1 of this series prepares for the virtio_balloon callback: the
leak_balloon() function now returns the number of freed pages.
Patch 2 implements the virtio_balloon callback itself.

Changes from v2:
- added feature bit to control OOM balloon behavior from host
Changes from v1:
- minor cosmetic tweaks suggested by rusty@

Signed-off-by: Raushaniya Maksudova <rmaksudova at parallels.com>
Signed-off-by: Denis V. Lunev <den at openvz.org>
CC: Rusty Russell <rusty at rustcorp.com.au>
CC: Michael S. Tsirkin <mst at redhat.com>
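The retry behaviour described in the cover letter can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct oom_notifier, oom_should_retry(), struct toy_balloon and toy_balloon_notify() are hypothetical names invented for the illustration. The one thing it takes from the series is the idea that each callback adds the number of pages it released to a shared counter, and that a non-zero total lets the allocation be retried instead of a process being killed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of an OOM notifier chain; not kernel code. */
typedef void (*oom_notifier_fn)(void *ctx, unsigned long *freed);

struct oom_notifier {
	oom_notifier_fn call;	/* callback that tries to release memory */
	void *ctx;		/* private data handed back to the callback */
};

/*
 * Model of the decision described above: run every notifier and look at
 * the total. Returns 1 if the allocation should be retried, 0 if the
 * OOM killer would have to pick a victim.
 */
static int oom_should_retry(const struct oom_notifier *chain, size_t n)
{
	unsigned long freed = 0;
	size_t i;

	assert(chain != NULL || n == 0);
	for (i = 0; i < n; i++)
		chain[i].call(chain[i].ctx, &freed);
	return freed > 0;
}

/* Toy balloon: holds some pages and gives back up to `chunk` per call. */
struct toy_balloon {
	unsigned long pages;
	unsigned long chunk;
};

static void toy_balloon_notify(void *ctx, unsigned long *freed)
{
	struct toy_balloon *b = ctx;
	unsigned long n = b->pages < b->chunk ? b->pages : b->chunk;

	b->pages -= n;
	*freed += n;	/* accumulate, other notifiers may add to it too */
}
```

In this model a balloon holding 300 pages with a chunk of 256 allows two retries; on the third OOM event the balloon is empty and nothing stands between the guest and the OOM killer.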
Denis V. Lunev
2014-Oct-15 15:47 UTC
[PATCH 1/2] virtio_balloon: return the amount of freed memory from leak_balloon()
From: Raushaniya Maksudova <rmaksudova at parallels.com>

This value would be useful in the next patch to provide the amount of the
freed memory for OOM killer.

Signed-off-by: Raushaniya Maksudova <rmaksudova at parallels.com>
Signed-off-by: Denis V. Lunev <den at openvz.org>
CC: Rusty Russell <rusty at rustcorp.com.au>
CC: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/virtio/virtio_balloon.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f893148..66cac10 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -168,8 +168,9 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 	}
 }
 
-static void leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 {
+	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 
@@ -186,6 +187,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
+	num_freed_pages = vb->num_pfns;
 	/*
 	 * Note that if
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
@@ -195,6 +197,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	tell_host(vb, vb->deflate_vq);
 	mutex_unlock(&vb->balloon_lock);
 	release_pages_by_pfn(vb->pfns, vb->num_pfns);
+	return num_freed_pages;
 }
 
 static inline void update_stat(struct virtio_balloon *vb, int idx,
-- 
1.9.1
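What the new return value reports can be shown with a userspace model of the loop. This is a sketch under assumptions: struct model_balloon and model_leak_balloon() are hypothetical stand-ins, the guest page size is taken to be 4 KiB (so one guest page is one balloon page), and the clamp to 256 entries is an assumption about the part of leak_balloon() that the hunks above do not show. The point is that the count is captured from num_pfns after the loop, so it reflects the pages actually dequeued, which can be fewer than requested.

```c
#include <assert.h>
#include <stddef.h>

#define PFNS_MAX	256u	/* mirrors VIRTIO_BALLOON_ARRAY_PFNS_MAX */
#define PAGES_PER_PAGE	1u	/* assumption: 4 KiB guest pages */

/* Hypothetical stand-in for the fields of struct virtio_balloon used here. */
struct model_balloon {
	unsigned num_pages;	/* pages held by the balloon, 4 KiB units */
	unsigned num_pfns;	/* entries filled in the current batch */
};

static unsigned model_leak_balloon(struct model_balloon *vb, size_t num)
{
	unsigned num_freed_pages;

	assert(vb != NULL);
	if (num > PFNS_MAX)
		num = PFNS_MAX;	/* assumed: a batch never exceeds the pfn array */

	for (vb->num_pfns = 0; vb->num_pfns < num;
	     vb->num_pfns += PAGES_PER_PAGE) {
		if (vb->num_pages == 0)
			break;	/* empty balloon, stands in for a failed dequeue */
		vb->num_pages -= PAGES_PER_PAGE;
	}

	/* Captured after the loop, as in the patch. */
	num_freed_pages = vb->num_pfns;
	return num_freed_pages;
}
```

With 300 pages in the model balloon, a request for 1000 returns 256, a following request for 100 returns the remaining 44, and any further request returns 0.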
Denis V. Lunev
2014-Oct-15 15:47 UTC
[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM
From: Raushaniya Maksudova <rmaksudova at parallels.com>

Excessive virtio_balloon inflation can cause the OOM killer to be invoked
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless, it
is often the case that these control tools do not have enough time to react
to a fast-changing memory load. As a result, the OS runs out of memory and
invokes the OOM killer. Balancing memory by means of the virtio balloon
should not cause processes to be terminated while there are pages in the
balloon. Currently there is no way for the virtio balloon driver to free
some memory at the last moment before a process gets killed by the OOM
killer.

This does not introduce a security breach, as the balloon itself runs
inside the guest OS and works in cooperation with the host. Thus some
improvements on the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is expected
to be called from the OOM notifier call chain in the out_of_memory()
function. If the virtio balloon can release some memory, the system will
return and retry the allocation that forced the OOM killer to run.

Allocate a virtio feature bit for this: it is not set by default, so the
guest will not deflate the virtio balloon on OOM without explicit
permission from the host.

Signed-off-by: Raushaniya Maksudova <rmaksudova at parallels.com>
Signed-off-by: Denis V. Lunev <den at openvz.org>
CC: Rusty Russell <rusty at rustcorp.com.au>
CC: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/virtio/virtio_balloon.c     | 52 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 66cac10..88d73a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
+#include <linux/oom.h>
 
 /*
  * Balloon device works in 4K page units. So each page is pointed to by
@@ -36,6 +37,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
+#define OOM_VBALLOON_DEFAULT_PAGES 256
+#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
+
+static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
+module_param(oom_pages, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
 struct virtio_balloon
 {
@@ -71,6 +78,9 @@ struct virtio_balloon
 	/* Memory statistics */
 	int need_stats_update;
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+	/* To register callback in oom notifier call chain */
+	struct notifier_block nb;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon *vb)
 			      &actual);
 }
 
+/*
+ * virtballoon_oom_notify - release pages when system is under severe
+ *			    memory pressure (called from out_of_memory())
+ * @self : notifier block struct
+ * @dummy: not used
+ * @parm : returned - number of freed pages
+ *
+ * The balancing of memory by use of the virtio balloon should not cause
+ * the termination of processes while there are pages in the balloon.
+ * If virtio balloon manages to release some memory, it will make the
+ * system return and retry the allocation that forced the OOM killer
+ * to run.
+ */
+static int virtballoon_oom_notify(struct notifier_block *self,
+				  unsigned long dummy, void *parm)
+{
+	struct virtio_balloon *vb;
+	unsigned long *freed;
+	unsigned num_freed_pages;
+
+	vb = container_of(self, struct virtio_balloon, nb);
+	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+		return NOTIFY_OK;
+
+	freed = parm;
+	num_freed_pages = leak_balloon(vb, oom_pages);
+	update_balloon_size(vb);
+	*freed += num_freed_pages;
+
+	return NOTIFY_OK;
+}
+
 static int balloon(void *_vballoon)
 {
 	struct virtio_balloon *vb = _vballoon;
@@ -446,6 +488,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb;
 
+	vb->nb.notifier_call = virtballoon_oom_notify;
+	vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
+	err = register_oom_notifier(&vb->nb);
+	if (err < 0)
+		goto out_oom_notify;
+
 	vb->thread = kthread_run(balloon, vb, "vballoon");
 	if (IS_ERR(vb->thread)) {
 		err = PTR_ERR(vb->thread);
@@ -455,6 +503,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	return 0;
 
 out_del_vqs:
+	unregister_oom_notifier(&vb->nb);
+out_oom_notify:
 	vdev->config->del_vqs(vdev);
 out_free_vb:
 	kfree(vb);
@@ -479,6 +529,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb = vdev->priv;
 
+	unregister_oom_notifier(&vb->nb);
 	kthread_stop(vb->thread);
 	remove_common(vb);
 	kfree(vb);
@@ -516,6 +567,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
 static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
+	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 5e26f61..be40f70 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -31,6 +31,7 @@
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
+#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
-- 
1.9.1
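The effect of the feature bit, and the size of one deflate step, can be sketched in userspace. The model below is illustrative only: model_has_feature(), model_oom_notify() and pages_to_bytes() are hypothetical helpers, and negotiated features are represented as a plain 64-bit mask, which is an assumption about representation rather than the driver's API. The bit number 2 and the 4 KiB unit (VIRTIO_BALLOON_PFN_SHIFT 12) come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define F_DEFLATE_ON_OOM	2u	/* bit number from the patch */
#define BALLOON_PFN_SHIFT	12u	/* 4 KiB balloon pages, as in the header */

/* Hypothetical model: negotiated features as a mask, one bit per feature. */
static int model_has_feature(uint64_t negotiated, unsigned bit)
{
	assert(bit < 64);
	return (int)((negotiated >> bit) & 1u);
}

/*
 * Model of the callback: do nothing unless the host offered the feature,
 * otherwise release up to oom_pages and add the count to *freed.
 */
static unsigned long model_oom_notify(uint64_t negotiated,
				      unsigned long *balloon_pages,
				      unsigned long oom_pages,
				      unsigned long *freed)
{
	unsigned long n;

	if (!model_has_feature(negotiated, F_DEFLATE_ON_OOM))
		return 0;	/* no permission from the host: balloon untouched */

	n = *balloon_pages < oom_pages ? *balloon_pages : oom_pages;
	*balloon_pages -= n;
	*freed += n;
	return n;
}

/* Size in bytes of a number of 4 KiB balloon pages. */
static unsigned long long pages_to_bytes(unsigned long pages)
{
	return (unsigned long long)pages << BALLOON_PFN_SHIFT;
}
```

With the default oom_pages of 256, one callback invocation in this model gives back 256 * 4 KiB = 1 MiB. A mask of 0x3 (MUST_TELL_HOST and STATS_VQ only) leaves the balloon untouched, while 0x7 allows the deflate.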
Rusty Russell
2014-Oct-19 23:31 UTC
[PATCH 2/2] virtio_balloon: free some memory from balloon on OOM
"Denis V. Lunev" <den at openvz.org> writes:
> From: Raushaniya Maksudova <rmaksudova at parallels.com>
>
> Excessive virtio_balloon inflation can cause the OOM killer to be invoked
> when Linux is under severe memory pressure. Various mechanisms are
> responsible for correct virtio_balloon memory management. Nevertheless, it
> is often the case that these control tools do not have enough time to react
> to a fast-changing memory load. As a result, the OS runs out of memory and
> invokes the OOM killer. Balancing memory by means of the virtio balloon
> should not cause processes to be terminated while there are pages in the
> balloon. Currently there is no way for the virtio balloon driver to free
> some memory at the last moment before a process gets killed by the OOM
> killer.
>
> This does not introduce a security breach, as the balloon itself runs
> inside the guest OS and works in cooperation with the host. Thus some
> improvements on the guest side should be considered normal.
>
> To solve the problem, introduce a virtio_balloon callback which is expected
> to be called from the OOM notifier call chain in the out_of_memory()
> function. If the virtio balloon can release some memory, the system will
> return and retry the allocation that forced the OOM killer to run.
>
> Allocate a virtio feature bit for this: it is not set by default, so the
> guest will not deflate the virtio balloon on OOM without explicit
> permission from the host.
>
> Signed-off-by: Raushaniya Maksudova <rmaksudova at parallels.com>
> Signed-off-by: Denis V. Lunev <den at openvz.org>
> CC: Rusty Russell <rusty at rustcorp.com.au>
> CC: Michael S. Tsirkin <mst at redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c     | 52 +++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_balloon.h |  1 +
>  2 files changed, 53 insertions(+)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 66cac10..88d73a0 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -28,6 +28,7 @@
>  #include <linux/slab.h>
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
> +#include <linux/oom.h>
>  
>  /*
>   * Balloon device works in 4K page units. So each page is pointed to by
> @@ -36,6 +37,12 @@
>   */
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> +#define OOM_VBALLOON_DEFAULT_PAGES 256
> +#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> +
> +static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> +module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> +MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
>  struct virtio_balloon
>  {
> @@ -71,6 +78,9 @@ struct virtio_balloon
>  	/* Memory statistics */
>  	int need_stats_update;
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
> +
> +	/* To register callback in oom notifier call chain */
> +	struct notifier_block nb;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon *vb)
>  			      &actual);
>  }
>  
> +/*
> + * virtballoon_oom_notify - release pages when system is under severe
> + *			    memory pressure (called from out_of_memory())
> + * @self : notifier block struct
> + * @dummy: not used
> + * @parm : returned - number of freed pages
> + *
> + * The balancing of memory by use of the virtio balloon should not cause
> + * the termination of processes while there are pages in the balloon.
> + * If virtio balloon manages to release some memory, it will make the
> + * system return and retry the allocation that forced the OOM killer
> + * to run.
> + */
> +static int virtballoon_oom_notify(struct notifier_block *self,
> +				  unsigned long dummy, void *parm)
> +{
> +	struct virtio_balloon *vb;
> +	unsigned long *freed;
> +	unsigned num_freed_pages;
> +
> +	vb = container_of(self, struct virtio_balloon, nb);
> +	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> +		return NOTIFY_OK;
> +
> +	freed = parm;
> +	num_freed_pages = leak_balloon(vb, oom_pages);
> +	update_balloon_size(vb);
> +	*freed += num_freed_pages;
> +
> +	return NOTIFY_OK;
> +}
> +
>  static int balloon(void *_vballoon)
>  {
>  	struct virtio_balloon *vb = _vballoon;

OK, I applied these.

I thought it a bit weird that you're checking the feature in the
callback, rather than only registering the callback if the feature
exists. But it's clear enough.

Thanks,
Rusty.
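The two placements of the feature check discussed in this reply can be compared in a small userspace model. Everything here is hypothetical and for illustration only: struct model_dev and the probe_*/oom_event_* helpers are invented names, and the "alternative" variant is a reading of the remark above, not code that was posted or applied in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define F_DEFLATE_ON_OOM 2u	/* bit number from the patch */

/* Hypothetical device state for the comparison. */
struct model_dev {
	uint64_t features;	/* negotiated feature bits as a mask */
	bool registered;	/* is the OOM notifier registered? */
};

static bool has_deflate_on_oom(const struct model_dev *d)
{
	assert(d != NULL);
	return (d->features >> F_DEFLATE_ON_OOM) & 1u;
}

/* As posted: always register, check the feature on every OOM event. */
static void probe_as_posted(struct model_dev *d)
{
	d->registered = true;
}

static bool oom_event_as_posted(const struct model_dev *d)
{
	return d->registered && has_deflate_on_oom(d);
}

/* Alternative from the reply: register only when the feature exists. */
static void probe_alternative(struct model_dev *d)
{
	d->registered = has_deflate_on_oom(d);
}

static bool oom_event_alternative(const struct model_dev *d)
{
	return d->registered;	/* no per-event check needed */
}
```

Both variants deflate in exactly the same situations in this model. The difference is bookkeeping: the posted form registers and unregisters unconditionally, while the alternative would have to make the unregister path conditional on whether registration happened.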
Michael S. Tsirkin
2014-Oct-20 06:55 UTC
[PATCH v3 0/2] shrink virtio baloon on OOM in guest
On Wed, Oct 15, 2014 at 07:47:42PM +0400, Denis V. Lunev wrote:
> Excessive virtio_balloon inflation can cause the OOM killer to be invoked
> when Linux is under severe memory pressure. Various mechanisms are
> responsible for correct virtio_balloon memory management. Nevertheless, it
> is often the case that these control tools do not have enough time to react
> to a fast-changing memory load. As a result, the OS runs out of memory and
> invokes the OOM killer. Balancing memory by means of the virtio balloon
> should not cause processes to be terminated while there are pages in the
> balloon. Currently there is no way for the virtio balloon driver to free
> memory at the last moment before a process gets killed by the OOM killer.
>
> This does not introduce a security breach, as the balloon itself runs
> inside the guest OS and works in cooperation with the host. Thus some
> improvements on the guest side should be considered normal.
>
> To solve the problem, introduce a virtio_balloon callback which is expected
> to be called from the OOM notifier call chain in the out_of_memory()
> function. If the virtio balloon can release some memory, the system will
> return and retry the allocation that forced the OOM killer to run.
>
> Patch 1 of this series prepares for the virtio_balloon callback: the
> leak_balloon() function now returns the number of freed pages.
> Patch 2 implements the virtio_balloon callback itself.
>
> Changes from v2:
> - added feature bit to control OOM balloon behavior from host
> Changes from v1:
> - minor cosmetic tweaks suggested by rusty@
>
> Signed-off-by: Raushaniya Maksudova <rmaksudova at parallels.com>
> Signed-off-by: Denis V. Lunev <den at openvz.org>
> CC: Rusty Russell <rusty at rustcorp.com.au>
> CC: Michael S. Tsirkin <mst at redhat.com>

With the feature bit, I think it's fine.

Acked-by: Michael S. Tsirkin <mst at redhat.com>