Michael S. Tsirkin
2017-Oct-13 13:23 UTC
[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote:> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > serialize against fill_balloon(). But in fill_balloon(), > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY > is specified, this allocation attempt might indirectly depend on somebody > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it > will cause OOM lockup. Thus, do not wait for vb->balloon_lock mutex if > leak_balloon() is called from out_of_memory(). > > Thread1 Thread2 > fill_balloon() > takes a balloon_lock > balloon_page_enqueue() > alloc_page(GFP_HIGHUSER_MOVABLE) > direct reclaim (__GFP_FS context) takes a fs lock > waits for that fs lock alloc_page(GFP_NOFS) > __alloc_pages_may_oom() > takes the oom_lock > out_of_memory() > blocking_notifier_call_chain() > leak_balloon() > tries to take that balloon_lock and deadlocks > > Signed-off-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>This doesn't deflate on oom if lock is contended, and we acked DEFLATE_ON_OOM so host actually expects us to. The proper fix isn't that hard - just avoid allocations under lock. Patch posted, pls take a look.> --- > drivers/virtio/virtio_balloon.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c > index f0b3a0b..03e6078 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb, > } > } > > -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) > +static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait) > { > unsigned num_freed_pages; > struct page *page; > @@ -202,7 +202,13 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num) > /* We can only do one array worth at a time. */ > num = min(num, ARRAY_SIZE(vb->pfns)); > > - mutex_lock(&vb->balloon_lock); > + if (wait) > + mutex_lock(&vb->balloon_lock); > + else if (!mutex_trylock(&vb->balloon_lock)) { > + pr_info("virtio_balloon: Unable to release %lu pages due to lock contention.\n", > + (unsigned long) min(num, (size_t)vb->num_pages)); > + return 0; > + } > /* We can't release more pages than taken */ > num = min(num, (size_t)vb->num_pages); > for (vb->num_pfns = 0; vb->num_pfns < num; > @@ -367,7 +373,7 @@ static int virtballoon_oom_notify(struct notifier_block *self, > return NOTIFY_OK; > > freed = parm; > - num_freed_pages = leak_balloon(vb, oom_pages); > + num_freed_pages = leak_balloon(vb, oom_pages, false); > update_balloon_size(vb); > *freed += num_freed_pages; > > @@ -395,7 +401,7 @@ static void update_balloon_size_func(struct work_struct *work) > if (diff > 0) > diff -= fill_balloon(vb, diff); > else if (diff < 0) > - diff += leak_balloon(vb, -diff); > + diff += leak_balloon(vb, -diff, true); > update_balloon_size(vb); > > if (diff) > @@ -597,7 +603,7 @@ static void remove_common(struct virtio_balloon *vb) > { > /* There might be pages left in the balloon: free them. */ > while (vb->num_pages) > - leak_balloon(vb, vb->num_pages); > + leak_balloon(vb, vb->num_pages, true); > update_balloon_size(vb); > > /* Now we reset the device so we can clean up the queues. */ > -- > 1.8.3.1
Tetsuo Handa
2017-Oct-13 16:41 UTC
[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
Michael S. Tsirkin wrote:> On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote: > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > serialize against fill_balloon(). But in fill_balloon(), > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY > > is specified, this allocation attempt might indirectly depend on somebody > > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect > > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via > > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via > > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock > > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it > > will cause OOM lockup. Thus, do not wait for vb->balloon_lock mutex if > > leak_balloon() is called from out_of_memory(). > > > > Thread1 Thread2 > > fill_balloon() > > takes a balloon_lock > > balloon_page_enqueue() > > alloc_page(GFP_HIGHUSER_MOVABLE) > > direct reclaim (__GFP_FS context) takes a fs lock > > waits for that fs lock alloc_page(GFP_NOFS) > > __alloc_pages_may_oom() > > takes the oom_lock > > out_of_memory() > > blocking_notifier_call_chain() > > leak_balloon() > > tries to take that balloon_lock and deadlocks > > > > Signed-off-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp> > > This doesn't deflate on oom if lock is contended, and we acked > DEFLATE_ON_OOM so host actually expects us to.I still don't understand what is wrong with not deflating on OOM. According to https://lists.oasis-open.org/archives/virtio-dev/201504/msg00084.html , If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the driver MAY use pages from the balloon when \field{num_pages} is less than or equal to the actual number of pages in the balloon if this is required for system stability (e.g. if memory is required by applications running within the guest). it says "MAY" rather than "MUST". I think it is legal to be a no-op. Maybe I don't understand the difference between "deflate the balloon" and "use pages from the balloon" ? According to https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027807.html , it seems to me that the expected behavior after deflating while inflating was not defined when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was proposed. When the host increased "struct virtio_balloon_config"->num_pages and kicked the guest, the guest's update_balloon_size_func() starts calling fill_balloon() until "struct virtio_balloon"->num_pages reaches "struct virtio_balloon_config"->num_pages, doesn't it? struct virtio_balloon_config { /* Number of pages host wants Guest to give up. */ __u32 num_pages; /* Number of pages we've actually got in balloon. */ __u32 actual; }; If leak_balloon() is called via out_of_memory(), leak_balloon() will decrease "struct virtio_balloon"->num_pages. But, is "struct virtio_balloon_config"->num_pages updated when leak_balloon() is called via out_of_memory() ? If yes, update_balloon_size_func() would stop calling fill_balloon() when leak_balloon() was called via out_of_memory(). If no, update_balloon_size_func() would continue calling fill_balloon() when leak_balloon() was called via out_of_memory() via fill_balloon() via update_balloon_size_func(). That is, when fill_balloon() tries to increase "struct virtio_balloon"->num_pages, leak_balloon() which decreases "struct virtio_balloon"->num_pages is called due to indirect __GFP_DIRECT_RECLAIM dependency via out_of_memory(). As a result, fill_balloon() will continue trying to increase "struct virtio_balloon"->num_pages and leak_balloon() will continue decreasing "struct virtio_balloon"->num_pages when leak_balloon() is called via fill_balloon() via update_balloon_size_func() due to host increased "struct virtio_balloon_config"->num_pages and kicked the guest. We deflate the balloon in order to inflate the balloon. That is OOM lockup, isn't it? How is such situation better than invoking the OOM killer in order to inflate the balloon?> > The proper fix isn't that hard - just avoid allocations under lock. > > Patch posted, pls take a look.Your patch allocates pages in order to inflate the balloon, but your patch will allow leak_balloon() to deflate the balloon. How deflating the balloon (i.e. calling leak_balloon()) makes sense when allocating pages for inflating the balloon (i.e. calling fill_balloon()) ?
Michael S. Tsirkin
2017-Oct-15 00:22 UTC
[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
On Sat, Oct 14, 2017 at 01:41:14AM +0900, Tetsuo Handa wrote:> Michael S. Tsirkin wrote: > > On Tue, Oct 10, 2017 at 07:47:37PM +0900, Tetsuo Handa wrote: > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > > serialize against fill_balloon(). But in fill_balloon(), > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY > > > is specified, this allocation attempt might indirectly depend on somebody > > > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect > > > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via > > > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via > > > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock > > > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it > > > will cause OOM lockup. Thus, do not wait for vb->balloon_lock mutex if > > > leak_balloon() is called from out_of_memory(). > > > > > > Thread1 Thread2 > > > fill_balloon() > > > takes a balloon_lock > > > balloon_page_enqueue() > > > alloc_page(GFP_HIGHUSER_MOVABLE) > > > direct reclaim (__GFP_FS context) takes a fs lock > > > waits for that fs lock alloc_page(GFP_NOFS) > > > __alloc_pages_may_oom() > > > takes the oom_lock > > > out_of_memory() > > > blocking_notifier_call_chain() > > > leak_balloon() > > > tries to take that balloon_lock and deadlocks > > > > > > Signed-off-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp> > > > > This doesn't deflate on oom if lock is contended, and we acked > > DEFLATE_ON_OOM so host actually expects us to. > > I still don't understand what is wrong with not deflating on OOM.Well the point of this flag is that when it's acked, host knows that it's safe to inflate the balloon to a large portion of guest memory and this won't cause an OOM situation. Without this flag, if you inflate after OOM, then it's fine as allocations fail, but if you inflate and then enter OOM, balloon won't give up memory. The flag is unfortunately hard for management tools to use which led to it still not being supported by hosts.> According to https://lists.oasis-open.org/archives/virtio-dev/201504/msg00084.html , > > If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the > driver MAY use pages from the balloon when \field{num_pages} > is less than or equal to the actual number of pages in the > balloon if this is required for system stability > (e.g. if memory is required by applications running within > the guest). > > it says "MAY" rather than "MUST". I think it is legal to be a no-op. > Maybe I don't understand the difference between "deflate the balloon" and > "use pages from the balloon" ?Maybe we should strengthen this to SHOULD.> According to https://lists.linuxfoundation.org/pipermail/virtualization/2014-October/027807.html , > it seems to me that the expected behavior after deflating while inflating > was not defined when VIRTIO_BALLOON_F_DEFLATE_ON_OOM was proposed.I think the assumption is that it fill back up eventually when guest does have some free memory.> When the host increased "struct virtio_balloon_config"->num_pages and > kicked the guest, the guest's update_balloon_size_func() starts calling > fill_balloon() until "struct virtio_balloon"->num_pages reaches > "struct virtio_balloon_config"->num_pages, doesn't it? > > struct virtio_balloon_config { > /* Number of pages host wants Guest to give up. */ > __u32 num_pages; > /* Number of pages we've actually got in balloon. */ > __u32 actual; > }; > > If leak_balloon() is called via out_of_memory(), leak_balloon() > will decrease "struct virtio_balloon"->num_pages. > But, is "struct virtio_balloon_config"->num_pages updated when > leak_balloon() is called via out_of_memory() ? > If yes, update_balloon_size_func() would stop calling fill_balloon() > when leak_balloon() was called via out_of_memory(). > If no, update_balloon_size_func() would continue calling fill_balloon() > when leak_balloon() was called via out_of_memory() via fill_balloon() > via update_balloon_size_func(). That is, when fill_balloon() tries to > increase "struct virtio_balloon"->num_pages, leak_balloon() which > decreases "struct virtio_balloon"->num_pages is called due to indirect > __GFP_DIRECT_RECLAIM dependency via out_of_memory(). > As a result, fill_balloon() will continue trying to increase > "struct virtio_balloon"->num_pages and leak_balloon() will continue > decreasing "struct virtio_balloon"->num_pages when leak_balloon() > is called via fill_balloon() via update_balloon_size_func() due to > host increased "struct virtio_balloon_config"->num_pages and kicked > the guest. We deflate the balloon in order to inflate the balloon. > That is OOM lockup, isn't it? How is such situation better than > invoking the OOM killer in order to inflate the balloon?I don't think it's a lockup see below.> > > > The proper fix isn't that hard - just avoid allocations under lock. > > > > Patch posted, pls take a look. > > Your patch allocates pages in order to inflate the balloon, but > your patch will allow leak_balloon() to deflate the balloon. > How deflating the balloon (i.e. calling leak_balloon()) makes sense > when allocating pages for inflating the balloon (i.e. calling > fill_balloon()) ?The idea is that fill_balloon is allocating memory with __GFP_NORETRY so it will avoid disruptive actions like the OOM killer. Under pressure it will normally fail and retry in half a second or so. Calling leak_balloon in that situation could benefit the system as a whole. I might be misunderstanding the meaning of the relevant GFP flags, pls correct me if I'm wrong. -- MST
Possibly Parallel Threads
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()