Tetsuo Handa
2017-Sep-11 10:27 UTC
mm, virtio: possible OOM lockup at virtballoon_oom_notify()
Hello. I noticed that virtio_balloon is using register_oom_notifier() and leak_balloon() from virtballoon_oom_notify() might depend on __GFP_DIRECT_RECLAIM memory allocation. In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to serialize against fill_balloon(). But in fill_balloon(), alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). And leak_balloon() is called by virtballoon_oom_notify() via blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex at leak_balloon(). Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf() request from leak_balloon() from virtballoon_oom_notify() from blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup because oom_lock mutex is already held before calling out_of_memory(). OOM notifier callback should not (directly or indirectly) depend on __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
Michael S. Tsirkin
2017-Sep-29 04:00 UTC
mm, virtio: possible OOM lockup at virtballoon_oom_notify()
On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:> Hello. > > I noticed that virtio_balloon is using register_oom_notifier() and > leak_balloon() from virtballoon_oom_notify() might depend on > __GFP_DIRECT_RECLAIM memory allocation. > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > serialize against fill_balloon(). But in fill_balloon(), > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). > And leak_balloon() is called by virtballoon_oom_notify() via > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex > at leak_balloon().That would be tricky to fix. I guess we'll need to drop the lock while allocating memory - not an easy fix.> Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf() > request from leak_balloon() from virtballoon_oom_notify() from > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup > because oom_lock mutex is already held before calling out_of_memory().I guess we should just do GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?> > OOM notifier callback should not (directly or indirectly) depend on > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
Tetsuo Handa
2017-Sep-29 04:44 UTC
mm, virtio: possible OOM lockup at virtballoon_oom_notify()
Michael S. Tsirkin wrote:> On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote: > > Hello. > > > > I noticed that virtio_balloon is using register_oom_notifier() and > > leak_balloon() from virtballoon_oom_notify() might depend on > > __GFP_DIRECT_RECLAIM memory allocation. > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > serialize against fill_balloon(). But in fill_balloon(), > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). > > And leak_balloon() is called by virtballoon_oom_notify() via > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex > > at leak_balloon(). > > That would be tricky to fix. I guess we'll need to drop the lock > while allocating memory - not an easy fix. > > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf() > > request from leak_balloon() from virtballoon_oom_notify() from > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup > > because oom_lock mutex is already held before calling out_of_memory(). > > I guess we should just do > > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might significantly increases possibility of memory allocation failure.> > > > > > OOM notifier callback should not (directly or indirectly) depend on > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency? >Another idea would be to use a kernel thread (or workqueue) so that virtballoon_oom_notify() can wait with timeout. We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed) call to a kernel thread (or workqueue) with timeout if MM folks agree.
Michael S. Tsirkin
2017-Oct-02 03:59 UTC
[RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:> Tetsuo Handa wrote: > > Michael S. Tsirkin wrote: > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote: > > > > Hello. > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and > > > > leak_balloon() from virtballoon_oom_notify() might depend on > > > > __GFP_DIRECT_RECLAIM memory allocation. > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > > > serialize against fill_balloon(). But in fill_balloon(), > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). > > > > And leak_balloon() is called by virtballoon_oom_notify() via > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex > > > > at leak_balloon(). > > > > > > That would be tricky to fix. I guess we'll need to drop the lock > > > while allocating memory - not an easy fix. > > > > > > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via > > > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf() > > > > request from leak_balloon() from virtballoon_oom_notify() from > > > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup > > > > because oom_lock mutex is already held before calling out_of_memory(). > > > > > > I guess we should just do > > > > > > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then? > > > > Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for > > __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might > > significantly increases possibility of memory allocation failure. > > > > > > > > > > > > > > > > OOM notifier callback should not (directly or indirectly) depend on > > > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency? > > > > > > > Another idea would be to use a kernel thread (or workqueue) so that > > virtballoon_oom_notify() can wait with timeout. > > > > We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed) > > call to a kernel thread (or workqueue) with timeout if MM folks agree. > > > > Below is a patch which offloads blocking_notifier_call_chain() call. What do you think? > ---------------------------------------- > [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread. > > Since oom_notify_list is traversed via blocking_notifier_call_chain(), > it is legal to sleep inside OOM notifier callback function. > > However, since oom_notify_list is traversed with oom_lock held, > __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt cannot > fail when traversing oom_notify_list entries. Therefore, OOM notifier > callback function should not (directly or indirectly) depend on > __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt. > > Currently there are 5 register_oom_notifier() users in the mainline kernel. > > arch/powerpc/platforms/pseries/cmm.c > arch/s390/mm/cmm.c > drivers/gpu/drm/i915/i915_gem_shrinker.c > drivers/virtio/virtio_balloon.c > kernel/rcu/tree_plugin.h > > Among these users, at least virtio_balloon.c has possibility of OOM lockup > because it is using mutex which can depend on GFP_KERNEL memory allocations. > (Both cmm.c seem to be safe as they use spinlocks. I'm not sure about > tree_plugin.h and i915_gem_shrinker.c . Please check.) > > But converting such allocations to use GFP_NOWAIT is not only prone to > allocation failures under memory pressure but also difficult to audit > whether all locations are converted to use GFP_NOWAIT. > > Therefore, this patch offloads blocking_notifier_call_chain() call to a > dedicated kernel thread and wait for completion with timeout of 5 seconds > so that we can completely forget about possibility of OOM lockup due to > OOM notifier callback function. > > (5 seconds is chosen from my guess that blocking_notifier_call_chain() > should not take long, for we are using mutex_trylock(&oom_lock) at > __alloc_pages_may_oom() based on an assumption that out_of_memory() should > reclaim memory shortly.) > > The kernel thread is created upon first register_oom_notifier() call. > Thus, those environments which do not use register_oom_notifier() will > not waste resource for the dedicated kernel thread. > > Signed-off-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp> > --- > mm/oom_kill.c | 40 ++++++++++++++++++++++++++++++++++++---- > 1 file changed, 36 insertions(+), 4 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index dee0f75..d9744f7 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -981,9 +981,37 @@ static void check_panic_on_oom(struct oom_control *oc, > } > > static BLOCKING_NOTIFIER_HEAD(oom_notify_list); > +static bool oom_notifier_requested; > +static unsigned long oom_notifier_freed; > +static struct task_struct *oom_notifier_th; > +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_request_wait); > +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_response_wait); > + > +static int oom_notifier(void *unused) > +{ > + while (true) { > + wait_event_freezable(oom_notifier_request_wait, > + oom_notifier_requested); > + blocking_notifier_call_chain(&oom_notify_list, 0, > + &oom_notifier_freed); > + oom_notifier_requested = false; > + wake_up(&oom_notifier_response_wait); > + } > + return 0; > +} > > int register_oom_notifier(struct notifier_block *nb) > { > + if (!oom_notifier_th) { > + struct task_struct *th = kthread_run(oom_notifier, NULL, > + "oom_notifier"); > + > + if (IS_ERR(th)) { > + pr_err("Unable to start OOM notifier thread.\n"); > + return (int) PTR_ERR(th); > + } > + oom_notifier_th = th; > + } > return blocking_notifier_chain_register(&oom_notify_list, nb); > } > EXPORT_SYMBOL_GPL(register_oom_notifier); > @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb) > */ > bool out_of_memory(struct oom_control *oc) > { > - unsigned long freed = 0; > enum oom_constraint constraint = CONSTRAINT_NONE; > > if (oom_killer_disabled) > return false; > > - if (!is_memcg_oom(oc)) { > - blocking_notifier_call_chain(&oom_notify_list, 0, &freed); > - if (freed > 0) > + if (!is_memcg_oom(oc) && oom_notifier_th) { > + oom_notifier_requested = true; > + wake_up(&oom_notifier_request_wait); > + wait_event_timeout(oom_notifier_response_wait, > + !oom_notifier_requested, 5 * HZ);I guess this means what was earlier a deadlock will free up after 5 seconds, by a 5 sec downtime is still a lot, isn't it?> + if (oom_notifier_freed) { > + oom_notifier_freed = 0; > /* Got some memory back in the last second. */ > return true; > + } > } > > /* > -- > 1.8.3.1
Michal Hocko
2017-Oct-02 09:06 UTC
[RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
[Hmm, I do not see the original patch which this has been a reply to] On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:> On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote: > > Tetsuo Handa wrote: > > > Michael S. Tsirkin wrote: > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote: > > > > > Hello. > > > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and > > > > > leak_balloon() from virtballoon_oom_notify() might depend on > > > > > __GFP_DIRECT_RECLAIM memory allocation. > > > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > > > > serialize against fill_balloon(). But in fill_balloon(), > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). > > > > > And leak_balloon() is called by virtballoon_oom_notify() via > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex > > > > > at leak_balloon().This is really nasty! And I would argue that this is an abuse of the oom notifier interface from the virtio code. OOM notifiers are an ugly hack on its own but all its users have to be really careful to not depend on any allocation request because that is a straight deadlock situation. I do not think that making oom notifier API more complex is the way to go. Can we simply change the lock to try_lock? If the lock is held we would simply fall back to the normal OOM handling. As a follow up it would be great if virtio could use some other form of aging e.g. shrinker. -- Michal Hocko SUSE Labs
Michal Hocko
2017-Oct-02 11:50 UTC
[RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:> Michal Hocko wrote: > > [Hmm, I do not see the original patch which this has been a reply to] > > urbl.hostedemail.com and b.barracudacentral.org blocked my IP address, > and the rest are "Recipient address rejected: Greylisted" or > "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.", > and after all dropped at the servers. Sad... > > > > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote: > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote: > > > > Tetsuo Handa wrote: > > > > > Michael S. Tsirkin wrote: > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote: > > > > > > > Hello. > > > > > > > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on > > > > > > > __GFP_DIRECT_RECLAIM memory allocation. > > > > > > > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > > > > > > serialize against fill_balloon(). But in fill_balloon(), > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex > > > > > > > at leak_balloon(). > > > > This is really nasty! And I would argue that this is an abuse of the oom > > notifier interface from the virtio code. OOM notifiers are an ugly hack > > on its own but all its users have to be really careful to not depend on > > any allocation request because that is a straight deadlock situation. > > Please describe such warning at > "int register_oom_notifier(struct notifier_block *nb)" definition.Yes, we can and should do that. Although I would prefer to simply document this API as deprecated. Care to send a patch? I am quite busy with other stuff.> > I do not think that making oom notifier API more complex is the way to > > go. Can we simply change the lock to try_lock? > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used > which will fail to make progress due to oom_lock already held. Therefore, > virtballoon_oom_notify() needs to guarantee that all allocation attempts use > GFP_NOWAIT when called from virtballoon_oom_notify().Ohh, I missed your point and thought the dependency is indirect and some other call path is allocating while holding the lock. But you seem to be right and leak_balloon tell_host virtqueue_add_outbuf virtqueue_add can do GFP_KERNEL allocation and this is clearly wrong. Nobody should try to allocate while we are in the OOM path. Michael, is there any way to drop this? -- Michal Hocko SUSE Labs
Michael S. Tsirkin
2017-Oct-02 14:11 UTC
[RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:> On Mon 02-10-17 20:33:52, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > [Hmm, I do not see the original patch which this has been a reply to] > > > > urbl.hostedemail.com and b.barracudacentral.org blocked my IP address, > > and the rest are "Recipient address rejected: Greylisted" or > > "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.", > > and after all dropped at the servers. Sad... > > > > > > > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote: > > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote: > > > > > Tetsuo Handa wrote: > > > > > > Michael S. Tsirkin wrote: > > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote: > > > > > > > > Hello. > > > > > > > > > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and > > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on > > > > > > > > __GFP_DIRECT_RECLAIM memory allocation. > > > > > > > > > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > > > > > > > serialize against fill_balloon(). But in fill_balloon(), > > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies > > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might > > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory > > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach > > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory(). > > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via > > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already > > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified, > > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex > > > > > > > > at leak_balloon(). > > > > > > This is really nasty! And I would argue that this is an abuse of the oom > > > notifier interface from the virtio code. OOM notifiers are an ugly hack > > > on its own but all its users have to be really careful to not depend on > > > any allocation request because that is a straight deadlock situation. > > > > Please describe such warning at > > "int register_oom_notifier(struct notifier_block *nb)" definition. > > Yes, we can and should do that. Although I would prefer to simply > document this API as deprecated. Care to send a patch? I am quite busy > with other stuff. > > > > I do not think that making oom notifier API more complex is the way to > > > go. Can we simply change the lock to try_lock? > > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used > > which will fail to make progress due to oom_lock already held. Therefore, > > virtballoon_oom_notify() needs to guarantee that all allocation attempts use > > GFP_NOWAIT when called from virtballoon_oom_notify(). > > Ohh, I missed your point and thought the dependency is indirectI do think this is the case. See below.> and some > other call path is allocating while holding the lock. But you seem to be > right and > leak_balloon > tell_host > virtqueue_add_outbuf > virtqueue_add > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should > try to allocate while we are in the OOM path. Michael, is there any way > to drop this?Yes - in practice it won't ever allocate - that path is never taken with add_outbuf - it is for add_sgs only. IMHO the issue is balloon inflation which needs to allocate memory. It does it under a mutex, and oom handler tries to take the same mutex.> -- > Michal Hocko > SUSE Labs
Apparently Analagous Threads
- mm, virtio: possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
- [PATCH] virtio_balloon: fix deadlock on OOM
- [PATCH] virtio_balloon: fix deadlock on OOM