Eric Ren
2016-Sep-12 02:04 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
Hi Joseph, On 09/12/2016 09:37 AM, Joseph Qi wrote:> Hi Eric, > > On 2016/9/10 17:55, Eric Ren wrote: >> The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally. >> >> In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it; >> there are 2 process repeatedly performing the following operations >> respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, >> 'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) >> and then ftruncate(fd, CLUSTER_SIZE) again and again. >> >> This is the backtrace when the deadlock happens: >> [<ffffffff817054f0>] __wait_on_bit_lock+0x50/0xa0 >> [<ffffffff81199bd7>] __lock_page+0xb7/0xc0 >> [<ffffffff810c4de0>] ? autoremove_wake_function+0x40/0x40 >> [<ffffffffa0440f4f>] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2] >> [<ffffffffa0462a50>] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2] >> [<ffffffffa0467b47>] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2] >> [<ffffffff811cf286>] do_page_mkwrite+0x66/0xc0 >> [<ffffffff811d3635>] handle_mm_fault+0x685/0x1350 >> [<ffffffff81039dc0>] ? __fpu__restore_sig+0x70/0x530 >> [<ffffffff810694c8>] __do_page_fault+0x1d8/0x4d0 >> [<ffffffff81069827>] trace_do_page_fault+0x37/0xf0 >> [<ffffffff81061e69>] do_async_page_fault+0x19/0x70 >> [<ffffffff8170ac98>] async_page_fault+0x28/0x30 >> >> In ocfs2_write_begin_nolock(), we first grab the pages and then >> allocate disk space for this write; ocfs2_try_to_free_truncate_log() >> will be called if ENOSPC is turned; if we're lucky to get enough clusters, >> which is usually the case, we start over again. But in ocfs2_free_write_ctxt() >> the target page isn't unlocked, so we will deadlock when trying to grab >> the target page again. > IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and > w_target_locked is set to true, and then will be unlocked by > ocfs2_unlock_pages in ocfs2_free_write_ctxt. > So I'm not getting the case "page isn't unlock". Could you please explain > it in more detail?Thanks for review;-) Follow up the calling chain: ocfs2_free_write_ctxt() ->ocfs2_unlock_pages() in ocfs2_unlock_pages (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we can see the code just put_page(target_page), but not unlock it. Yeah, I will think this a bit more like: why not unlock the target_page there? Is there other potential problems if the "ret" is not "-ENOSPC" but other possible error code? Thanks, Eric> > Thanks, > Joseph > >> Fix this issue by unlocking the target page after we fail to allocate >> enough space at the first time. >> >> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause. >> >> Signed-off-by: Eric Ren <zren at suse.com> >> --- >> fs/ocfs2/aops.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >> index 98d3654..78d1d67 100644 >> --- a/fs/ocfs2/aops.c >> +++ b/fs/ocfs2/aops.c >> @@ -1860,6 +1860,13 @@ out: >> */ >> try_free = 0; >> >> + /* >> + * Unlock mmap_page because the page has been locked when we >> + * are here. >> + */ >> + if (mmap_page) >> + unlock_page(mmap_page); >> + >> ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need); >> if (ret1 == 1) >> goto try_again; >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20160912/73d4d6f7/attachment.html
Eric Ren
2016-Sep-12 03:06 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
Hi,>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and >> w_target_locked is set to true, and then will be unlocked by >> ocfs2_unlock_pages in ocfs2_free_write_ctxt. >> So I'm not getting the case "page isn't unlock". Could you please explain >> it in more detail? > Thanks for review;-) Follow up the calling chain: > > ocfs2_free_write_ctxt() > ->ocfs2_unlock_pages() > > in ocfs2_unlock_pages > (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we > can see the code just put_page(target_page), but not unlock it. > > Yeah, I will think this a bit more like: > why not unlock the target_page there? Is there other potential problems if the "ret" is > not "-ENOSPC" but > other possible error code?1. ocfs2_unlock_pages() will be called in ocfs2_write_end_nolock(), in this case, we definitely want to return a locked mmaped page to VM code (do_page_mkwrite) when VM_FAULT_LOCKED is set. 2. But there's indeed a potential existing deadlock situation: ocfs2_grab_pages_for_write() ==> return -ENOMEM and with the mmaped page locked ocfs2_free_write_ctxt() ==> leave the mmapped page locked ocfs2_write_begin_nolock() ==> return -ENOMEM __ocfs2_page_mkwrite() ==> return VM_FAULT_OMM __do_page_mkwrite() ==> deadlock here (https://github.com/torvalds/linux/blob/master/mm/memory.c#L2054) This is another corner case, right? Anyway, I think this patch is good for the -ENOSPC case. And another patch should be proposed for -ENOMEM case? Thanks, Eric> > Thanks, > Eric > >> >> Thanks, >> Joseph >> >>> Fix this issue by unlocking the target page after we fail to allocate >>> enough space at the first time. >>> >>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause. >>> >>> Signed-off-by: Eric Ren <zren at suse.com> >>> --- >>> fs/ocfs2/aops.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>> index 98d3654..78d1d67 100644 >>> --- a/fs/ocfs2/aops.c >>> +++ b/fs/ocfs2/aops.c >>> @@ -1860,6 +1860,13 @@ out: >>> */ >>> try_free = 0; >>> + /* >>> + * Unlock mmap_page because the page has been locked when we >>> + * are here. >>> + */ >>> + if (mmap_page) >>> + unlock_page(mmap_page); >>> + >>> ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need); >>> if (ret1 == 1) >>> goto try_again; >>> >> >> > >
Eric Ren
2016-Sep-14 08:04 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
Hi Joseph,>>> In ocfs2_write_begin_nolock(), we first grab the pages and then >>> allocate disk space for this write; ocfs2_try_to_free_truncate_log() >>> will be called if ENOSPC is turned; if we're lucky to get enough clusters, >>> which is usually the case, we start over again. But in ocfs2_free_write_ctxt() >>> the target page isn't unlocked, so we will deadlock when trying to grab >>> the target page again. >> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and >> w_target_locked is set to true, and then will be unlocked by >> ocfs2_unlock_pages in ocfs2_free_write_ctxt. >> So I'm not getting the case "page isn't unlock". Could you please explain >> it in more detail? > Thanks for review;-) Follow up the calling chain: > > ocfs2_free_write_ctxt() > ->ocfs2_unlock_pages() > > in ocfs2_unlock_pages > (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we > can see the code just put_page(target_page), but not unlock it.Did this answer your question? Thanks, Eric> > Yeah, I will think this a bit more like: > why not unlock the target_page there? Is there other potential problems if the "ret" is > not "-ENOSPC" but > other possible error code? > > Thanks, > Eric > >> >> Thanks, >> Joseph >> >>> Fix this issue by unlocking the target page after we fail to allocate >>> enough space at the first time. >>> >>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause. >>> >>> Signed-off-by: Eric Ren <zren at suse.com> >>> --- >>> fs/ocfs2/aops.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>> index 98d3654..78d1d67 100644 >>> --- a/fs/ocfs2/aops.c >>> +++ b/fs/ocfs2/aops.c >>> @@ -1860,6 +1860,13 @@ out: >>> */ >>> try_free = 0; >>> + /* >>> + * Unlock mmap_page because the page has been locked when we >>> + * are here. >>> + */ >>> + if (mmap_page) >>> + unlock_page(mmap_page); >>> + >>> ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need); >>> if (ret1 == 1) >>> goto try_again; >>> >> >> > > > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20160914/f04177a5/attachment.html