thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin

If this information is useful, please help other people find it:
Share via:

Eric Ren

2016-Sep-12 02:04 UTC

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

Hi Joseph,

On 09/12/2016 09:37 AM, Joseph Qi wrote:> Hi Eric,
>
> On 2016/9/10 17:55, Eric Ren wrote:
>> The testcase "mmaptruncate" of ocfs2-test deadlocked
occasionally.
>>
>> In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
>> there are 2 process repeatedly performing the following operations
>> respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1,
>> 'a', 1), while the another is playing ftruncate(fd,
2*CLUSTER_SIZE)
>> and then ftruncate(fd, CLUSTER_SIZE) again and again.
>>
>> This is the backtrace when the deadlock happens:
>> [<ffffffff817054f0>] __wait_on_bit_lock+0x50/0xa0
>> [<ffffffff81199bd7>] __lock_page+0xb7/0xc0
>> [<ffffffff810c4de0>] ? autoremove_wake_function+0x40/0x40
>> [<ffffffffa0440f4f>] ocfs2_write_begin_nolock+0x163f/0x1790
[ocfs2]
>> [<ffffffffa0462a50>] ? ocfs2_allocate_extend_trans+0x180/0x180
[ocfs2]
>> [<ffffffffa0467b47>] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
>> [<ffffffff811cf286>] do_page_mkwrite+0x66/0xc0
>> [<ffffffff811d3635>] handle_mm_fault+0x685/0x1350
>> [<ffffffff81039dc0>] ? __fpu__restore_sig+0x70/0x530
>> [<ffffffff810694c8>] __do_page_fault+0x1d8/0x4d0
>> [<ffffffff81069827>] trace_do_page_fault+0x37/0xf0
>> [<ffffffff81061e69>] do_async_page_fault+0x19/0x70
>> [<ffffffff8170ac98>] async_page_fault+0x28/0x30
>>
>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>> will be called if ENOSPC is turned; if we're lucky to get enough
clusters,
>> which is usually the case, we start over again. But in
ocfs2_free_write_ctxt()
>> the target page isn't unlocked, so we will deadlock when trying to
grab
>> the target page again.
> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
> w_target_locked is set to true, and then will be unlocked by
> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
> So I'm not getting the case "page isn't unlock". Could
you please explain
> it in more detail?Thanks for review;-) Follow up the calling chain:

ocfs2_free_write_ctxt()
  ->ocfs2_unlock_pages()

in ocfs2_unlock_pages
(https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
can see the code just put_page(target_page), but not unlock it.

Yeah, I will think this a bit more like:
why not unlock the target_page there? Is there other potential problems if the
"ret" is not
"-ENOSPC" but
other possible error code?

Thanks,
Eric
>
> Thanks,
> Joseph
>
>> Fix this issue by unlocking the target page after we fail to allocate
>> enough space at the first time.
>>
>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for
root cause.
>>
>> Signed-off-by: Eric Ren <zren at suse.com>
>> ---
>>   fs/ocfs2/aops.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index 98d3654..78d1d67 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -1860,6 +1860,13 @@ out:
>>   		 */
>>   		try_free = 0;
>>   
>> +		/*
>> +		 * Unlock mmap_page because the page has been locked when we
>> +		 * are here.
>> +		 */
>> +		if (mmap_page)
>> +			unlock_page(mmap_page);
>> +
>>   		ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>   		if (ret1 == 1)
>>   			goto try_again;
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20160912/73d4d6f7/attachment.html

Eric Ren

2016-Sep-12 03:06 UTC

head link

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

Hi,>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>> w_target_locked is set to true, and then will be unlocked by
>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>> So I'm not getting the case "page isn't unlock".
Could you please explain
>> it in more detail?
> Thanks for review;-) Follow up the calling chain:
>
> ocfs2_free_write_ctxt()
>  ->ocfs2_unlock_pages()
>
> in ocfs2_unlock_pages 
> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
> can see the code just put_page(target_page), but not unlock it.
>
> Yeah, I will think this a bit more like:
> why not unlock the target_page there? Is there other potential problems if
the "ret" is
> not "-ENOSPC" but
> other possible error code?1. ocfs2_unlock_pages() will be called in ocfs2_write_end_nolock(), in this
case, we
definitely want to return a locked mmaped page
to VM code (do_page_mkwrite) when VM_FAULT_LOCKED is set.

2. But there's indeed a potential existing deadlock situation:
                                ocfs2_grab_pages_for_write()  ==> return
-ENOMEM and with
the mmaped page locked
                                ocfs2_free_write_ctxt() ==> leave the mmapped
page locked
                          ocfs2_write_begin_nolock() ==> return -ENOMEM
                    __ocfs2_page_mkwrite() ==> return VM_FAULT_OMM
              __do_page_mkwrite() ==> deadlock here 
(https://github.com/torvalds/linux/blob/master/mm/memory.c#L2054)
This is another corner case, right?

Anyway, I think this patch is good for the -ENOSPC case. And another patch
should be
proposed for -ENOMEM case?

Thanks,
Eric
>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>
>>> Fix this issue by unlocking the target page after we fail to
allocate
>>> enough space at the first time.
>>>
>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for
root cause.
>>>
>>> Signed-off-by: Eric Ren <zren at suse.com>
>>> ---
>>>   fs/ocfs2/aops.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 98d3654..78d1d67 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -1860,6 +1860,13 @@ out:
>>>            */
>>>           try_free = 0;
>>>   +        /*
>>> +         * Unlock mmap_page because the page has been locked when
we
>>> +         * are here.
>>> +         */
>>> +        if (mmap_page)
>>> +            unlock_page(mmap_page);
>>> +
>>>           ret1 = ocfs2_try_to_free_truncate_log(osb,
clusters_need);
>>>           if (ret1 == 1)
>>>               goto try_again;
>>>
>>
>>
>
>

Eric Ren

2016-Sep-14 08:04 UTC

head link

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

Hi Joseph,>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>> allocate disk space for this write;
ocfs2_try_to_free_truncate_log()
>>> will be called if ENOSPC is turned; if we're lucky to get
enough clusters,
>>> which is usually the case, we start over again. But in
ocfs2_free_write_ctxt()
>>> the target page isn't unlocked, so we will deadlock when trying
to grab
>>> the target page again.
>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>> w_target_locked is set to true, and then will be unlocked by
>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>> So I'm not getting the case "page isn't unlock".
Could you please explain
>> it in more detail?
> Thanks for review;-) Follow up the calling chain:
>
> ocfs2_free_write_ctxt()
>  ->ocfs2_unlock_pages()
>
> in ocfs2_unlock_pages 
> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
> can see the code just put_page(target_page), but not unlock it.Did this answer your question?

Thanks,
Eric>
> Yeah, I will think this a bit more like:
> why not unlock the target_page there? Is there other potential problems if
the "ret" is
> not "-ENOSPC" but
> other possible error code?
>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>
>>> Fix this issue by unlocking the target page after we fail to
allocate
>>> enough space at the first time.
>>>
>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for
root cause.
>>>
>>> Signed-off-by: Eric Ren <zren at suse.com>
>>> ---
>>>   fs/ocfs2/aops.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 98d3654..78d1d67 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -1860,6 +1860,13 @@ out:
>>>            */
>>>           try_free = 0;
>>>   +        /*
>>> +         * Unlock mmap_page because the page has been locked when
we
>>> +         * are here.
>>> +         */
>>> +        if (mmap_page)
>>> +            unlock_page(mmap_page);
>>> +
>>>           ret1 = ocfs2_try_to_free_truncate_log(osb,
clusters_need);
>>>           if (ret1 == 1)
>>>               goto try_again;
>>>
>>
>>
>
>
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20160914/f04177a5/attachment.html

Ocfs2 devel - Sep 2016 - [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()