thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag

If this information is useful, please help other people find it:
Share via:

Changwei Ge

2018-Aug-30 02:05 UTC

[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent

Hi Larry,

Sorry for this late reply.


On 2018/8/28 12:46, Larry Chen wrote:> Hi Changwei,
>
> I think there might be some logic error.
>
> Imagine that we remove __ocfs2_flush_truncate_log() in
> ocfs2_defrag_extent().
>
> In ocfs2_reserve_clusters might skip freeing truncate log if there is 
> enough clusters in local. Then __ocfs2_move_extent might return error 
> because ocfs2_truncate_log_append might fail.I think changing ocfs2 infrastructure methods like your way is not a 
good idea.
So how about this way?
->lock truncate log inode
->__ocfs2_flush_truncate_log()
->ocfs2_lock_allocators_move_extents()
->unlock truncate log inode

Thanks
Changwei
>
> Thanks,
> Larry
>
>
> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>> Hi Larry,
>>
>> I'd like to propose another way to fix this issue, which might be
>> easier. :-)
>> Actually, ocfs2_reserve_clusters() inside will flush truncate log if
>> there is no enough clusters left.
>> So how about just remove __ocfs2_flush_truncate_log() in
>> ocfs2_defrag_extent()?
>>
>> Thanks,
>> Changwei
>>
>> On 2018/8/27 16:01, Larry Chen wrote:
>>> ocfs2_defrag_extent may fall into dead lock.
>>>
>>> ocfs2_ioctl_move_extents
>>> ??? ocfs2_ioctl_move_extents
>>> ????? ocfs2_move_extents
>>> ??????? ocfs2_defrag_extent
>>> ????????? ocfs2_lock_allocators_move_extents
>>>
>>> ??????????? ocfs2_reserve_clusters
>>> ????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>
>>> ????? __ocfs2_flush_truncate_log
>>> ????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>
>>> As back trace shows above, ocfs2_reserve_clusters will call
inode_lock
>>> against the global bitmap if local allocator has not sufficient 
>>> cluters.
>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster
will
>>> return success with global bitmap locked.
>>>
>>> After ocfs2_reserve_cluster, if truncate log is full,
>>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>>> because it needs to inode_lock global bitmap, which has
>>> already been locked.
>>>
>>> To fix this bug, we could remove from 
>>> ocfs2_lock_allocators_move_extents
>>> the code which intends to lock global allocator, and put the
removed
>>> code after __ocfs2_flush_truncate_log
>>>
>>> The ocfs2_lock_allocators_move_extents has been refered by 2
places,
>>> one
>>> is here, the other does not need the data allocator context, which 
>>> means
>>> this patch does not affect the caller so far.
>>>
>>>
>>>
>>> Signed-off-by: Larry Chen <lchen at suse.com>
>>> ---
>>> ?? fs/ocfs2/move_extents.c | 13 ++++++-------
>>> ?? 1 file changed, 6 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>> index 7eb3b0a6347e..064dedf40d74 100644
>>> --- a/fs/ocfs2/move_extents.c
>>> +++ b/fs/ocfs2/move_extents.c
>>> @@ -192,13 +192,6 @@ static int 
>>> ocfs2_lock_allocators_move_extents(struct inode *inode,
>>> ?????????? goto out;
>>> ?????? }
>>> ?? -??? if (data_ac) {
>>> -??????? ret = ocfs2_reserve_clusters(osb, clusters_to_move,
data_ac);
>>> -??????? if (ret) {
>>> -??????????? mlog_errno(ret);
>>> -??????????? goto out;
>>> -??????? }
>>> -??? }
>>> ?? ?????? *credits += ocfs2_calc_extend_credits(osb->sb, 
>>> et->et_root_el);
>>> ?? @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct 
>>> ocfs2_move_extents_context *context,
>>> ?????????? }
>>> ?????? }
>>> ?? +??? ret = ocfs2_reserve_clusters(osb, *len,
&context->data_ac);
>>> +??? if (ret) {
>>> +??????? mlog_errno(ret);
>>> +??????? goto out_unlock_mutex;
>>> +??? }
>>> +
>>> ?????? handle = ocfs2_start_trans(osb, credits);
>>> ?????? if (IS_ERR(handle)) {
>>> ?????????? ret = PTR_ERR(handle);
>>
>

Larry Chen

2018-Aug-30 06:26 UTC

head link

[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent

Hi Changwei,

Maybe we need more investigation.

The following is your fix:
lock truncate log inode
     __ocfs2_flush_truncate_log()
	ocfs2_lock_allocators_move_extents()
		unlock truncate log inode

The lock action will happen like following:
lock(truncate_inode)
	lock(sub allocat)
	lock(local_alloc) or lock(global_bitmap)
	
I'm not sure if there is another code path that tries to get the same 
locks but in different order, which may causes dead locks.

Indeed ocfs2 involves too many locks, I would like to reduce the 
deadlock risk at max extent.

Another way is to add an new argument for __ocfs2_flush_truncate which 
indicates whether global bitmap is needed to be locked.


Thanks,
Larry





On 08/30/2018 10:05 AM, Changwei Ge wrote:> Hi Larry,
> 
> Sorry for this late reply.
> 
> 
> On 2018/8/28 12:46, Larry Chen wrote:
>> Hi Changwei,
>>
>> I think there might be some logic error.
>>
>> Imagine that we remove __ocfs2_flush_truncate_log() in
>> ocfs2_defrag_extent().
>>
>> In ocfs2_reserve_clusters might skip freeing truncate log if there is
>> enough clusters in local. Then __ocfs2_move_extent might return error
>> because ocfs2_truncate_log_append might fail.
> I think changing ocfs2 infrastructure methods like your way is not a
> good idea.
> So how about this way?
> ->lock truncate log inode
> ->__ocfs2_flush_truncate_log()
> ->ocfs2_lock_allocators_move_extents()
> ->unlock truncate log inode
> 
> Thanks
> Changwei
> 
>>
>> Thanks,
>> Larry
>>
>>
>> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>>> Hi Larry,
>>>
>>> I'd like to propose another way to fix this issue, which might
be
>>> easier. :-)
>>> Actually, ocfs2_reserve_clusters() inside will flush truncate log
if
>>> there is no enough clusters left.
>>> So how about just remove __ocfs2_flush_truncate_log() in
>>> ocfs2_defrag_extent()?
>>>
>>> Thanks,
>>> Changwei
>>>
>>> On 2018/8/27 16:01, Larry Chen wrote:
>>>> ocfs2_defrag_extent may fall into dead lock.
>>>>
>>>> ocfs2_ioctl_move_extents
>>>>  ??? ocfs2_ioctl_move_extents
>>>>  ????? ocfs2_move_extents
>>>>  ??????? ocfs2_defrag_extent
>>>>  ????????? ocfs2_lock_allocators_move_extents
>>>>
>>>>  ??????????? ocfs2_reserve_clusters
>>>>  ????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>
>>>>  ????? __ocfs2_flush_truncate_log
>>>>  ????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>
>>>> As back trace shows above, ocfs2_reserve_clusters will call
inode_lock
>>>> against the global bitmap if local allocator has not sufficient
>>>> cluters.
>>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster
will
>>>> return success with global bitmap locked.
>>>>
>>>> After ocfs2_reserve_cluster, if truncate log is full,
>>>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>>>> because it needs to inode_lock global bitmap, which has
>>>> already been locked.
>>>>
>>>> To fix this bug, we could remove from
>>>> ocfs2_lock_allocators_move_extents
>>>> the code which intends to lock global allocator, and put the
removed
>>>> code after __ocfs2_flush_truncate_log
>>>>
>>>> The ocfs2_lock_allocators_move_extents has been refered by 2
places,
>>>> one
>>>> is here, the other does not need the data allocator context,
which
>>>> means
>>>> this patch does not affect the caller so far.
>>>>
>>>>
>>>>
>>>> Signed-off-by: Larry Chen <lchen at suse.com>
>>>> ---
>>>>  ?? fs/ocfs2/move_extents.c | 13 ++++++-------
>>>>  ?? 1 file changed, 6 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>>> index 7eb3b0a6347e..064dedf40d74 100644
>>>> --- a/fs/ocfs2/move_extents.c
>>>> +++ b/fs/ocfs2/move_extents.c
>>>> @@ -192,13 +192,6 @@ static int
>>>> ocfs2_lock_allocators_move_extents(struct inode *inode,
>>>>  ?????????? goto out;
>>>>  ?????? }
>>>>  ?? -??? if (data_ac) {
>>>> -??????? ret = ocfs2_reserve_clusters(osb, clusters_to_move,
data_ac);
>>>> -??????? if (ret) {
>>>> -??????????? mlog_errno(ret);
>>>> -??????????? goto out;
>>>> -??????? }
>>>> -??? }
>>>>  ?? ?????? *credits += ocfs2_calc_extend_credits(osb->sb,
>>>> et->et_root_el);
>>>>  ?? @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct
>>>> ocfs2_move_extents_context *context,
>>>>  ?????????? }
>>>>  ?????? }
>>>>  ?? +??? ret = ocfs2_reserve_clusters(osb, *len,
&context->data_ac);
>>>> +??? if (ret) {
>>>> +??????? mlog_errno(ret);
>>>> +??????? goto out_unlock_mutex;
>>>> +??? }
>>>> +
>>>>  ?????? handle = ocfs2_start_trans(osb, credits);
>>>>  ?????? if (IS_ERR(handle)) {
>>>>  ?????????? ret = PTR_ERR(handle);
>>>
>>
>

Ocfs2 devel - Aug 2018 - [PATCH] fix dead lock caused by ocfs2_defrag_extent

[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent

[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent