Larry Chen
2018-Aug-30 06:26 UTC
[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent
Hi Changwei,
Maybe we need more investigation.
The following is your fix:
lock truncate log inode
__ocfs2_flush_truncate_log()
ocfs2_lock_allocators_move_extents()
unlock truncate log inode
The lock action will happen like following:
lock(truncate_inode)
lock(sub allocat)
lock(local_alloc) or lock(global_bitmap)
I'm not sure if there is another code path that tries to get the same
locks but in different order, which may causes dead locks.
Indeed ocfs2 involves too many locks, I would like to reduce the
deadlock risk at max extent.
Another way is to add an new argument for __ocfs2_flush_truncate which
indicates whether global bitmap is needed to be locked.
Thanks,
Larry
On 08/30/2018 10:05 AM, Changwei Ge wrote:> Hi Larry,
>
> Sorry for this late reply.
>
>
> On 2018/8/28 12:46, Larry Chen wrote:
>> Hi Changwei,
>>
>> I think there might be some logic error.
>>
>> Imagine that we remove __ocfs2_flush_truncate_log() in
>> ocfs2_defrag_extent().
>>
>> In ocfs2_reserve_clusters might skip freeing truncate log if there is
>> enough clusters in local. Then __ocfs2_move_extent might return error
>> because ocfs2_truncate_log_append might fail.
> I think changing ocfs2 infrastructure methods like your way is not a
> good idea.
> So how about this way?
> ->lock truncate log inode
> ->__ocfs2_flush_truncate_log()
> ->ocfs2_lock_allocators_move_extents()
> ->unlock truncate log inode
>
> Thanks
> Changwei
>
>>
>> Thanks,
>> Larry
>>
>>
>> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>>> Hi Larry,
>>>
>>> I'd like to propose another way to fix this issue, which might
be
>>> easier. :-)
>>> Actually, ocfs2_reserve_clusters() inside will flush truncate log
if
>>> there is no enough clusters left.
>>> So how about just remove __ocfs2_flush_truncate_log() in
>>> ocfs2_defrag_extent()?
>>>
>>> Thanks,
>>> Changwei
>>>
>>> On 2018/8/27 16:01, Larry Chen wrote:
>>>> ocfs2_defrag_extent may fall into dead lock.
>>>>
>>>> ocfs2_ioctl_move_extents
>>>> ??? ocfs2_ioctl_move_extents
>>>> ????? ocfs2_move_extents
>>>> ??????? ocfs2_defrag_extent
>>>> ????????? ocfs2_lock_allocators_move_extents
>>>>
>>>> ??????????? ocfs2_reserve_clusters
>>>> ????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>
>>>> ????? __ocfs2_flush_truncate_log
>>>> ????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>
>>>> As back trace shows above, ocfs2_reserve_clusters will call
inode_lock
>>>> against the global bitmap if local allocator has not sufficient
>>>> cluters.
>>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster
will
>>>> return success with global bitmap locked.
>>>>
>>>> After ocfs2_reserve_cluster, if truncate log is full,
>>>> __ocfs2_flush_truncate_log will definitely fall into dead lock
>>>> because it needs to inode_lock global bitmap, which has
>>>> already been locked.
>>>>
>>>> To fix this bug, we could remove from
>>>> ocfs2_lock_allocators_move_extents
>>>> the code which intends to lock global allocator, and put the
removed
>>>> code after __ocfs2_flush_truncate_log
>>>>
>>>> The ocfs2_lock_allocators_move_extents has been refered by 2
places,
>>>> one
>>>> is here, the other does not need the data allocator context,
which
>>>> means
>>>> this patch does not affect the caller so far.
>>>>
>>>>
>>>>
>>>> Signed-off-by: Larry Chen <lchen at suse.com>
>>>> ---
>>>> ?? fs/ocfs2/move_extents.c | 13 ++++++-------
>>>> ?? 1 file changed, 6 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>>> index 7eb3b0a6347e..064dedf40d74 100644
>>>> --- a/fs/ocfs2/move_extents.c
>>>> +++ b/fs/ocfs2/move_extents.c
>>>> @@ -192,13 +192,6 @@ static int
>>>> ocfs2_lock_allocators_move_extents(struct inode *inode,
>>>> ?????????? goto out;
>>>> ?????? }
>>>> ?? -??? if (data_ac) {
>>>> -??????? ret = ocfs2_reserve_clusters(osb, clusters_to_move,
data_ac);
>>>> -??????? if (ret) {
>>>> -??????????? mlog_errno(ret);
>>>> -??????????? goto out;
>>>> -??????? }
>>>> -??? }
>>>> ?? ?????? *credits += ocfs2_calc_extend_credits(osb->sb,
>>>> et->et_root_el);
>>>> ?? @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct
>>>> ocfs2_move_extents_context *context,
>>>> ?????????? }
>>>> ?????? }
>>>> ?? +??? ret = ocfs2_reserve_clusters(osb, *len,
&context->data_ac);
>>>> +??? if (ret) {
>>>> +??????? mlog_errno(ret);
>>>> +??????? goto out_unlock_mutex;
>>>> +??? }
>>>> +
>>>> ?????? handle = ocfs2_start_trans(osb, credits);
>>>> ?????? if (IS_ERR(handle)) {
>>>> ?????????? ret = PTR_ERR(handle);
>>>
>>
>
Changwei Ge
2018-Aug-30 07:52 UTC
[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent
Hi Larry, On 2018/8/30 14:26, Larry Chen wrote:> Hi Changwei, > > Maybe we need more investigation. > > The following is your fix: > lock truncate log inode > ??? __ocfs2_flush_truncate_log() > ????ocfs2_lock_allocators_move_extents() > ??????? unlock truncate log inode > > The lock action will happen like following: > lock(truncate_inode) > ????lock(sub allocat) > ????lock(local_alloc) or lock(global_bitmap)I don't think we have to worry much about mixed order? of cluster lock and inode mutex since cluster lock granted node will directly succeed instead of waiting for itself.> > I'm not sure if there is another code path that tries to get the same > locks but in different order, which may causes dead locks. > > Indeed ocfs2 involves too many locks, I would like to reduce the > deadlock risk at max extent. > > Another way is to add an new argument for __ocfs2_flush_truncate which > indicates whether global bitmap is needed to be locked.Sounds a feasible way :) Thanks, Changwei> > > Thanks, > Larry > > > > > > On 08/30/2018 10:05 AM, Changwei Ge wrote: >> Hi Larry, >> >> Sorry for this late reply. >> >> >> On 2018/8/28 12:46, Larry Chen wrote: >>> Hi Changwei, >>> >>> I think there might be some logic error. >>> >>> Imagine that we remove __ocfs2_flush_truncate_log() in >>> ocfs2_defrag_extent(). >>> >>> In ocfs2_reserve_clusters might skip freeing truncate log if there is >>> enough clusters in local. Then __ocfs2_move_extent might return error >>> because ocfs2_truncate_log_append might fail. >> I think changing ocfs2 infrastructure methods like your way is not a >> good idea. >> So how about this way? >> ->lock truncate log inode >> ->__ocfs2_flush_truncate_log() >> ->ocfs2_lock_allocators_move_extents() >> ->unlock truncate log inode >> >> Thanks >> Changwei >> >>> >>> Thanks, >>> Larry >>> >>> >>> On 08/28/2018 09:59 AM, Changwei Ge wrote: >>>> Hi Larry, >>>> >>>> I'd like to propose another way to fix this issue, which might be >>>> easier. :-) >>>> Actually, ocfs2_reserve_clusters() inside will flush truncate log if >>>> there is no enough clusters left. >>>> So how about just remove __ocfs2_flush_truncate_log() in >>>> ocfs2_defrag_extent()? >>>> >>>> Thanks, >>>> Changwei >>>> >>>> On 2018/8/27 16:01, Larry Chen wrote: >>>>> ocfs2_defrag_extent may fall into dead lock. >>>>> >>>>> ocfs2_ioctl_move_extents >>>>> ???? ocfs2_ioctl_move_extents >>>>> ?????? ocfs2_move_extents >>>>> ???????? ocfs2_defrag_extent >>>>> ?????????? ocfs2_lock_allocators_move_extents >>>>> >>>>> ???????????? ocfs2_reserve_clusters >>>>> ?????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE >>>>> >>>>> ?????? __ocfs2_flush_truncate_log >>>>> ?????????????? inode_lock GLOBAL_BITMAP_SYSTEM_INODE >>>>> >>>>> As back trace shows above, ocfs2_reserve_clusters will call >>>>> inode_lock >>>>> against the global bitmap if local allocator has not sufficient >>>>> cluters. >>>>> Once global bitmap could meet the demand, ocfs2_reserve_cluster will >>>>> return success with global bitmap locked. >>>>> >>>>> After ocfs2_reserve_cluster, if truncate log is full, >>>>> __ocfs2_flush_truncate_log will definitely fall into dead lock >>>>> because it needs to inode_lock global bitmap, which has >>>>> already been locked. >>>>> >>>>> To fix this bug, we could remove from >>>>> ocfs2_lock_allocators_move_extents >>>>> the code which intends to lock global allocator, and put the removed >>>>> code after __ocfs2_flush_truncate_log >>>>> >>>>> The ocfs2_lock_allocators_move_extents has been refered by 2 places, >>>>> one >>>>> is here, the other does not need the data allocator context, which >>>>> means >>>>> this patch does not affect the caller so far. >>>>> >>>>> >>>>> >>>>> Signed-off-by: Larry Chen <lchen at suse.com> >>>>> --- >>>>> ??? fs/ocfs2/move_extents.c | 13 ++++++------- >>>>> ??? 1 file changed, 6 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c >>>>> index 7eb3b0a6347e..064dedf40d74 100644 >>>>> --- a/fs/ocfs2/move_extents.c >>>>> +++ b/fs/ocfs2/move_extents.c >>>>> @@ -192,13 +192,6 @@ static int >>>>> ocfs2_lock_allocators_move_extents(struct inode *inode, >>>>> ??????????? goto out; >>>>> ??????? } >>>>> ??? -??? if (data_ac) { >>>>> -??????? ret = ocfs2_reserve_clusters(osb, clusters_to_move, >>>>> data_ac); >>>>> -??????? if (ret) { >>>>> -??????????? mlog_errno(ret); >>>>> -??????????? goto out; >>>>> -??????? } >>>>> -??? } >>>>> ??? ?????? *credits += ocfs2_calc_extend_credits(osb->sb, >>>>> et->et_root_el); >>>>> ??? @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct >>>>> ocfs2_move_extents_context *context, >>>>> ??????????? } >>>>> ??????? } >>>>> ??? +??? ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac); >>>>> +??? if (ret) { >>>>> +??????? mlog_errno(ret); >>>>> +??????? goto out_unlock_mutex; >>>>> +??? } >>>>> + >>>>> ??????? handle = ocfs2_start_trans(osb, credits); >>>>> ??????? if (IS_ERR(handle)) { >>>>> ??????????? ret = PTR_ERR(handle); >>>> >>> >> >