Changwei Ge
2018-Aug-30 07:52 UTC
[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent
Hi Larry,

On 2018/8/30 14:26, Larry Chen wrote:
> Hi Changwei,
>
> Maybe we need more investigation.
>
> The following is your fix:
> lock truncate log inode
>     __ocfs2_flush_truncate_log()
>     ocfs2_lock_allocators_move_extents()
>         unlock truncate log inode
>
> The lock action will happen like the following:
> lock(truncate_inode)
>     lock(sub allocator)
>     lock(local_alloc) or lock(global_bitmap)

I don't think we have to worry much about mixing the order of cluster
locks and inode mutexes, since a cluster lock request from the node that
has already been granted the lock succeeds directly instead of waiting
on itself.

> I'm not sure if there is another code path that tries to take the same
> locks but in a different order, which may cause deadlocks.
>
> Indeed ocfs2 involves too many locks; I would like to reduce the
> deadlock risk as much as possible.
>
> Another way is to add a new argument to __ocfs2_flush_truncate_log()
> which indicates whether the global bitmap needs to be locked.

Sounds like a feasible way :)

Thanks,
Changwei

> Thanks,
> Larry
>
> On 08/30/2018 10:05 AM, Changwei Ge wrote:
>> Hi Larry,
>>
>> Sorry for this late reply.
>>
>> On 2018/8/28 12:46, Larry Chen wrote:
>>> Hi Changwei,
>>>
>>> I think there might be some logic error.
>>>
>>> Imagine that we remove __ocfs2_flush_truncate_log() in
>>> ocfs2_defrag_extent().
>>>
>>> ocfs2_reserve_clusters() might skip flushing the truncate log if
>>> there are enough clusters locally. Then __ocfs2_move_extent() might
>>> return an error because ocfs2_truncate_log_append() might fail.
>>
>> I think changing ocfs2 infrastructure methods in your way is not a
>> good idea.
>> So how about this way?
>> -> lock truncate log inode
>> -> __ocfs2_flush_truncate_log()
>> -> ocfs2_lock_allocators_move_extents()
>> -> unlock truncate log inode
>>
>> Thanks,
>> Changwei
>>
>>> Thanks,
>>> Larry
>>>
>>> On 08/28/2018 09:59 AM, Changwei Ge wrote:
>>>> Hi Larry,
>>>>
>>>> I'd like to propose another way to fix this issue, which might be
>>>> easier. :-)
>>>> Actually, ocfs2_reserve_clusters() will flush the truncate log
>>>> internally if there are not enough clusters left.
>>>> So how about just removing __ocfs2_flush_truncate_log() from
>>>> ocfs2_defrag_extent()?
>>>>
>>>> Thanks,
>>>> Changwei
>>>>
>>>> On 2018/8/27 16:01, Larry Chen wrote:
>>>>> ocfs2_defrag_extent may fall into a deadlock:
>>>>>
>>>>> ocfs2_ioctl_move_extents
>>>>>   ocfs2_move_extents
>>>>>     ocfs2_defrag_extent
>>>>>       ocfs2_lock_allocators_move_extents
>>>>>         ocfs2_reserve_clusters
>>>>>           inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>>       __ocfs2_flush_truncate_log
>>>>>           inode_lock GLOBAL_BITMAP_SYSTEM_INODE
>>>>>
>>>>> As the call chain above shows, ocfs2_reserve_clusters will take
>>>>> inode_lock on the global bitmap if the local allocator does not
>>>>> have sufficient clusters.
>>>>> Once the global bitmap can meet the demand, ocfs2_reserve_clusters
>>>>> returns success with the global bitmap still locked.
>>>>>
>>>>> After ocfs2_reserve_clusters, if the truncate log is full,
>>>>> __ocfs2_flush_truncate_log will definitely fall into a deadlock,
>>>>> because it needs to inode_lock the global bitmap, which has
>>>>> already been locked.
>>>>>
>>>>> To fix this bug, we could remove from
>>>>> ocfs2_lock_allocators_move_extents the code which locks the global
>>>>> allocator, and put the removed code after
>>>>> __ocfs2_flush_truncate_log.
>>>>>
>>>>> ocfs2_lock_allocators_move_extents is referred to from two places:
>>>>> one is here; the other does not need the data allocator context,
>>>>> which means this patch does not affect that caller so far.
>>>>>
>>>>> Signed-off-by: Larry Chen <lchen at suse.com>
>>>>> ---
>>>>>  fs/ocfs2/move_extents.c | 13 ++++++-------
>>>>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
>>>>> index 7eb3b0a6347e..064dedf40d74 100644
>>>>> --- a/fs/ocfs2/move_extents.c
>>>>> +++ b/fs/ocfs2/move_extents.c
>>>>> @@ -192,13 +192,6 @@ static int ocfs2_lock_allocators_move_extents(struct inode *inode,
>>>>>  			goto out;
>>>>>  	}
>>>>>  
>>>>> -	if (data_ac) {
>>>>> -		ret = ocfs2_reserve_clusters(osb, clusters_to_move, data_ac);
>>>>> -		if (ret) {
>>>>> -			mlog_errno(ret);
>>>>> -			goto out;
>>>>> -		}
>>>>> -	}
>>>>>  
>>>>>  	*credits += ocfs2_calc_extend_credits(osb->sb, et->et_root_el);
>>>>>  
>>>>> @@ -283,6 +276,12 @@ static int ocfs2_defrag_extent(struct ocfs2_move_extents_context *context,
>>>>>  		}
>>>>>  	}
>>>>>  
>>>>> +	ret = ocfs2_reserve_clusters(osb, *len, &context->data_ac);
>>>>> +	if (ret) {
>>>>> +		mlog_errno(ret);
>>>>> +		goto out_unlock_mutex;
>>>>> +	}
>>>>> +
>>>>>  	handle = ocfs2_start_trans(osb, credits);
>>>>>  	if (IS_ERR(handle)) {
>>>>>  		ret = PTR_ERR(handle);
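To make the failure mode in Larry's patch description concrete, here is a
minimal userspace C model of the self-deadlock, assuming the simplified
behavior described above: reserve_clusters() returns holding the
global-bitmap lock, and flush_truncate_log() then tries to take the same
lock on the same thread. An error-checking pthread mutex stands in for
the rw_semaphore behind inode_lock on the global-bitmap system inode; all
names are illustrative, not the real ocfs2 API.

/* Build: cc -pthread self_deadlock.c */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t global_bitmap_lock;

/* Stands in for ocfs2_reserve_clusters(): when the local allocator is
 * short, fall back to the global bitmap and return *holding* its lock. */
static int reserve_clusters(bool local_alloc_sufficient)
{
	if (!local_alloc_sufficient)
		return pthread_mutex_lock(&global_bitmap_lock);
	return 0;
}

/* Stands in for __ocfs2_flush_truncate_log(): replaying the truncate
 * log frees clusters into the global bitmap, so it needs the same lock. */
static int flush_truncate_log(void)
{
	return pthread_mutex_lock(&global_bitmap_lock);
}

int main(void)
{
	pthread_mutexattr_t attr;

	/* An error-checking mutex turns the hang into a visible EDEADLK. */
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&global_bitmap_lock, &attr);

	/* Pre-patch ordering in ocfs2_defrag_extent(): reserve first,
	 * then flush the (full) truncate log. */
	reserve_clusters(false);        /* global bitmap now locked */
	int err = flush_truncate_log(); /* same thread, same lock   */
	printf("flush_truncate_log: %s\n", strerror(err)); /* EDEADLK */
	return 0;
}

The patch avoids this by reordering, not by making the lock re-entrant:
the flush happens before any reservation, so the thread never holds the
global-bitmap lock when the flush needs it.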
Larry Chen
2018-Aug-30 08:24 UTC
[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent
Hi Changwei,

On 08/30/2018 03:52 PM, Changwei Ge wrote:
> Hi Larry,
>
> On 2018/8/30 14:26, Larry Chen wrote:
>> Hi Changwei,
>>
>> Maybe we need more investigation.
>>
>> The following is your fix:
>> lock truncate log inode
>>     __ocfs2_flush_truncate_log()
>>     ocfs2_lock_allocators_move_extents()
>>         unlock truncate log inode
>>
>> The lock action will happen like the following:
>> lock(truncate_inode)
>>     lock(sub allocator)
>>     lock(local_alloc) or lock(global_bitmap)
>
> I don't think we have to worry much about mixing the order of cluster
> locks and inode mutexes, since a cluster lock request from the node
> that has already been granted the lock succeeds directly instead of
> waiting on itself.
>
>> I'm not sure if there is another code path that tries to take the
>> same locks but in a different order, which may cause deadlocks.

Yeah, I used "lock" to mean both inode_lock and ocfs2_inode_lock.
Since too many kinds of locks and inode locks are involved, I cannot
guarantee that there is no logic error.

>> Indeed ocfs2 involves too many locks; I would like to reduce the
>> deadlock risk as much as possible.
>>
>> Another way is to add a new argument to __ocfs2_flush_truncate_log()
>> which indicates whether the global bitmap needs to be locked.
>
> Sounds like a feasible way :)

Haha, I also prefer this way :)
I'll send another patch and run the test cases in my environment.

Thanks,
Larry
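For illustration, a sketch of what the flag-based approach agreed on here
might look like, again as a userspace model: the caller tells the flush
routine whether it already holds the global-bitmap lock, so the routine
never relocks it on the same thread. The signature and names are
hypothetical, not the actual ocfs2 change, and, as the next message
shows, this alone turns out not to be enough.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t global_bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical flag-taking flush: skip the lock if the caller holds it. */
static void flush_truncate_log(bool caller_holds_bitmap_lock)
{
	if (!caller_holds_bitmap_lock)
		pthread_mutex_lock(&global_bitmap_lock);

	/* ... replay the truncate log, freeing clusters back into the
	 * global bitmap; safe because the lock is held either way ... */

	if (!caller_holds_bitmap_lock)
		pthread_mutex_unlock(&global_bitmap_lock);
}

/* Defrag path after reserving from the global bitmap: no self-relock. */
int main(void)
{
	pthread_mutex_lock(&global_bitmap_lock); /* reserve_clusters() */
	flush_truncate_log(true);                /* lock already held  */
	pthread_mutex_unlock(&global_bitmap_lock);
	return 0;
}

This fixes the single-thread relock, but it says nothing about other
threads that flush the truncate log with their own lock ordering.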
Larry Chen
2018-Sep-02 09:09 UTC
[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent
Hi Changwei,

>>> Another way is to add a new argument to __ocfs2_flush_truncate_log()
>>> which indicates whether the global bitmap needs to be locked.
>>
>> Sounds like a feasible way :)

I tried this way, but it still causes a deadlock. Here are the back
traces.

The root cause is that ocfs2 sometimes relies on a work queue to flush
the truncate log. The worker that processes the queue is
ocfs2_truncate_log_worker, which directly calls
ocfs2_flush_truncate_log, and its lock order is the truncate log first
and then the global bitmap, which conflicts with ocfs2_defrag_extent.

The worker, holding the truncate log and trying the global bitmap:

 #0 [ffffb6c000a03c50] __schedule at ffffffff966c47bf
 #1 [ffffb6c000a03ce0] schedule at ffffffff966c4e08
 #2 [ffffb6c000a03ce8] rwsem_down_write_failed at ffffffff966c82b3
 #3 [ffffb6c000a03d80] schedule at ffffffff966c4e08
 #4 [ffffb6c000a03d88] call_rwsem_down_write_failed at ffffffff963a6e03
 #5 [ffffb6c000a03dc8] down_write at ffffffff966c7410                          <- try global bitmap
 #6 [ffffb6c000a03dd0] __ocfs2_flush_truncate_log at ffffffffc0749145 [ocfs2]  <- got truncate log
 #7 [ffffb6c000a03e68] ocfs2_flush_truncate_log at ffffffffc0749afd [ocfs2]
 #8 [ffffb6c000a03e80] ocfs2_truncate_log_worker at ffffffffc0749b29 [ocfs2]
 #9 [ffffb6c000a03e98] process_one_work at ffffffff9609e64a
#10 [ffffb6c000a03ed8] worker_thread at ffffffff9609e88b
#11 [ffffb6c000a03f10] kthread at ffffffff960a470a
#12 [ffffb6c000a03f50] ret_from_fork at ffffffff96800235

The defrag path, holding the global bitmap and trying the truncate log:

[<ffffffff963a6e03>] call_rwsem_down_write_failed+0x13/0x20   <- try truncate log, got global bitmap
[<ffffffffc07906f3>] ocfs2_defrag_extent+0x1d3/0x680 [ocfs2]
[<ffffffffc07919f9>] ocfs2_move_extents+0x329/0x740 [ocfs2]
[<ffffffffc0791f43>] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
[<ffffffffc0773a83>] ocfs2_ioctl+0x253/0x640 [ocfs2]
[<ffffffff96256480>] do_vfs_ioctl+0x90/0x5f0
[<ffffffff96256a54>] SyS_ioctl+0x74/0x80
[<ffffffff96003ae4>] do_syscall_64+0x74/0x140
[<ffffffff9680009a>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<ffffffffffffffff>] 0xffffffffffffffff
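The two traces describe a classic ABBA lock-order inversion between two
threads. A self-contained userspace model, with pthread mutexes in place
of the kernel rw_semaphores and illustrative names rather than the real
ocfs2 API, reproduces the hang:

/* Build: cc -pthread abba.c  (the program hangs by design) */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t truncate_log  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t global_bitmap = PTHREAD_MUTEX_INITIALIZER;

/* Models ocfs2_truncate_log_worker -> ocfs2_flush_truncate_log:
 * lock order is truncate_log, then global_bitmap. */
static void *truncate_log_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&truncate_log);
	sleep(1);                           /* widen the race window   */
	pthread_mutex_lock(&global_bitmap); /* blocks: defrag holds it */
	pthread_mutex_unlock(&global_bitmap);
	pthread_mutex_unlock(&truncate_log);
	return NULL;
}

/* Models ocfs2_defrag_extent after reserving from the global bitmap:
 * lock order is global_bitmap, then truncate_log. */
static void *defrag_extent(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&global_bitmap);
	sleep(1);
	pthread_mutex_lock(&truncate_log);  /* blocks: worker holds it */
	pthread_mutex_unlock(&truncate_log);
	pthread_mutex_unlock(&global_bitmap);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, truncate_log_worker, NULL);
	pthread_create(&b, NULL, defrag_extent, NULL);
	pthread_join(a, NULL);              /* never returns: ABBA hang */
	pthread_join(b, NULL);
	printf("unreachable\n");
	return 0;
}

This is why a flag on __ocfs2_flush_truncate_log cannot help by itself:
the flag only suppresses a relock within one thread, while the worker is
a separate thread that still takes the two locks in the opposite order.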