Changwei Ge
2018-Jan-19 01:48 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi Jan,

On 2018/1/18 0:03, Jan Kara wrote:
> On Wed 17-01-18 16:21:35, Jan Kara wrote:
>> Hello,
>>
>> On Fri 12-01-18 16:25:56, Eric Ren wrote:
>>> On 01/12/2018 11:43 AM, Shichangkuo wrote:
>>>> Hi all,
>>>> We are testing ocfs2 with the 4.14 kernel, and we found a deadlock
>>>> between umount and the ocfs2 workqueue, triggered by the ocfs2rec
>>>> thread. The stacks are as follows:
>>>>
>>>> journal recovery work:
>>>> [<ffffffff8a8c0694>] call_rwsem_down_read_failed+0x14/0x30
>>>> [<ffffffffc0d5d652>] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
>>>> [<ffffffffc0d21221>] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
>>>> [<ffffffff8a09a1f0>] process_one_work+0x130/0x350
>>>> [<ffffffff8a09a946>] worker_thread+0x46/0x3b0
>>>> [<ffffffff8a0a0e51>] kthread+0x101/0x140
>>>> [<ffffffff8aa002ff>] ret_from_fork+0x1f/0x30
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> /bin/umount:
>>>> [<ffffffff8a099b24>] flush_workqueue+0x104/0x3e0
>>>> [<ffffffffc0cf18db>] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
>>>> [<ffffffffc0d4fd6c>] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
>>>> [<ffffffffc0d500e1>] ocfs2_put_super+0x31/0xa0 [ocfs2]
>>>> [<ffffffff8a2445bd>] generic_shutdown_super+0x6d/0x120
>>>> [<ffffffff8a24469d>] kill_block_super+0x2d/0x60
>>>> [<ffffffff8a244e71>] deactivate_locked_super+0x51/0x90
>>>> [<ffffffff8a263a1b>] cleanup_mnt+0x3b/0x70
>>>> [<ffffffff8a09e9c6>] task_work_run+0x86/0xa0
>>>> [<ffffffff8a003d70>] exit_to_usermode_loop+0x6d/0xa9
>>>> [<ffffffff8a003a2d>] do_syscall_64+0x11d/0x130
>>>> [<ffffffff8aa00113>] entry_SYSCALL64_slow_path+0x25/0x25
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> ocfs2_finish_quota_recovery() tries to take sb->s_umount, which is
>>>> already held by the umount thread, so we get a deadlock.
>>>
>>> Good catch, thanks for reporting. Is it reproducible? Can you please
>>> share the steps for reproducing this issue?
>>>
>>>> This issue was introduced by c3b004460d77bf3f980d877be539016f2df4df12
>>>> and 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
>>>> I think we cannot use ::s_umount, but the mutex ::dqonoff_mutex was
>>>> already removed.
>>>> Shall we add a new mutex?
>>>
>>> @Jan, I haven't looked into the code yet; could you help me understand
>>> why we need to take sb->s_umount in ocfs2_finish_quota_recovery?
>>> Is it because the quota recovery process can start during umount, or
>>> somewhere else?
>>
>> I was refreshing my memory wrt how ocfs2 quota recovery works. The
>> problem is the following: we load information about all the quota
>> information that needs recovering (possibly for other nodes) in
>> ocfs2_begin_quota_recovery(), which gets called during mount. The real
>> quota recovery happens from the recovery thread in
>> ocfs2_finish_quota_recovery(). We need to protect the code running
>> there from dquot_disable() calls, as those will free structures we use
>> for updating quota information etc. Currently we use sb->s_umount for
>> that protection.
>>
>> The problem above apparently happens when someone calls umount before
>> the recovery thread can finish quota recovery. I will think more about
>> how to fix the locking so that this lock inversion does not happen...
>
> So could we move the ocfs2_recovery_exit() call in
> ocfs2_dismount_volume() up before ocfs2_disable_quotas()? It seems
> possible to me, I'm just not sure if there are not some hidden
> dependencies on recovery being shut down only after truncate log /
> local alloc. If we can do that, we could remove the s_umount protection
> from ocfs2_finish_quota_recovery() and thus resolve the race.
>
> 								Honza

Thanks for looking into this. I am not quite familiar with the quota
part. :)

Or can we move ocfs2_disable_quotas() in ocfs2_dismount_volume() down
after ocfs2_recovery_exit(), with the down_read(&sb->s_umount) in
ocfs2_finish_quota_recovery() eliminated?

Another way I can figure out is:
I think we might get inspired by qsync_work_fn(). In that function, if
the current work runs in the context of an umount holding ::s_umount,
it just delays the work until next time.

So can we also _try lock_ in ocfs2_finish_quota_recovery() and have the
quota recovered by another ocfs2 cluster member node, or by the local
node's next mount?

Thanks,
Changwei
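For what it's worth, a rough sketch of the _try lock_ variant (untested,
against 4.14-era code; the error code and the cleanup of 'rec' on the
bail-out path are placeholders, not a worked-out design):

/*
 * Sketch only: bail out instead of blocking on ->s_umount when an
 * umount is already in progress.
 */
int ocfs2_finish_quota_recovery(struct ocfs2_super *osb,
				struct ocfs2_quota_recovery *rec,
				int slot_num)
{
	struct super_block *sb = osb->sb;
	int status = 0;

	if (!down_read_trylock(&sb->s_umount)) {
		/*
		 * Umount holds ->s_umount and is about to tear down the
		 * quota structures; give up here and leave the recovery
		 * to another node, or to this node's next mount.
		 */
		return -EAGAIN;		/* placeholder error code */
	}

	/* ... existing per-quota-type recovery loop, unchanged ... */

	up_read(&sb->s_umount);
	return status;
}

Whether it is acceptable to simply skip the recovery here is of course
the single-node question raised in the follow-up below.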
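And for reference, Jan's proposed reordering would amount to something
like the following in ocfs2_dismount_volume() (fs/ocfs2/super.c). This is
only an illustration of the intended call order, with unrelated calls
elided; whether recovery may safely be shut down this early is exactly
the open question quoted above:

static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
{
	struct ocfs2_super *osb = OCFS2_SB(sb);

	/* ... */

	/*
	 * Moved up: stop the recovery thread and flush its completion
	 * work before the quota structures go away, so that
	 * ocfs2_finish_quota_recovery() would no longer need ->s_umount.
	 */
	ocfs2_recovery_exit(osb);

	ocfs2_disable_quotas(osb);

	ocfs2_shutdown_local_alloc(osb);

	ocfs2_truncate_log_shutdown(osb);

	/* ... journal shutdown etc. ... */
}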
piaojun
2018-Jan-19 03:16 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi Jan, Eric and Changwei,

Could we use another mutex lock to protect quota recovery? Sharing the
lock with the VFS layer seems a little weird.

On 2018/1/19 9:48, Changwei Ge wrote:
> [... earlier quoted messages and stack traces snipped; see the thread
> above ...]
>
> Another way I can figure out is:
> I think we might get inspired by qsync_work_fn(). In that function, if
> the current work runs in the context of an umount holding ::s_umount,
> it just delays the work until next time.
>
> So can we also _try lock_ in ocfs2_finish_quota_recovery() and have the
> quota recovered by another ocfs2 cluster member node, or by the local
> node's next mount?

I guess we need to analyse the impact of the _try lock_ approach; for
example, no other node will help recover quota when I'm the only node
in the cluster.

thanks,
Jun
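To make the "separate lock" direction at the top of this mail a bit more
concrete, fragments of what it might look like. The field name
quota_rec_lock is made up; an rw_semaphore is used instead of a plain
mutex so recovery can stay a reader, and the lock would also have to be
initialised during mount (init_rwsem()) and taken around every path that
can end up in dquot_disable():

/* in struct ocfs2_super (fs/ocfs2/ocfs2.h): */
	struct rw_semaphore	quota_rec_lock;	/* hypothetical field */

/* dismount side: exclude quota recovery while quotas are torn down */
	down_write(&osb->quota_rec_lock);
	ocfs2_disable_quotas(osb);
	up_write(&osb->quota_rec_lock);

/* ocfs2_finish_quota_recovery(): replaces down_read(&sb->s_umount) */
	down_read(&osb->quota_rec_lock);
	/* ... recover the local quota files for the given slot ... */
	up_read(&osb->quota_rec_lock);

Recovery would still have to check whether quotas are still enabled once
it gets the lock, since the dismount side may win the race and disable
them first.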