Shichangkuo
2018-Jan-12 03:43 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi all,

We are now testing ocfs2 with the 4.14 kernel, and we have found a deadlock between umount and the ocfs2 workqueue, triggered by the ocfs2rec thread. The stacks are as follows:

journal recovery work:
[<ffffffff8a8c0694>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffc0d5d652>] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
[<ffffffffc0d21221>] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
[<ffffffff8a09a1f0>] process_one_work+0x130/0x350
[<ffffffff8a09a946>] worker_thread+0x46/0x3b0
[<ffffffff8a0a0e51>] kthread+0x101/0x140
[<ffffffff8aa002ff>] ret_from_fork+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

/bin/umount:
[<ffffffff8a099b24>] flush_workqueue+0x104/0x3e0
[<ffffffffc0cf18db>] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
[<ffffffffc0d4fd6c>] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
[<ffffffffc0d500e1>] ocfs2_put_super+0x31/0xa0 [ocfs2]
[<ffffffff8a2445bd>] generic_shutdown_super+0x6d/0x120
[<ffffffff8a24469d>] kill_block_super+0x2d/0x60
[<ffffffff8a244e71>] deactivate_locked_super+0x51/0x90
[<ffffffff8a263a1b>] cleanup_mnt+0x3b/0x70
[<ffffffff8a09e9c6>] task_work_run+0x86/0xa0
[<ffffffff8a003d70>] exit_to_usermode_loop+0x6d/0xa9
[<ffffffff8a003a2d>] do_syscall_64+0x11d/0x130
[<ffffffff8aa00113>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

The function ocfs2_finish_quota_recovery tries to take sb->s_umount, which is already held by the umount thread, so the two end up deadlocked. A rough sketch of the cycle is appended at the end of this mail.
This issue was introduced by commits c3b004460d77bf3f980d877be539016f2df4df12 and 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
I think we cannot use ::s_umount here, but the mutex ::dqonoff_mutex has already been removed.
Shall we add a new mutex?

Thanks
Changkuo
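Sketch of the cycle mentioned above (illustrative kernel-style code with ad hoc names such as quota_recovery_sketch, dismount_sketch and osb_wq; it only mirrors the shape of the traces, it is not the actual ocfs2 source):

#include <linux/fs.h>
#include <linux/rwsem.h>
#include <linux/workqueue.h>

/*
 * Thread A: the journal/quota recovery work item running on the
 * ocfs2 workqueue.
 */
static void quota_recovery_sketch(struct super_block *sb)
{
	/* Sleeps here: umount already holds s_umount for write. */
	down_read(&sb->s_umount);
	/* ... replay local quota files ... */
	up_read(&sb->s_umount);
}

/*
 * Thread B: the umount path; generic_shutdown_super() is reached
 * with sb->s_umount already held for write.
 */
static void dismount_sketch(struct workqueue_struct *osb_wq)
{
	/*
	 * Waits for every queued work item to finish, including the
	 * recovery work above, which in turn is waiting for umount to
	 * drop s_umount -- so neither side can make progress.
	 */
	flush_workqueue(osb_wq);
}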
Joseph Qi
2018-Jan-12 05:50 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi Changkuo,

You said s_umount was acquired by umount and that ocfs2rec was blocked when acquiring it, but you didn't describe why umount itself was blocked.

Thanks,
Joseph

On 18/1/12 11:43, Shichangkuo wrote:
> Hi all,
> We are now testing ocfs2 with the 4.14 kernel, and we have found a deadlock between umount and the ocfs2 workqueue, triggered by the ocfs2rec thread. The stacks are as follows:
> [...]
> The function ocfs2_finish_quota_recovery tries to take sb->s_umount, which is already held by the umount thread, so the two end up deadlocked.
> This issue was introduced by commits c3b004460d77bf3f980d877be539016f2df4df12 and 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot use ::s_umount here, but the mutex ::dqonoff_mutex has already been removed.
> Shall we add a new mutex?
>
> Thanks
> Changkuo
Eric Ren
2018-Jan-12 08:25 UTC
[Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread
Hi,

On 01/12/2018 11:43 AM, Shichangkuo wrote:
> Hi all,
> We are now testing ocfs2 with the 4.14 kernel, and we have found a deadlock between umount and the ocfs2 workqueue, triggered by the ocfs2rec thread. The stacks are as follows:
> [...]
> The function ocfs2_finish_quota_recovery tries to take sb->s_umount, which is already held by the umount thread, so the two end up deadlocked.

Good catch, thanks for reporting. Is it reproducible? Can you please share the steps for reproducing this issue?

> This issue was introduced by commits c3b004460d77bf3f980d877be539016f2df4df12 and 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot use ::s_umount here, but the mutex ::dqonoff_mutex has already been removed.
> Shall we add a new mutex?

@Jan, I haven't looked into the code yet; could you help me understand why we need to take sb->s_umount in ocfs2_finish_quota_recovery? Is it because the quota recovery process is started at umount time, or somewhere else?

Thanks,
Eric