Hi,

In an OCFS2 cluster of XenServer 7.1.1 hosts, we have hit umount hanging on two different hosts. The kernel is based on Linux 4.4.27. The cluster has 9 hosts and 8 OCFS2 filesystems. Although umount is hanging, the mountpoint entry has already disappeared from /proc/mounts. Apart from this issue, the OCFS2 filesystems are working well.

The first umount stack (from cat /proc/PID/stack):

[<ffffffff810d05cd>] msleep+0x2d/0x40
[<ffffffffa0620409>] dlmunlock+0x2c9/0x490 [ocfs2_dlm]
[<ffffffffa04461a5>] o2cb_dlm_unlock+0x35/0x50 [ocfs2_stack_o2cb]
[<ffffffffa0574120>] ocfs2_dlm_unlock+0x20/0x30 [ocfs2_stackglue]
[<ffffffffa06531e0>] ocfs2_drop_lock.isra.20+0x250/0x370 [ocfs2]
[<ffffffffa0654a36>] ocfs2_drop_inode_locks+0xa6/0x180 [ocfs2]
[<ffffffffa0661d13>] ocfs2_clear_inode+0x343/0x6d0 [ocfs2]
[<ffffffffa0663616>] ocfs2_evict_inode+0x526/0x5d0 [ocfs2]
[<ffffffff811cd816>] evict+0xb6/0x170
[<ffffffff811ce495>] iput+0x1c5/0x1f0
[<ffffffffa068cbd0>] ocfs2_release_system_inodes+0x90/0xd0 [ocfs2]
[<ffffffffa068dfad>] ocfs2_dismount_volume+0x17d/0x390 [ocfs2]
[<ffffffffa068e210>] ocfs2_put_super+0x50/0x80 [ocfs2]
[<ffffffff811b6e6f>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b6f87>] kill_block_super+0x27/0x70
[<ffffffff811b68bb>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b6949>] deactivate_super+0x59/0x60
[<ffffffff811d18a8>] cleanup_mnt+0x58/0x80
[<ffffffff811d1922>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c2ad>] task_work_run+0x7d/0xa0
[<ffffffff8106d2b9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a0acc>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

The second umount stack:

[<ffffffff81087398>] flush_workqueue+0x1c8/0x520
[<ffffffffa06700c9>] ocfs2_shutdown_local_alloc+0x39/0x410 [ocfs2]
[<ffffffffa0692edd>] ocfs2_dismount_volume+0xad/0x390 [ocfs2]
[<ffffffffa0693210>] ocfs2_put_super+0x50/0x80 [ocfs2]
[<ffffffff811b6e6f>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b6f87>] kill_block_super+0x27/0x70
[<ffffffff811b68bb>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b6949>] deactivate_super+0x59/0x60
[<ffffffff811d18a8>] cleanup_mnt+0x58/0x80
[<ffffffff811d1922>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c2ad>] task_work_run+0x7d/0xa0
[<ffffffff8106d2b9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a0acc>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

-robin
Hello Robin,

Since OCFS2 in the SUSE HA extension uses the pcmk stack rather than the o2cb stack, I cannot give you more detailed comments. But from the back-trace of the first umount process, it looks like there is an msleep loop in the dlmunlock function that keeps retrying until a certain condition is met (a rough sketch of what I mean is below the quoted mail).

Hello Alex, could your team help to look at this case? I feel the first hung process suggests the o2cb-based DLM has gotten into an exceptional state.

Thanks a lot.
Gang

>>> On 6/4/2019 at 5:46 pm, in message <CAG8B0Ozk=J4WZ_L9dKry9AMp3JahQ002gNE0UsWqcsc_RB6Hdw at mail.gmail.com>, Robin Lee <robinlee.sysu at gmail.com> wrote:
> In an OCFS2 cluster of XenServer 7.1.1 hosts, we have hit umount hanging on
> two different hosts. The kernel is based on Linux 4.4.27. [...]
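To illustrate the back-off-and-retry pattern I mean, here is a small user-space model. Every name in it (try_unlock, busy_rounds, the status values) is invented for illustration; it is not the kernel source, only the shape of the loop as I read the back-trace:

/*
 * Hypothetical user-space model of the suspected retry pattern.
 * All names here are invented for illustration; this is not the
 * kernel source.
 */
#include <stdio.h>
#include <unistd.h>

enum unlock_status { UNLOCK_OK, UNLOCK_BUSY };

/*
 * Stand-in for asking the lock resource owner to drop the lock.
 * 'busy_rounds' models how long the resource stays in a
 * recovering/migrating state on the owner.
 */
static enum unlock_status try_unlock(int *busy_rounds)
{
	if (*busy_rounds > 0) {
		(*busy_rounds)--;
		return UNLOCK_BUSY;
	}
	return UNLOCK_OK;
}

int main(void)
{
	int busy_rounds = 10;	/* a value that never drops to 0 would model the hang */
	int retries = 0;

	/*
	 * Retry the unlock with a 50ms sleep while the owner reports the
	 * resource as busy (being recovered or migrated).  If that state
	 * is never cleared, this loop never exits.
	 */
	while (try_unlock(&busy_rounds) == UNLOCK_BUSY) {
		retries++;
		usleep(50 * 1000);	/* analogous to a 50ms msleep in the kernel */
	}

	printf("unlock completed after %d retries\n", retries);
	return 0;
}

If the busy condition on the owner side never clears, the loop never exits, which would look exactly like a umount stuck in msleep under dlmunlock.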
I am digging deeper into the current situation. I am still looking for a way to get the umount done without rebooting any hosts.

I found that the hosts with the hung 'umount' kept sending DLM_MASTER_REQUEST_MSG to the same other host (named NODE-10). NODE-10 kept sending back DLM_MASTER_RESP_ERROR and kept logging 'returning DLM_MASTER_RESP_ERROR since res is being recovered/migrated'. The other hosts then slept for 50ms, resent the message, and kept logging 'node %u hit an error, resending'.

I used 'debugfs.ocfs2 -R dlm_locks /dev/...' to find the bad lockres. There is a single lockres that is marked MIGRATING on NODE-10 and IN_PROGRESS on the other hosts.

So I am thinking that if the MIGRATING flag were cleared on NODE-10, the other hosts could get out of the loop and finish the 'umount' (a toy model of this loop is below the quoted mail). The question is whether there is a way to clear the MIGRATING flag of a lockres. Is it safe to directly reset the flag with SystemTap? Or is there any existing tool to do that?

On Tue, Jun 11, 2019 at 3:51 PM Gang He <ghe at suse.com> wrote:
> But from the back-trace of the first umount process, it looks like there is
> an msleep loop in the dlmunlock function that keeps retrying until a certain
> condition is met. [...]
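To spell out the loop as I understand it, here is a toy user-space model. All names in it (lockres_model, handle_master_request, request_mastery) are made up; it is not the kernel code, only my reading of the message exchange. In the model, clearing the migrating state on the owner is what finally lets the requester make progress:

/*
 * Toy user-space model of the master-request loop described above.
 * All names are invented; this is not the kernel code.
 */
#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>

enum master_resp { MASTER_RESP_YES, MASTER_RESP_ERROR };

struct lockres_model {
	bool migrating;		/* models the MIGRATING state on NODE-10 */
};

/*
 * What the current owner does with each incoming master request:
 * refuse it while the resource is still being recovered or migrated.
 */
static enum master_resp handle_master_request(const struct lockres_model *res)
{
	return res->migrating ? MASTER_RESP_ERROR : MASTER_RESP_YES;
}

/* What a requesting node does: sleep 50ms and resend on each error. */
static int request_mastery(struct lockres_model *res)
{
	int attempts = 0;

	while (handle_master_request(res) == MASTER_RESP_ERROR) {
		attempts++;
		usleep(50 * 1000);	/* analogous to the 50ms resend delay */

		/*
		 * Model the intervention being asked about: once the
		 * migrating state is cleared on the owner, the requester
		 * gets a usable answer and can make progress.
		 */
		if (attempts == 5)
			res->migrating = false;
	}
	return attempts;
}

int main(void)
{
	struct lockres_model res = { .migrating = true };

	printf("got a master response after %d resends\n", request_mastery(&res));
	return 0;
}

Of course this only models the message flow; whether it is actually safe to clear that flag on a live lockres is exactly what I am asking.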