Junxiao Bi
2016-Oct-12 06:47 UTC
[Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing
On 10/12/2016 10:36 AM, Eric Ren wrote:
> Hi,
>
> When backporting those patches, I find that they are already in our
> product kernel, maybe via "stable kernel" policy, although our product
> kernel is 4.4 while the patches were merged into 4.6.
>
> Seems it's another deadlock that happens when doing `chmod -R 777
> /mnt/ocfs2` among multiple nodes at the same time.

Yes, but I just finished running the ocfs2 full test on linux next-20161006 and didn't find any issue.

Thanks,
Junxiao.

>
> Thanks,
> Eric
> On 10/12/2016 09:23 AM, Eric Ren wrote:
>> Hi Junxiao,
>>
>>> Hi Eric,
>>>
>>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>>> Hi Junxiao,
>>>>
>>>> As in the subject, the test hung on a kernel without your
>>>> patches:
>>>>
>>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>>> and
>>>> "ocfs2: fix posix_acl_create deadlock"
>>>>
>>>> The stack trace is:
>>>> ```
>>>> ocfs2cts1:~ # pstree -pl 24133
>>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)───fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>>
>>>>
>>>> ocfs2cts1:~ # pgrep -a chmod
>>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>>
>>>> ocfs2cts1:~ # cat /proc/15232/stack
>>>> [<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>> [<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>> [<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>>> [<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>>> [<ffffffff8120d03c>] iterate_dir+0x9c/0x110
>>>> [<ffffffff8120d453>] SyS_getdents+0x83/0xf0
>>>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>> ```
>>>>
>>>> Do you think this issue can be fixed by your patches?
>>> Looks not. Those two patches are to fix recursive locking deadlock. But
>>> from above call trace, there is no recursive lock.
>> Sorry, the call trace on another node was missing. Here it is:
>>
>> ocfs2cts2:~ # pstree -lp
>> sshd(4292)───sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>>
>>
>> ocfs2cts2:~ # cat /proc/4865/stack
>> [<ffffffffa053e7ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>> [<ffffffffa053f56d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>> [<ffffffffa059c860>] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
>> [<ffffffff812044e6>] generic_permission+0x166/0x1c0
>> [<ffffffffa0542aca>] ocfs2_permission+0xaa/0xd0 [ocfs2]
>> [<ffffffff81204596>] __inode_permission+0x56/0xb0
>> [<ffffffff812068fa>] link_path_walk+0x29a/0x560
>> [<ffffffff81206cbf>] path_lookupat+0x7f/0x110
>> [<ffffffff8120929c>] filename_lookup+0x9c/0x150
>> [<ffffffff811f96c3>] SyS_fchmodat+0x33/0x90
>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Thanks,
>> Eric
>>
>>
>>> Thanks,
>>> Junxiao.
>>>> I will try your patches later, but I am a little worried that the
>>>> reproduction rate may not be 100%.
>>>> So I am asking you to confirm ;-)
>>>>
>>>> Eric
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Eric Ren
2016-Oct-12 09:34 UTC
[Ocfs2-devel] [Question] deadlock on chmod when running discontigous block group multiple node testing
Hi Junxiao,

On 10/12/2016 02:47 PM, Junxiao Bi wrote:
> On 10/12/2016 10:36 AM, Eric Ren wrote:
>> Hi,
>>
>> When backporting those patches, I find that they are already in our
>> product kernel, maybe via "stable kernel" policy, although our product
>> kernel is 4.4 while the patches were merged into 4.6.
>>
>> Seems it's another deadlock that happens when doing `chmod -R 777
>> /mnt/ocfs2` among multiple nodes at the same time.
> Yes, but I just finished running the ocfs2 full test on linux next-20161006
> and didn't find any issue.

Thanks a lot, really!

1. What's the size of your ocfs2 disk? My disk is 200G.
2. Did you run the discontig block group test with multiple nodes, i.e. with this option: "-m ocfs2cts1,ocfs2cts2"?
3. Also, I am using fs/dlm; that may be a point of difference.

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>> Thanks,
>> Eric
>> On 10/12/2016 09:23 AM, Eric Ren wrote:
>>> Hi Junxiao,
>>>
>>>> Hi Eric,
>>>>
>>>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>>>> Hi Junxiao,
>>>>>
>>>>> As in the subject, the test hung on a kernel without your
>>>>> patches:
>>>>>
>>>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>>>> and
>>>>> "ocfs2: fix posix_acl_create deadlock"
>>>>>
>>>>> The stack trace is:
>>>>> ```
>>>>> ocfs2cts1:~ # pstree -pl 24133
>>>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)───fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>>>
>>>>>
>>>>> ocfs2cts1:~ # pgrep -a chmod
>>>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>>>
>>>>> ocfs2cts1:~ # cat /proc/15232/stack
>>>>> [<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>>> [<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>>> [<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>>>> [<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>>>> [<ffffffff8120d03c>] iterate_dir+0x9c/0x110
>>>>> [<ffffffff8120d453>] SyS_getdents+0x83/0xf0
>>>>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>> ```
>>>>>
>>>>> Do you think this issue can be fixed by your patches?
>>>> Looks not. Those two patches are to fix recursive locking deadlock. But
>>>> from above call trace, there is no recursive lock.
>>> Sorry, the call trace on another node was missing. Here it is:
>>>
>>> ocfs2cts2:~ # pstree -lp
>>> sshd(4292)───sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>>>
>>>
>>> ocfs2cts2:~ # cat /proc/4865/stack
>>> [<ffffffffa053e7ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>> [<ffffffffa053f56d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>> [<ffffffffa059c860>] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
>>> [<ffffffff812044e6>] generic_permission+0x166/0x1c0
>>> [<ffffffffa0542aca>] ocfs2_permission+0xaa/0xd0 [ocfs2]
>>> [<ffffffff81204596>] __inode_permission+0x56/0xb0
>>> [<ffffffff812068fa>] link_path_walk+0x29a/0x560
>>> [<ffffffff81206cbf>] path_lookupat+0x7f/0x110
>>> [<ffffffff8120929c>] filename_lookup+0x9c/0x150
>>> [<ffffffff811f96c3>] SyS_fchmodat+0x33/0x90
>>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> Thanks,
>>> Eric
>>>
>>>
>>>> Thanks,
>>>> Junxiao.
>>>>> I will try your patches later, but I am a little worried that the
>>>>> reproduction rate may not be 100%.
>>>>> So I am asking you to confirm ;-)
>>>>>
>>>>> Eric
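
For anyone comparing the two hung `chmod` stacks quoted in this thread, here is a small Python sketch that parses `/proc/<pid>/stack` output and lists the frames both nodes have in common. The helper names `parse_stack` and `common_frames` are made up for illustration and are not part of any ocfs2 tooling; the traces are copied from the messages above. It only shows where the two processes are blocked in the same code, and does not by itself prove what kind of deadlock this is.

```python
import re

# One frame of /proc/<pid>/stack output, for example:
# [<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>[^\s+]+)"
    r"(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

NODE1_STACK = """
[<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
[<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
[<ffffffff8120d03c>] iterate_dir+0x9c/0x110
[<ffffffff8120d453>] SyS_getdents+0x83/0xf0
[<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[<ffffffffffffffff>] 0xffffffffffffffff
"""

NODE2_STACK = """
[<ffffffffa053e7ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
[<ffffffffa053f56d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
[<ffffffffa059c860>] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
[<ffffffff812044e6>] generic_permission+0x166/0x1c0
[<ffffffffa0542aca>] ocfs2_permission+0xaa/0xd0 [ocfs2]
[<ffffffff81204596>] __inode_permission+0x56/0xb0
[<ffffffff812068fa>] link_path_walk+0x29a/0x560
[<ffffffff81206cbf>] path_lookupat+0x7f/0x110
[<ffffffff8120929c>] filename_lookup+0x9c/0x150
[<ffffffff811f96c3>] SyS_fchmodat+0x33/0x90
[<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[<ffffffffffffffff>] 0xffffffffffffffff
"""


def parse_stack(text):
    """Return [(function, module_or_None), ...], innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # blank line or something that is not a frame
        func = match.group("func")
        if func.startswith("0x"):
            continue  # the 0xffffffffffffffff end-of-stack marker
        # Drop compiler-generated suffixes such as ".isra.39".
        frames.append((func.split(".")[0], match.group("module")))
    return frames


def common_frames(stack_a, stack_b):
    """Function names present in both stacks, in stack_a order."""
    names_b = {func for func, _ in stack_b}
    return [func for func, _ in stack_a if func in names_b]


node1 = parse_stack(NODE1_STACK)
node2 = parse_stack(NODE2_STACK)
print(common_frames(node1, node2))
```

On these two traces the shared frames are the cluster-lock path (`__ocfs2_cluster_lock` and `ocfs2_inode_lock_full_nested`) plus the syscall entry, which matches what can be read off the traces by eye: both `chmod` processes are waiting for an inode cluster lock, one from `ocfs2_readdir` and the other from `ocfs2_iop_get_acl`.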