RO Holders: 1 EX Holders: 0
So node 18 wants to upgrade to EX. For that to happen,
node 17 has to downgrade from PR. But it cannot because
there is 1 RO (readonly) holder. If you are using NFS and
see a nfsd in a D state, then that would be it. I've just
released 1.4.7 in which this issue has been addressed.
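(A quick way to check for that on the affected node -- a minimal sketch; the ps
column layout may vary by distribution:
    ps -eo pid,stat,comm | awk '$2 ~ /^D/ && $3 == "nfsd"'
Any nfsd threads listed are sitting in uninterruptible sleep.)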
Sunil
Brad Plant wrote:
> Hi Sunil,
>
> I managed to collect the fs_locks and dlm_locks output on both nodes this
> time. www1 is node 17 while www2 is node 18. I had to reboot www1 to fix the
> problem but of course www1 couldn't unmount the file system so the other
> nodes saw it as a crash.
>
> Both nodes are running 2.6.18-164.15.1.el5.centos.plusxen with the matching
> ocfs2 1.4.4-1 rpm downloaded from
> http://oss.oracle.com/projects/ocfs2/files/RedHat/RHEL5/x86_64/.
>
> Do you make anything of this?
>
> I read that there is going to be a new ocfs2 release soon. I'm sure
> there are lots of bug fixes, but are there any in there that you think might
> solve this problem?
>
> Cheers,
>
> Brad
>
>
> www2 ~ # ./scanlocks2
> /dev/xvdd3 M0000000000000000095a0300000000
>
> www2 ~ # debugfs.ocfs2 -R "fs_locks
M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000 Mode: Protected Read
> Flags: Initialized Attached Busy
> RO Holders: 0 EX Holders: 0
> Pending Action: Convert Pending Unlock Action: None
> Requested Mode: Exclusive Blocking Mode: No Lock
> PR > Gets: 6802 Fails: 0 Waits (usec) Total: 0 Max: 0
> EX > Gets: 16340 Fails: 0 Waits (usec) Total: 12000 Max: 8000
> Disk Refreshes: 0
>
> www2 ~ # debugfs.ocfs2 -R "dlm_locks
M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000 Owner: 18 State: 0x0
> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
> Refs: 4 Locks: 2 On Lists: None
> Reference Map: 17
> Lock-Queue Node Level Conv Cookie Refs AST BAST Pending-Action
> Granted 17 PR -1 17:62487955 2 No No None
> Converting 18 PR EX 18:6599867 2 No No None
>
>
> www1 ~ # ./scanlocks2
>
> www1 ~ # debugfs.ocfs2 -R "fs_locks
M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000 Mode: Protected Read
> Flags: Initialized Attached Blocked Queued
> RO Holders: 1 EX Holders: 0
> Pending Action: None Pending Unlock Action: None
> Requested Mode: Protected Read Blocking Mode: Exclusive
> PR > Gets: 110 Fails: 3 Waits (usec) Total: 32000 Max: 12000
> EX > Gets: 0 Fails: 0 Waits (usec) Total: 0 Max: 0
> Disk Refreshes: 0
>
> www1 ~ # debugfs.ocfs2 -R "dlm_locks
M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000 Owner: 18 State: 0x0
> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
> Refs: 3 Locks: 1 On Lists: None
> Reference Map:
> Lock-Queue Node Level Conv Cookie Refs AST BAST Pending-Action
> Granted 17 PR -1 17:62487955 2 No No None
>
> On Fri, 19 Mar 2010 08:48:39 -0700
> Sunil Mushran <sunil.mushran at oracle.com> wrote:
>
>
>> In findpath <lockname>, the lockname needs to be in angle brackets.
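>> (For illustration, using the lockname from this thread -- your lockname and
>> device will of course differ:
>>     debugfs.ocfs2 -R "findpath <M00000000000000007e89e400000000>" /dev/xvdc1
>> )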
>>
>> Did you manage to trap the oops stack trace of the crash?
>>
>> So the dlm on the master says that node 250 has a PR, but the fs_locks dump
>> on 250 says that it has requested a PR but has not yet gotten a reply.
>> Next time, also dump the dlm_lock on 250. (The message flow is: the fs on 250
>> talks to the dlm on 250, which talks to the dlm on the master, which may have
>> to talk to other nodes but eventually replies to the dlm on 250, which then
>> pings the fs on that node. The roundtrip happens in a couple hundred usecs on gige.)
>>
>> Running a mix of localflocks and not is not advisable. Not the end of the
>> world, though. It depends on how flocks are being used.
>>
>> Is this a mix of virtual and physical boxes?
>>
>> Brad Plant wrote:
>>
>>> Hi Sunil,
>>>
>>> I seem to have struck this issue, although I'm not using nfs.
>>> I've got other processes stuck in the D state. It's a mail server and the
>>> processes are postfix and courier-imap. As per your instructions, I've run
>>> scanlocks2 and debugfs.ocfs2:
>>>
>>> mail1 ~ # ./scanlocks2
>>> /dev/xvdc1 M0000000000000000808bc800000000
>>>
>>> mail1 ~ # debugfs.ocfs2 -R "fs_locks -l
M0000000000000000808bc800000000" /dev/xvdc1 |cat
>>> Lockres: M0000000000000000808bc800000000 Mode: Protected Read
>>> Flags: Initialized Attached Busy
>>> RO Holders: 0 EX Holders: 0
>>> Pending Action: Convert Pending Unlock Action: None
>>> Requested Mode: Exclusive Blocking Mode: No Lock
>>> Raw LVB: 05 00 00 00 00 00 00 01 00 00 01 99 00 00 01 99
>>> 12 1f c9 67 29 71 32 86 12 e8 e2 f6 d1 07 8c 15
>>> 12 e8 e2 f6 d1 07 8c 15 00 00 00 00 00 00 10 00
>>> 41 c0 00 05 00 00 00 00 4b b6 12 7d 00 00 00 00
>>> PR > Gets: 471598 Fails: 0 Waits (usec) Total: 64002 Max: 8000
>>> EX > Gets: 8041 Fails: 0 Waits (usec) Total: 28001 Max: 4000
>>> Disk Refreshes: 0
>>>
>>> mail1 ~ # debugfs.ocfs2 -R "dlm_locks -l
M0000000000000000808bc800000000" /dev/xvdc1 |cat
>>> Lockres: M0000000000000000808bc800000000 Owner: 1 State: 0x0
>>> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
>>> Refs: 4 Locks: 2 On Lists: None
>>> Reference Map: 250
>>> Raw LVB: 05 00 00 00 00 00 00 01 00 00 01 99 00 00 01 99
>>> 12 1f c9 67 29 71 32 86 12 e8 e2 f6 d1 07 8c 15
>>> 12 e8 e2 f6 d1 07 8c 15 00 00 00 00 00 00 10 00
>>> 41 c0 00 05 00 00 00 00 4b b6 12 7d 00 00 00 00
>>> Lock-Queue Node Level Conv Cookie Refs AST BAST Pending-Action
>>> Granted 250 PR -1 250:10866405 2 No No None
>>> Converting 1 PR EX 1:95 2 No No None
>>>
>>> mail1 *is* node number 1, so this is the master node.
>>>
>>> I managed to run scanlocks2 on node 250 (backup1) and also managed
>>> to get the following:
>>>
>>> backup1 ~ # debugfs.ocfs2 -R "fs_locks -l M00000000000000007e89e400000000" /dev/xvdc1 |cat
>>> Lockres: M00000000000000007e89e400000000 Mode: Invalid
>>> Flags: Initialized Busy
>>> RO Holders: 0 EX Holders: 0
>>> Pending Action: Attach Pending Unlock Action: None
>>> Requested Mode: Protected Read Blocking Mode: Invalid
>>> Raw LVB: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> PR > Gets: 0 Fails: 0 Waits (usec) Total: 0 Max: 0
>>> EX > Gets: 0 Fails: 0 Waits (usec) Total: 0 Max: 0
>>> Disk Refreshes: 0
>>>
>>> A further run of scanlocks2, however, resulted in backup1 (node 250) crashing.
>>>
>>> The FS is mounted by 3 nodes: mail1, mail2 and backup1. mail1 and
>>> mail2 are running the latest centos 5 xen kernel with NO localflocks. backup1 is
>>> running a 2.6.28.10 vanilla mainline kernel (pv-ops) WITH localflocks.
>>>
>>> I had to switch backup1 to a mainline kernel with localflocks
>>> because performing backups on backup1 using rsync seemed to take a long time
>>> (3-4 times longer) when using the centos 5 xen kernel with no localflocks. I was
>>> running all nodes on recent-ish mainline kernels, but have only recently
>>> converted most of them to centos 5 because of repeated ocfs2 stability issues
>>> with mainline kernels.
>>>
>>> When backup1 crashed, the lock held by mail1 seemed to be released
>>> and everything went back to normal.
>>>
>>> I tried to do a debugfs.ocfs2 -R "findpath M00000000000000007e89e400000000" /dev/xvdc1 |cat
>>> but it said "usage: locate <inode#>" despite the man page stating otherwise.
>>> -R "locate ..." said the same.
>>>
>>> I hope you're able to get some useful info from the above. If
>>> not, can you please provide the next steps that you would want me to run *in
>>> case* it happens again?
>>>
>>> Cheers,
>>>
>>> Brad
>>>
>>>
>>> On Thu, 18 Mar 2010 11:25:28 -0700
>>> Sunil Mushran <sunil.mushran at oracle.com> wrote:
>>>
>>>
>>>
>>>> I am assuming you are mounting the nfs mounts with the nordirplus
>>>> mount option. If not, that is known to deadlock an nfsd thread, leading
>>>> to what you are seeing.
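>>>> (For illustration, a client-side mount line using that option; the server
>>>> name and export path here are only placeholders:
>>>>     mount -t nfs -o nordirplus dbnode:/export/ocfs2vol /mnt/ocfs2vol
>>>> )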
>>>>
>>>> There are two possible reasons for this error. One is a dlm issue.
>>>> The other is a local deadlock like the one above.
>>>>
>>>> To see if the dlm is the cause for the hang, run scanlocks2.
>>>> http://oss.oracle.com/~smushran/.dlm/scripts/scanlocks2
>>>>
>>>> This will dump the busy lock resources. Run it a few times. If
>>>> a lock resource comes up regularly, then it indicates a dlm problem.
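>>>> (One way to do that, assuming the script sits in the current directory:
>>>>     for i in 1 2 3 4 5; do ./scanlocks2; sleep 5; done | sort | uniq -c | sort -rn
>>>> Lock resources that show up with a count greater than 1 are the suspects.)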
>>>>
>>>> Then dump the fs and dlm lock state on that node.
>>>> debugfs.ocfs2 -R "fs_locks LOCKNAME" /dev/sdX
>>>> debugfs.ocfs2 -R "dlm_locks LOCKNAME" /dev/sdX
>>>>
>>>> The dlm lock will tell you the master node. Repeat the two dumps
>>>> on the master node. The dlm lock on the master node will point
>>>> to the current holder. Repeat the same on that node. Email all that
>>>> to me asap.
>>>>
>>>> michael.a.jaquays at verizon.com wrote:
>>>>
>>>>
>>>>> All,
>>>>>
>>>>> I've seen a few posts about this issue in the past, but
>>>>> not a resolution. I have a 3-node cluster sharing ocfs2 volumes to app nodes
>>>>> via nfs. On occasion, one of our db nodes will have nfs go into an
>>>>> uninterruptible sleep state. The nfs daemon is completely useless at this
>>>>> point. The db node has to be rebooted to resolve it. It seems that nfs is
>>>>> waiting on ocfs2_wait_for_mask. Any suggestions on a resolution would be appreciated.
>>>>>
>>>>> root 18387 0.0 0.0 0 0 ? S< Mar15 0:00 [nfsd4]
>>>>> root 18389 0.0 0.0 0 0 ? D Mar15 0:10 [nfsd]
>>>>> root 18390 0.0 0.0 0 0 ? D Mar15 0:10 [nfsd]
>>>>> root 18391 0.0 0.0 0 0 ? D Mar15 0:10 [nfsd]
>>>>> root 18392 0.0 0.0 0 0 ? D Mar15 0:13 [nfsd]
>>>>> root 18393 0.0 0.0 0 0 ? D Mar15 0:08 [nfsd]
>>>>> root 18394 0.0 0.0 0 0 ? D Mar15 0:09 [nfsd]
>>>>> root 18395 0.0 0.0 0 0 ? D Mar15 0:12 [nfsd]
>>>>> root 18396 0.0 0.0 0 0 ? D Mar15 0:13 [nfsd]
>>>>>
>>>>> 18387 nfsd4 worker_thread
>>>>> 18389 nfsd ocfs2_wait_for_mask
>>>>> 18390 nfsd ocfs2_wait_for_mask
>>>>> 18391 nfsd ocfs2_wait_for_mask
>>>>> 18392 nfsd ocfs2_wait_for_mask
>>>>> 18393 nfsd ocfs2_wait_for_mask
>>>>> 18394 nfsd ocfs2_wait_for_mask
>>>>> 18395 nfsd ocfs2_wait_for_mask
>>>>> 18396 nfsd ocfs2_wait_for_mask
>>>>>
>>>>>
>>>>> -Mike Jaquays