In 1.4, we have a much-improved debugging infrastructure for
such issues. Check out the writeup on dlm debugging in the 1.4
user's guide, in the chapter titled Notes.
In short, you have correctly identified the lock resource. But we
need to go a step further: get the info from the dlm and see
which node is holding onto the lock, and why.
Read the writeup and if you have any questions, ping me.
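As a first pass before digging into the dlm, it can help to summarize which
tasks are stuck and on what. The following is only a minimal sketch, assuming
input shaped like the ps extract quoted below (PID, state, command, wait
channel in the last column). The function name blocked_by_wchan and the SAMPLE
text are illustrative and not part of any ocfs2 tool.

```python
from collections import defaultdict


def blocked_by_wchan(ps_text):
    """Group tasks in uninterruptible sleep (state 'D...') by wait channel.

    Assumes each line looks like 'PID STATE COMMAND [ARGS...] WCHAN'.
    A leading '> ' quote prefix, as found in a mail reply, is tolerated.
    """
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.lstrip("> ").split()
        # Skip headers, blank lines, and anything too short to parse.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, state, comm, wchan = int(fields[0]), fields[1], fields[2], fields[-1]
        if state.startswith("D"):
            groups[wchan].append((pid, comm))
    return dict(groups)


# Sample copied from the extract in the original message.
SAMPLE = """\
2222 D bash ocfs2_wait_for_mask
2282 Zl java <defunct> exit
2567 Zl java <defunct> exit
2736 D ls ocfs2_wait_for_mask
2770 D ls ocfs2_wait_for_mask
"""

if __name__ == "__main__":
    for wchan, tasks in blocked_by_wchan(SAMPLE).items():
        print(wchan, tasks)
```

This only tells you which tasks are waiting in the same place. It does not say
which node holds the lock, which is what the dlm debugging writeup covers.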
Sunil

Andrew Phillips wrote:
> Hello,
>
> We just experienced a hang that looks superficially very similar to
> http://www.mail-archive.com/ocfs2-users at oss.oracle.com/msg02359.html
>
> There are 3 nodes in the cluster, running ocfs2-1.4.1 on rhel 5.2.
> Versions and uname output are in the attached text file, which also
> includes fs_locks dumps and various other diagnostics.
>
> The lock up happened when we were restarting a java application that
> was writing to the /journal directory, being read by another java app
> on a second node. Restarting the machine that the
> jvm was running on did not help - indicating a locking issue.
>
> ls of the directory hangs the process on the machine that was writing.
> An ls on the machine that was reading initially worked. An rm command
> on the reader then caused that to lock up as well.
>
> Here's an extract showing what they're waiting on.
>
> 2222 D bash ocfs2_wait_for_mask
> 2282 Zl java <defunct> exit
> 2567 Zl java <defunct> exit
> 2736 D ls ocfs2_wait_for_mask
> 2770 D ls ocfs2_wait_for_mask
>
> Andy