Charlie Sharkey wrote:
> version info
>
> ---------------
>
> n1 kernel: OCFS2 Node Manager 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> n1 kernel: OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> n1 kernel: OCFS2 DLMFS 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> ocfs2-tools-1.4.0-0.5
>
> ocfs2console-1.4.0-0.5
>
> Linux n1 2.6.16.60-0.34-smp #1 SMP Fri Jan 16 14:59:01 UTC 2009 x86_64
> x86_64 x86_64 GNU/Linux
>
>
> ===========================================================================
> One of the nodes of a six-node cluster got a hung process. The
> "ps -elf" command shows it as:
>
> 5 D vtape 8542 1 6 77 0 - 77376 ocfs2_ Jan12 ? 01:34:31
> /opt/bti/mas/bin/vt -d -p /var/run/vt.pid
>
> The system isn't hung; I can ssh into the system and ls each ocfs2
> directory. I have run the debugfs.ocfs2 command
> debugfs.ocfs2 -R "stats" and it shows no errors. I ran the
> "scanlocks2" script and it didn't show any hung locks. It did create
> some files (/tmp/_fsl_dm-22 - /tmp/_fsl_dm-26). The contents of those
> files are:
> "Debug string proto 2 found, but 1 is the highest I understand."
>
You have an old debugfs.ocfs2. See if SLES has a newer ocfs2-tools.
With it, rerun scanlocks2. That will tell us whether the DLM is involved.
Meanwhile, what does this say?
ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
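
The long header text widens the wchan column, so the kernel function name is no longer cut off at "ocfs2_" as it is in the ps -elf output. If the full listing is too noisy, a rough sketch like the following could narrow it to processes in uninterruptible sleep (state D). The helper name filter_dstate is hypothetical, and it assumes the state is the second column, as in the ps invocation above:

```shell
# Hypothetical helper (not part of ocfs2-tools): read ps-style output on
# stdin and keep the header line plus every process whose STAT field
# starts with "D" (uninterruptible sleep).
# Assumes the column order pid,stat,comm,wchan used in the ps command above.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

It would be used as: ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_dstate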