Eric Jones
2006-May-04 18:01 UTC
[Ocfs2-users] Experience with NFS exporting ocfs2 filesystems
Folks,

I wanted to see whether anyone here has experience with sharing ocfs2 over NFS. We have 41 nodes accessing an ocfs2 filesystem, one of which exports it via NFS to 5 or 10 clients.

We've been experiencing a problem where NFS on the host locks up and all the NFS clients start blocking on I/O to the filesystem (the nfsd processes are in "D" state). If I log into the NFS server (or any other cluster node), the filesystem is mostly OK, except for the directory containing the semaphore directories. An ls of that directory hangs and can't be interrupted with ^C. The only way we've found to recover NFS is to reboot the host.

We haven't firmly determined that the problem is ocfs2, but three things point that way:

1) I've never seen this kind of problem serving ext3fs before (weak, I know).

2) We seem to be able to cause the problem by running a program on our cluster that uses directories on ocfs2 as semaphores, continually mkdir'ing, rmdir'ing, and checking existence. A process was also continuously polling the directories over NFS. If we poll locally (from an ocfs2-attached system), the problem does not recur.

3) The underlying ocfs2 filesystem also experiences problems.

Configuration:

  40 Linux hosts in cluster, i686 architecture, running 2.6.13-1.1532_FC4smp
  1 Linux host in cluster, x86_64 architecture, running 2.6.13-1.1532_FC4smp (compiled for x86_64)
  The x86_64 host is the NFS server
  All hosts running OCFS2 1.2.1 Wed Apr 26 19:47:10 EDT 2006 (build bd2f25ba0af9677db3572e3ccd92f739)

I haven't managed to capture a complete sysrq-t of the system because attempting it seems to panic the kernel.

If others have had success with this sort of configuration, please let me know so I can start looking elsewhere.

Thanks,
Eric

-- 
Eric Jones
ejones at jimmy dot harvard dot edu
System Administrator
Department of Biostatistics & Computational Biology
617-632-2447
Dana-Farber Cancer Institute
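
For anyone who wants to try reproducing this, the workload in point 2 can be roughly approximated with the sketch below. It is not the actual program from our cluster; the function names, the "sem.lock" directory name, and the iteration count are all made up for illustration. It relies only on mkdir being atomic, so creating the directory acts as "acquire" and removing it acts as "release".

```python
import os
import tempfile


def try_acquire(path):
    """Take the semaphore by creating the directory; mkdir is atomic."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        # Someone else already holds the semaphore.
        return False


def is_held(path):
    """Check whether the semaphore directory currently exists."""
    return os.path.isdir(path)


def release(path):
    """Give up the semaphore by removing the directory."""
    os.rmdir(path)


def churn(base, name="sem.lock", iterations=1000):
    """Repeatedly mkdir, check, and rmdir one semaphore directory.

    Returns the number of successful acquisitions. The name and the
    iteration count are hypothetical defaults.
    """
    path = os.path.join(base, name)
    acquired = 0
    for _ in range(iterations):
        if try_acquire(path):
            acquired += 1
            is_held(path)  # existence check, as in the real workload
            release(path)
    return acquired


if __name__ == "__main__":
    # Demo on a local temporary directory; to exercise the setup described
    # above, point `base` at a directory on the shared filesystem instead.
    with tempfile.TemporaryDirectory() as base:
        print(churn(base, iterations=100))
```

To mimic the full scenario, something like churn() would run on several cluster nodes against the ocfs2 mount while a separate loop calls is_held() on the same path from an NFS client.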