Hello all,

First, I'd like to thank the developers for a great filesystem. We use it in production every day, serving web pages for a site doing up to 400 Mbit/s.

We've recently run into trouble, however, after switching to the latest ocfs2-tools and the latest kernel available at the time of writing (2.6.22). We use Gentoo, so that makes it a little easier to use.

Our problem is this: almost any time a file is modified live on the ocfs2 volume, the 2 other nodes (also running Apache) begin to show high load averages, and the site goes down until all nodes except node1 are rebooted. It seems there is some sort of locking contention? Also, if uploads are being done (via vsftpd) to the volume, the same behavior occurs, and no nodes are able to take out new locks.

What could be causing this problem? It's becoming an issue, as it takes the whole website down.

Thanks,
Michael
On Wed, Oct 10, 2007 at 09:42:05AM -0700, Michael M. wrote:
> We're recently running into troubles, however, (switched to the latest
> ocfs2-tools, latest kernel available at time of writing, 2.6.22). We use
> gentoo, so that makes it a little easier to use.

Have you tried any of the backported fixes for 2.6.22? You can find them at:

http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/

If at all possible, I'd upgrade to the latest 2.6.22 stable kernel (2.6.22.10 at the moment), and apply the patches at:

http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.22.6/

That'd get at least those known issues out of the way.

Btw, which kernel / tools were you using previous to the upgrade?

> Mostly anytime a file is modified live on the ocfs2 volume, the 2 other nodes
> (also running apache), begin to get high load averages, and the site goes down,
> until all but node1 are rebooted. It seems there is some sort of locking
> contention? Also, if uploads are being done (via vsftpd) to the volume, the
> same behavior occurs, and no nodes are able to take out new locks. What could
> be causing this problem? It's becoming an issue, as it takes the whole website
> down.

I'm not 100% clear on what you mean by "live"... Are the nodes all doing writes to the same file? That'd certainly incur a high locking overhead. It shouldn't hang unrelated processes on the nodes though.

Could you file a bugzilla with the following information please:

- What processes are eating the cpu (I guess some info from "top" on all the nodes would do)
- Attach your kernel config
- File system options (the output from "echo stats -h | debugfs.ocfs2 /dev/XXX")
- Describe your storage and network
- Exact ocfs2-tools version

Btw, feel free to put my e-mail address as CC in the bugzilla.

Thanks,
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
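
For anyone gathering the items requested above, here is a minimal Python sketch that captures the two pieces that can be collected automatically on each node: a one-shot snapshot from "top" and the output of the debugfs.ocfs2 pipeline quoted in the message. The function names `diagnostic_commands` and `collect` are hypothetical, `/dev/XXX` is the placeholder from the message and has to be replaced with the real device, and the `top -b -n 1` invocation assumes a procps-style top. The kernel config, the storage and network description, and the ocfs2-tools version still have to be added by hand.

```python
import shlex
import subprocess


def diagnostic_commands(device):
    """Return label -> shell command for the items that can be captured automatically."""
    quoted = shlex.quote(device)
    return {
        # One batch-mode snapshot of top, to show which processes use the CPU
        # (assumes a procps-style top that accepts -b and -n).
        "top": "top -b -n 1",
        # File system options, using the pipeline quoted in the message.
        "fs_stats": "echo stats -h | debugfs.ocfs2 {}".format(quoted),
    }


def collect(device, runner=subprocess.run):
    """Run each command through the shell and return label -> captured output."""
    results = {}
    for label, command in diagnostic_commands(device).items():
        proc = runner(
            command,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages with the output
            universal_newlines=True,
        )
        results[label] = proc.stdout
    return results
```

Calling `collect("/dev/XXX")` as root on each node, with the real device substituted, returns a dictionary whose values can be pasted into the bugzilla entry. The `runner` parameter exists only so the function can be exercised without the OCFS2 tools installed.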
I'm seeing a lot of these in my logs now:

(20894,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6540104595874881545 : signature = uY^U<F0>oY<D1>
(20894,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6540104595874881545 : signature = uY^U<F0>oY<D1>
(9524,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6540104595874881545: signature = uY^U<F0>oY<D1>
(9524,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6540104595874881545: signature = uY^U<F0>oY<D1>
(9512,2):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6733855010560701275: signature = (
(9512,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6733855010560701275: signature = (
(18128,1):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6566283579056201760: signature
(18128,2):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #6566283579056201760: signature
(29922,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #2314885530818453536: signature = ray
(29922,1):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #2314885530818453536: signature = ray
(29425,0):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #2314885530818453536: signature = ray
(29425,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode #2314885530818453536: signature = ray

--
Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
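
To see how many distinct inode numbers are involved in "Invalid dinode" errors like the ones pasted in the message above, the log can be tallied with a few lines of Python. This is only a sketch: the function name `count_invalid_dinodes` is hypothetical, and the regular expression assumes the line layout shown in the paste (a parenthesised pair of numbers, then `ocfs2_read_locked_inode:<line> ERROR: Invalid dinode #<number>`).

```python
import re
from collections import Counter

# Matches the error format seen in the pasted log; the inode number is the
# run of digits after "#", whether or not a space precedes the colon.
LINE_RE = re.compile(
    r"\(\d+,\d+\):ocfs2_read_locked_inode:\d+ ERROR: Invalid dinode #(?P<inode>\d+)"
)


def count_invalid_dinodes(log_text):
    """Return a Counter mapping each reported inode number to its occurrence count."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        counts[int(match.group("inode"))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "(9512,2):ocfs2_read_locked_inode:459 ERROR: Invalid dinode "
        "#6733855010560701275: signature = (\n"
        "(9512,3):ocfs2_read_locked_inode:459 ERROR: Invalid dinode "
        "#6733855010560701275: signature = (\n"
    )
    for inode, n in count_invalid_dinodes(sample).most_common():
        print(inode, n)
```

Run against the twelve lines pasted above, this reports four distinct inode numbers: two appear four times each and two appear twice each.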