Hello again,

We have hit a performance problem today in one of our clusters. The performance suddenly drops from the normal rate (about 30 MB/s, read/write) to a few KB/s (about 200 KB/s, read only) for a while, and then, as suddenly as it started, it returns to the normal read/write performance, cycling randomly. When the "read only" phase occurs on one node, the other shows only heartbeat activity (about 2 I/Os per 2 seconds) until the first one returns to normal, and vice-versa.

The servers run an e-mail application (IMAP/POP/SMTP -- Maildir format) with more than 20,000 users, so they are constantly creating, removing and moving files.

Dumping the processes in D state while a server is in the "constant few KB/s, read-only" state, they look like:

node#0:

10739 D imapd         ocfs2_lookup_lock_orphan_dir
11658 D imapd         ocfs2_reserve_suballoc_bits
12326 D imapd         ocfs2_lookup_lock_orphan_dir
12330 D pop3d         lock_rename
12351 D imapd         ocfs2_lookup_lock_orphan_dir
12357 D imapd         ocfs2_lookup_lock_orphan_dir
12359 D imapd         unlinkat
12381 D imapd         ocfs2_lookup_lock_orphan_dir
12498 D deliverquota  ocfs2_wait_for_mask
12710 D pop3d         ocfs2_reserve_suballoc_bits
12712 D imapd         unlinkat
12726 D imapd         ocfs2_reserve_suballoc_bits
12730 D imapd         unlinkat
12736 D imapd         ocfs2_reserve_suballoc_bits
12738 D imapd         unlinkat
12749 D pop3d         lock_rename
12891 D pop3d         ocfs2_reserve_suballoc_bits
12971 D pop3d         mutex_fastpath_lock_retval
12985 D pop3d         lock_rename
13006 D deliverquota  ocfs2_reserve_suballoc_bits
13061 D pop3d         lock_rename
13117 D pop3d         lock_rename
[-- suppressed --] 100+ processes in D state

node#1:

24428 D deliverquota  ocfs2_wait_for_mask

Some stack traces from the processes:

Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffffa065ba1f>] ocfs2_lookup_lock_orphan_dir+0xb8/0x18a [ocfs2]
 [<ffffffffa065c7d5>] ocfs2_prepare_orphan_dir+0x3f/0x229 [ocfs2]
 [<ffffffffa0660bab>] ocfs2_unlink+0x523/0xa81 [ocfs2]
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff8116324d>] ? dquot_initialize+0x126/0x13d
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff81122c0c>] vfs_unlink+0x82/0xd1
 [<ffffffff81124bcc>] do_unlinkat+0xc6/0x178
 [<ffffffff8112186b>] ? path_put+0x22/0x27
 [<ffffffff810a7d03>] ? audit_syscall_entry+0x103/0x12f
 [<ffffffff81124c94>] sys_unlink+0x16/0x18
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffffa0633682>] ? ocfs2_match+0x2c/0x3a [ocfs2]
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffffa0676a82>] ocfs2_reserve_suballoc_bits+0x11a/0x499 [ocfs2]
 [<ffffffffa0678b4c>] ocfs2_reserve_new_inode+0x134/0x37a [ocfs2]
 [<ffffffffa065d409>] ocfs2_mknod+0x2d4/0xf26 [ocfs2]
 [<ffffffffa063d02c>] ? ocfs2_should_refresh_lock_res+0x8f/0x1ad [ocfs2]
 [<ffffffffa0653cf6>] ? ocfs2_wait_for_recovery+0x1a/0x8f [ocfs2]
 [<ffffffff81437f4e>] ? mutex_lock+0x16/0x3a
 [<ffffffffa065e0fd>] ocfs2_create+0xa2/0x10a [ocfs2]
 [<ffffffff8112268f>] vfs_create+0x7e/0x9d
 [<ffffffff81125794>] do_filp_open+0x302/0x92d
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff81437731>] ? _cond_resched+0xe/0x22
 [<ffffffff81238109>] ? might_fault+0xe/0x10
 [<ffffffff812381f3>] ? __strncpy_from_user+0x20/0x4a
 [<ffffffff81114bc8>] do_sys_open+0x62/0x109
 [<ffffffff81114ca2>] sys_open+0x20/0x22
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
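In case anyone wants to reproduce this kind of dump, something along these lines should produce equivalent output (a rough sketch, not necessarily the exact tool we used; it reads the state and command from /proc/<pid>/stat, the sleeping symbol from /proc/<pid>/wchan, and the kernel stack from /proc/<pid>/stack where the kernel exposes it):

import os

def dump_d_state():
    # Walk /proc and report every task in uninterruptible sleep ("D"),
    # together with the symbol it is sleeping in (wchan) and, where
    # available, the full kernel stack (/proc/<pid>/stack).
    for pid in sorted(int(p) for p in os.listdir('/proc') if p.isdigit()):
        try:
            stat = open('/proc/%d/stat' % pid).read()
            comm = stat[stat.index('(') + 1:stat.rindex(')')]
            state = stat[stat.rindex(')') + 2]
            if state != 'D':
                continue
            wchan = open('/proc/%d/wchan' % pid).read().strip()
            print('%d D %s %s' % (pid, comm, wchan))
            try:
                print(open('/proc/%d/stack' % pid).read())
            except IOError:
                pass  # /proc/<pid>/stack not available on this kernel
        except (IOError, OSError):
            pass  # process exited while we were looking at it

if __name__ == '__main__':
    dump_d_state()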
Checking the bugzilla, these two bugs seem to show similar behaviour:

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1281
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1300

On the mailing list archive, this thread also shows similar behaviour:

http://www.mail-archive.com/ocfs2-users at oss.oracle.com/msg02509.html

The cluster consists of two Dell PE 1950 servers with 8 GB of RAM, attached via 2 Gbit FC to a Dell EMC AX/100 storage array. The network between them runs at 1 Gbit. We are using CentOS 5.5, OCFS2 1.6.4 and the UEK kernel 2.6.32-100.0.19.el5.

Tests so far:

* Changed the data mount option from ordered to writeback -- no success;
* Added the mount option localalloc=16 -- no success;
* Turned off group and user quota support -- no success;
* Rebooted the servers (to test with everything fresh) -- no success;
* Mounted the filesystem on only one node -- success.

Since the problem does not show up when the filesystem is mounted on only one node, we are currently working around it by exporting the filesystem via NFS. This leads me to conclude that the lock-up is somewhere inside the cluster stack (DLM or related). We have checked logs, debug output and traces trying to pinpoint the problem, but with no success.

Any clue on how to debug this further, or whether it is the same problem as the ones in the cited bug reports? node#0 has a heavier I/O load than node#1 -- could that trigger something? The filesystem is about 94% full (751G of 803G).

Thanks!

Regards,

-- 
    .:''''':.
  .:'       `   Sérgio Surkamp | Gerente de Rede
  ::  ........  sergio at gruposinternet.com.br
  `:.        .:'
    `:,    ,.:' *Grupos Internet S.A.*
      `: :'     R. Lauro Linhares, 2123 Torre B - Sala 201
       : :      Trindade - Florianópolis - SC
       :.'
       ::       +55 48 3234-4109
       : '      http://www.gruposinternet.com.br
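P.S.: In case it helps to correlate the stall windows between the two nodes, a small sampler along these lines (again only a rough sketch; the device name "sdb" is just a placeholder for the FC LUN) logs read/write throughput from /proc/diskstats once a second:

import sys, time

def sectors(dev):
    # /proc/diskstats: field 3 = device name, field 6 = sectors read,
    # field 10 = sectors written (sectors are 512 bytes).
    for line in open('/proc/diskstats'):
        f = line.split()
        if f[2] == dev:
            return int(f[5]), int(f[9])
    raise SystemExit('device %s not found in /proc/diskstats' % dev)

dev = sys.argv[1] if len(sys.argv) > 1 else 'sdb'   # placeholder device name
prev_r, prev_w = sectors(dev)
while True:
    time.sleep(1)
    cur_r, cur_w = sectors(dev)
    # 512-byte sectors, so sectors // 2 = KB transferred in the interval
    print('%s %s read %6d KB/s write %6d KB/s'
          % (time.strftime('%H:%M:%S'), dev,
             (cur_r - prev_r) // 2, (cur_w - prev_w) // 2))
    prev_r, prev_w = cur_r, cur_w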