Karl-Heinz Blenk
2014-May-06 16:02 UTC
[Ocfs-users] poor write performance or locking issues with ocfs2
Hello all,

I'm having serious trouble with my ocfs2 environment. The cluster filesystem worked fine for about 3-6 weeks after the initial setup, but for the past week I have been seeing performance issues. I've searched Google and this mailing list for a long time without finding a solution. I found a lot of posts describing the "same" problem, but none with the magic answer :-)

First, the environment:

- HP 3par SAN, 2 TB LUN (no SAN storage related performance problems - already checked)
- qlogic HBA (4 paths), round robin with multipath
- kernel 3.2.0-4-amd64
- 5 cluster nodes
- ocfs2 version 1.6.4
- 480 million inodes in use, iUse% = 92
- OCFS was made with: "mkfs.ocfs2 -b 4k -C 4k -N 8 -L myocfs -T mail --fs-feature-level=max-features --fs-features=indexed-dirs /dev/mapper/myocfs"
- OCFS is mounted with: "_netdev,noatime,data=writeback,nouser_xattr"; I also tried "_netdev,noatime,data=writeback,nouser_xattr,commit=60,localalloc=16", which I found on this great list, but this didn't solve the issues. I also tried without data= and commit=...
- Apache 2 webserver with PHP on 2 nodes, NGINX and FTP on the other nodes (nginx only reads data; FTP and PHP also write). I guess the read rate is about 80%.
- The filesystem was extended online 2 times after the initial setup.
- sysctl.conf parameters are set (for the webserver):
--
net.ipv4.ip_nonlocal_bind=1
net.ipv4.tcp_fin_timeout=10
net.ipv4.ip_local_port_range=1024 65535
vm.swappiness=10
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
--

Now, the problem:

The cluster runs well, but a few times a day the system load climbs from ~0-1 to 40, 500, 2000! CPU is fine, no problems. RAM is free, no problems. "ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D" shows me some apache processes in state "D", but with the "WIDE-WCHAN-COLUMN" not filled in.
Here's an example output:
--
 3176 D< o2hb-6F81EC9057 -
 3392 D  jbd2/dm-1-41 -
 3393 D  ocfs2cmt -
17221 D  apache2 -
18424 D  kworker/8:3 -
18453 D  apache2 -
...
--

Some output of /proc/pid/stack:
--
[<ffffffff81051d5f>] process_timeout+0x0/0x5
[<ffffffff810528be>] msleep_interruptible+0x1a/0x37
[<ffffffffa0311903>] o2hb_thread+0x17f/0x2df [ocfs2_nodemanager]
[<ffffffffa0311784>] o2hb_thread+0x0/0x2df [ocfs2_nodemanager]
[<ffffffff8105f681>] kthread+0x76/0x7e
[<ffffffff81356ef4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105f60b>] kthread+0x0/0x7e
[<ffffffff81356ef0>] kernel_thread_helper+0x0/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffffa015f007>] jbd2_journal_commit_transaction+0x1a6/0x10bf [jbd2]
[<ffffffff8100d02f>] load_TLS+0x7/0xa
[<ffffffff8100d69e>] __switch_to+0x133/0x258
[<ffffffff81039ac2>] finish_task_switch+0x88/0xb9
[<ffffffff81071011>] arch_local_irq_save+0x11/0x17
[<ffffffff8105fcd3>] autoremove_wake_function+0x0/0x2a
[<ffffffffa0163156>] kjournald2+0xc0/0x20a [jbd2]
[<ffffffff8105fcd3>] autoremove_wake_function+0x0/0x2a
[<ffffffffa0163096>] kjournald2+0x0/0x20a [jbd2]
[<ffffffff8105f681>] kthread+0x76/0x7e
[<ffffffff81356ef4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105f60b>] kthread+0x0/0x7e
[<ffffffff81356ef0>] kernel_thread_helper+0x0/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

root@server:~# cat /proc/3393/stack
[<ffffffff8100d02f>] load_TLS+0x7/0xa
[<ffffffff811b42e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffffa0546841>] ocfs2_commit_thread+0xf1/0x3a5 [ocfs2]
[<ffffffff8105fcd3>] autoremove_wake_function+0x0/0x2a
[<ffffffffa0546750>] ocfs2_commit_thread+0x0/0x3a5 [ocfs2]
[<ffffffff8105f681>] kthread+0x76/0x7e
[<ffffffff81356ef4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105f60b>] kthread+0x0/0x7e
[<ffffffff81356ef0>] kernel_thread_helper+0x0/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
--

When the load grows, these processes stay in that state. When I run "iotop -o", I see one apache process with IO at ~99%.
DISK Write is 0 K/s, DISK Read is ~100 - 200 K/s. When I check lsof for that process ID, I get the following (example):
--
apache2 64039 www-data 6w FIFO 0,8 0t0 60701189 pipe
apache2 64039 www-data 7u REG 254,3 0 1308167 /tmp/ZCUDX8xreu (deleted)
apache2 64039 www-data 8u 0000 0,9 0 3545 anon_inode
apache2 64039 www-data 9u IPv6 62815839 0t0 TCP *** (CLOSE_WAIT)
apache2 64039 www-data 10u IPv4 62820353 0t0 TCP *** (CLOSE_WAIT)
apache2 64039 www-data 11u IPv4 62820355 0t0 TCP *** (ESTABLISHED)
apache2 64039 www-data 12u IPv4 62810900 0t0 TCP *** (ESTABLISHED)
apache2 64039 www-data 13r REG 254,3 1679422 1308180 /tmp/phprWvjyj
apache2 64039 www-data 14w REG 254,1 315392 517499656 /var/www/myocfs/images/original/0a6adf0421891131f30120d4235fbb08.jpg
--

The last line differs from case to case: it is always of type "w" (open for writing), but not always the same path.

When this problem occurs, the load on all the other servers grows as well (many visitors...), but all webserver processes are in "D" state. No IO is visible with iotop on the other nodes.

In the shell, I can run "ls /var/www/myocfs/images/original/0a6adf0421891131f30120d4235fbb08.jpg" fine on the "active" server. The same command on the other nodes hangs / waits. An ls of another file in another path sometimes works...

After some minutes (3 - 15) all nodes can access the file again, the load goes back down, and IO is normal again. But this isn't reproducible.

debugfs.ocfs2 with fs_locks doesn't work because of a mismatch between the kernel and ocfs2-tools (I think):
--
debugfs.ocfs2 -R "fs_locks" /dev/mapper/myocfs
--> Debug string proto 3 found, but 2 is the highest I understand.
--

My questions:

- Why has this problem only been occurring for ~1 week? Why did OCFS run fine for weeks before? (I've also rebooted the server ...)
- What's the issue? Locking? How can I solve this? Or is the IO too slow?
- What causes the issue? Writes? Deletes? Why do some writes work well?

Can anybody help me with this issue? I've run out of ideas :-(

Kind regards,
Karl-Heinz
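
For anyone who wants to collect the same data on their own nodes: the "D" state check described in this post (ps piped through grep, then /proc/pid/stack by hand) could also be scripted. The following is only a minimal sketch, assuming Python 3 on Linux. The names parse_stat, read_or_none and blocked_tasks are made up for illustration and are not part of ocfs2-tools or any other package. It reads /proc directly instead of calling ps, so it only matches tasks whose state really is "D", whereas "grep D" also matches any line that merely contains a capital D. Reading /proc/<pid>/stack usually requires root.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    left = text.index("(")
    right = text.rindex(")")  # comm may itself contain spaces or ')'
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def read_or_none(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:  # task exited, or not enough privileges
        return None


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm, wchan, stack) for every task in state 'D'."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        stat = read_or_none(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state != "D":
            continue
        wchan = read_or_none(os.path.join(proc_root, entry, "wchan")) or "-"
        stack = read_or_none(os.path.join(proc_root, entry, "stack"))
        tasks.append((pid, comm, wchan, stack))
    return sorted(tasks)


if __name__ == "__main__":
    for pid, comm, wchan, stack in blocked_tasks():
        print(f"{pid:>7} D {comm} wchan={wchan}")
        print(stack if stack else "        (stack not readable, try as root)")
```

Running something like this on every node while the load is high would show which tasks are blocked on each node and where in the kernel they are waiting. It does not show which DLM lock they are waiting for.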