thr3ads.net - Ocfs users - [Ocfs-users] poor write performance or locking issues with ocfs2 [May 2014]

Karl-Heinz Blenk
2014-May-06 16:02 UTC
[Ocfs-users] poor write performance or locking issues with ocfs2

Hello all,

I'm having serious trouble with my ocfs2 environment. The cluster filesystem worked
fine for about 3-6 weeks after the initial setup, but for the past week performance
issues have been occurring. I've searched Google and this mailing list for a long
time but wasn't able to find a solution. I've found a lot of posts
with the "same" problem but without the magic answer  :-)

First, the environment:
- HP 3par SAN, 2 TB LUN (no SAN storage related performance problems - already
checked)
- qlogic HBA (4 path), round robin with multipath
- kernel 3.2.0-4-amd64
- 5 cluster nodes
- ocfs2 version 1.6.4
- 480 million inodes in use, iUse% = 92
- OCFS was made with: "mkfs.ocfs2 -b 4k -C 4k -N 8 -L myocfs -T mail
--fs-feature-level=max-features --fs-features=indexed-dirs
/dev/mapper/myocfs"
- OCFS is mounted with: "_netdev,noatime,data=writeback,nouser_xattr";
I also tried
"_netdev,noatime,data=writeback,nouser_xattr,commit=60,localalloc=16",
which I found on this great list, but this hasn't solved the issues.
I also tried without data= and commit=...
- Apache 2 webserver with PHP on 2 nodes, NGINX and FTP on the other nodes
(nginx only reads data; FTP and PHP also write). I estimate the read share
at about 80%.
- The filesystem was extended online twice after the initial setup.
- sysctl.conf parameters are set (for the webserver):
--
net.ipv4.ip_nonlocal_bind=1
net.ipv4.tcp_fin_timeout=10
net.ipv4.ip_local_port_range=1024 65535
vm.swappiness=10
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
--

Now, the problem:
The cluster runs well, but a few times a day the system load climbs from ~0-1 to
40, 500, even 2000! CPU is fine, no problems. RAM is free, no problems.
"ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D" shows me
some apache processes in state "D", but with nothing in the
"WIDE-WCHAN-COLUMN" column. Here's an example output:
--
 3176 D<   o2hb-6F81EC9057 -
 3392 D    jbd2/dm-1-41    -
 3393 D    ocfs2cmt        -
17221 D    apache2         -
18424 D    kworker/8:3     -
18453 D    apache2         -
...
---
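
A note on the pipeline above: "grep D" matches any line that contains a capital D, including command names. A minimal sketch of a stricter filter, assuming the STAT value is the second column as in the output shown; the function name filter_dstate is only illustrative:

```shell
# Keep only lines whose STAT column (2nd field) starts with "D"
# (uninterruptible sleep). Reads "pid stat comm wchan" lines on stdin,
# so it works on live ps output or on a saved log.
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Live usage (empty headers suppress the header line):
# ps -e -o pid= -o stat= -o comm= -o wchan= | filter_dstate
```

This also keeps variants such as "D<" and drops processes that merely have a "D" in their name.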

Some output of /proc/pid/stack:
--
[<ffffffff81051d5f>] process_timeout+0x0/0x5
[<ffffffff810528be>] msleep_interruptible+0x1a/0x37
[<ffffffffa0311903>] o2hb_thread+0x17f/0x2df [ocfs2_nodemanager]
[<ffffffffa0311784>] o2hb_thread+0x0/0x2df [ocfs2_nodemanager]
[<ffffffff8105f681>] kthread+0x76/0x7e
[<ffffffff81356ef4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105f60b>] kthread+0x0/0x7e
[<ffffffff81356ef0>] kernel_thread_helper+0x0/0x10
[<ffffffffffffffff>] 0xffffffffffffffff


[<ffffffffa015f007>] jbd2_journal_commit_transaction+0x1a6/0x10bf [jbd2]
[<ffffffff8100d02f>] load_TLS+0x7/0xa
[<ffffffff8100d69e>] __switch_to+0x133/0x258
[<ffffffff81039ac2>] finish_task_switch+0x88/0xb9
[<ffffffff81071011>] arch_local_irq_save+0x11/0x17
[<ffffffff8105fcd3>] autoremove_wake_function+0x0/0x2a
[<ffffffffa0163156>] kjournald2+0xc0/0x20a [jbd2]
[<ffffffff8105fcd3>] autoremove_wake_function+0x0/0x2a
[<ffffffffa0163096>] kjournald2+0x0/0x20a [jbd2]
[<ffffffff8105f681>] kthread+0x76/0x7e
[<ffffffff81356ef4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105f60b>] kthread+0x0/0x7e
[<ffffffff81356ef0>] kernel_thread_helper+0x0/0x10
[<ffffffffffffffff>] 0xffffffffffffffff



root@server:~# cat /proc/3393/stack
[<ffffffff8100d02f>] load_TLS+0x7/0xa
[<ffffffff811b42e3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffffa0546841>] ocfs2_commit_thread+0xf1/0x3a5 [ocfs2]
[<ffffffff8105fcd3>] autoremove_wake_function+0x0/0x2a
[<ffffffffa0546750>] ocfs2_commit_thread+0x0/0x3a5 [ocfs2]
[<ffffffff8105f681>] kthread+0x76/0x7e
[<ffffffff81356ef4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105f60b>] kthread+0x0/0x7e
[<ffffffff81356ef0>] kernel_thread_helper+0x0/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
--
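
Collecting these stacks by hand is slow while the load is climbing. A rough sketch that prints the kernel stack of every task in D state, assuming /proc/<pid>/stack is readable (normally root only); the function name and the optional proc-root argument are illustrative, not part of any tool:

```shell
# For each "pid stat comm" line on stdin that is in D state, print the
# kernel stack of that task. $1 is the proc root (defaults to /proc).
dump_dstate_stacks() {
    proc_root="${1:-/proc}"
    awk '$2 ~ /^D/ { print $1, $3 }' |
    while read -r pid comm; do
        echo "== $pid ($comm) =="
        # The task may have exited, or the file may not be readable.
        cat "$proc_root/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Live usage:
# ps -e -o pid= -o stat= -o comm= | dump_dstate_stacks
```

Running this on all nodes at the moment of a hang would show which nodes are blocked in ocfs2 or jbd2 code at the same time.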

When the load grows, the processes remain in this state. When I run "iotop -o" I
see one apache process with IO at ~99%. Disk write is 0 K/s, disk read is ~100 -
200 K/s. Checking lsof for that process ID gives the following (example):
--
apache2   64039         www-data    6w     FIFO                0,8      0t0  60701189 pipe
apache2   64039         www-data    7u      REG              254,3        0   1308167 /tmp/ZCUDX8xreu (deleted)
apache2   64039         www-data    8u     0000                0,9        0      3545 anon_inode
apache2   64039         www-data    9u     IPv6           62815839      0t0       TCP *** (CLOSE_WAIT)
apache2   64039         www-data   10u     IPv4           62820353      0t0       TCP *** (CLOSE_WAIT)
apache2   64039         www-data   11u     IPv4           62820355      0t0       TCP *** (ESTABLISHED)
apache2   64039         www-data   12u     IPv4           62810900      0t0       TCP *** (ESTABLISHED)
apache2   64039         www-data   13r      REG              254,3  1679422   1308180 /tmp/phprWvjyj
apache2   64039         www-data   14w      REG              254,1   315392 517499656 /var/www/myocfs/images/original/0a6adf0421891131f30120d4235fbb08.jpg
--
The last line varies: it is always of type "w", but not always the same path.
When this problem occurs, the load on all other servers grows (many visitors...), and
all webserver processes are in "D" state. No IO is seen with iotop on the
other nodes. In the shell, I can run "ls
/var/www/myocfs/images/original/0a6adf0421891131f30120d4235fbb08.jpg" fine
on the "active" server. The same command on the other nodes hangs /
waits. An ls of another file in another path works sometimes... After some
minutes (3 - 15) all nodes can access the file, the load comes back down, and IO is
normal again. But this isn't reproducible.


debugfs.ocfs2 with fs_locks doesn't work because of a mismatch between the kernel
and ocfs2-tools (I think).
debugfs.ocfs2 -R "fs_locks" /dev/mapper/myocfs
--> Debug string proto 3 found, but 2 is the highest I understand.

- Why has this problem only been occurring for ~1 week? Why did ocfs run fine for
weeks before? (I've also rebooted the server ...)
- What's the issue? Locking? How can I solve this? Or is the IO too slow?
- What causes the issue? Writes? Deletes? Why do some writes work well?


Can anybody help me with this issue? I've no further ideas :-(

Kind regards,
Karl-Heinz