Dear all, Not realy sure whether this is bug or not, but we found that sometimes OCFS2 on our system do journaling a lot. (Please see screen shot below) As you can see, the IO was jumped from 111 w/s to 11,960 w/s , IO utilize jumped from 1.5% to 97% , %iowait jumped from 0.25% to 10.94%. It will not come into question, if this happen in short time period (eg. 1-2 sec.) , but sometimes we found this persist for half an hour or even more. Is this normal for OCFS2 or not ? How could I tune OCFS2 to avoid the above situation. We also suspect the relation of this symptom with another problem similar to http://oss.oracle.com/pipermail/ocfs2-users/2009-January/003250.html , because we found OCFS2 file system pause for a long period of time too ( 5-20 Sec. on open for write). Those scattered happen all day with more frequence during highload time. (Screen shot below) Our configuration is 3 web servers connect to HP SAN Storage using OCFS2 1.4.7. Amongst those 3 servers , there are 2 for reading data and 1 for writing data (traffic seperate at proxy layer) . The symptom above happen on data writing server. Thank you. Wanchat Padungrat PS. We keep our service by reinstall the whole data every month, without doing whole data copy (consumed 12 hours), the symptom getting worse an worse eg. journaling persist longer , file system pause longer (sometimes more than a minute !!) ... which affect the whole system , such as, too many waiting apache process, lavg going skyrocket(sometimes 400+) and cannot recover itself etc. . Screen shot ----------------- ----------------------------------------------------------------------------- Here is iostat+top during journaling process running. ----------------------------------------------------------------------------- avg-cpu: %user %nice %system %iowait %steal %idle 1.88 0.00 5.38 10.94 0.00 81.81 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.55 sdc1 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.60 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.65 1.14 0.08 97.25 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-3 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.66 1.14 0.08 97.30 ------------------------------------------------------------------------------------------- Tasks: 215 total, 2 running, 213 sleeping, 0 stopped, 0 zombie Cpu(s): 2.5%us, 3.6%sy, 0.0%ni, 81.9%id, 10.9%wa, 0.0%hi, 1.1%si, 0.0%st Mem: 12298636k total, 11783892k used, 514744k free, 1679156k buffers Swap: 2048276k total, 396k used, 2047880k free, 6149700k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3416 root 10 -5 0 0 0 S 5.7 0.0 473:10.53 o2net 10034 httpd 16 0 192m 18m 2476 D 3.7 0.2 0:43.44 httpd 16119 httpd 16 0 190m 16m 2200 S 3.7 0.1 0:25.65 httpd 6660 root 10 -5 0 0 0 D 3.3 0.0 39:15.14 ocfs2cmt <--- Journaling 20840 httpd 16 0 191m 16m 2168 D 3.3 0.1 0:14.99 httpd 6659 root 10 -5 0 0 0 S 3.0 0.0 34:32.57 kjournald <--- Journaling 14282 httpd 16 0 190m 16m 2240 D 2.7 0.1 0:29.31 httpd 6656 root 11 -5 0 0 0 S 1.7 0.0 79:21.05 dlm_thread 17159 httpd 16 0 190m 16m 2240 D 1.7 0.1 0:24.06 httpd 14718 httpd 16 0 192m 18m 2408 D 1.3 0.2 0:28.30 httpd 6655 root 10 -5 0 0 0 S 1.0 0.0 12:02.41 ocfs2dc 10041 httpd 16 0 192m 18m 2252 R 1.0 0.2 0:41.68 httpd 14227 httpd 16 0 191m 17m 2248 D 1.0 0.1 0:29.96 httpd 14246 httpd 16 0 194m 20m 2236 D 1.0 0.2 0:30.14 httpd 19852 httpd 16 0 191m 16m 2160 D 1.0 0.1 0:16.72 httpd 25976 httpd 16 0 191m 16m 2148 D 1.0 0.1 0:05.45 httpd 6658 root 11 -5 0 0 0 S 0.7 0.0 86:50.46 dlm_wq 7387 httpd 16 0 201m 27m 2284 D 0.7 0.2 0:49.53 httpd 9843 httpd 16 0 191m 17m 2288 D 0.7 0.1 0:40.68 httpd ------------------------------------------------------------------------------------------- Here is iostat+top during journaling process sleep (normal run). ------------------------------------------------------------------------------------------- avg-cpu: %user %nice %system %iowait %steal %idle 3.00 0.00 2.31 0.25 0.00 94.44 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 sda1 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 sdc1 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.10 0.14 1.55 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-3 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.11 0.14 1.55 ------------------------------------------------------------------------------------------- top - 15:44:49 up 12 days, 2:21, 1 user, load average: 0.72, 9.27, 15.66 Tasks: 215 total, 1 running, 214 sleeping, 0 stopped, 0 zombie Cpu(s): 2.2%us, 1.0%sy, 0.0%ni, 96.0%id, 0.2%wa, 0.0%hi, 0.5%si, 0.0%st Mem: 12298636k total, 11832348k used, 466288k free, 1688052k buffers Swap: 2048276k total, 396k used, 2047880k free, 6215116k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 26088 httpd 16 0 189m 15m 2152 S 4.0 0.1 0:11.31 httpd 3416 root 11 -5 0 0 0 S 3.3 0.0 474:07.19 o2net 29193 httpd 15 0 191m 17m 2152 S 3.0 0.1 0:05.16 httpd 9842 httpd 15 0 191m 17m 2252 S 2.7 0.1 0:47.80 httpd 14329 httpd 15 0 191m 16m 2240 S 2.7 0.1 0:35.21 httpd 13245 httpd 15 0 190m 16m 2272 S 1.7 0.1 0:37.53 httpd 14719 httpd 16 0 190m 16m 2240 S 1.3 0.1 0:31.97 httpd 6658 root 11 -5 0 0 0 D 1.0 0.0 86:59.26 dlm_wq 10039 httpd 16 0 190m 16m 2328 S 0.7 0.1 0:44.13 httpd 10041 httpd 16 0 191m 17m 2252 S 0.7 0.1 0:46.10 httpd 16120 httpd 15 0 190m 16m 2240 S 0.7 0.1 0:36.17 httpd 6656 root 10 -5 0 0 0 D 0.3 0.0 79:35.89 dlm_thread 9844 httpd 15 0 188m 14m 2260 S 0.3 0.1 0:47.96 httpd 9845 httpd 15 0 191m 17m 2292 S 0.3 0.1 0:46.30 httpd 14227 httpd 15 0 192m 18m 2248 S 0.3 0.2 0:35.18 httpd 14232 httpd 15 0 190m 16m 2244 S 0.3 0.1 0:33.24 httpd 14247 httpd 15 0 190m 16m 2164 S 0.3 0.1 0:35.52 httpd 14281 httpd 15 0 189m 15m 2272 S 0.3 0.1 0:36.51 httpd 14327 httpd 15 0 192m 18m 2196 S 0.3 0.2 0:35.16 httpd 14331 httpd 15 0 190m 16m 2240 S 0.3 0.1 0:33.99 httpd ------------------------------------------------------------------------------------------- Lag on open file for write shown by strace simple cp command, the symptom appear at bottom of the list ------------------------------------------------------------------------------------------ [root at cafe4 wanchat-san1]# strace -T cp source_file destiantion_file execve("/bin/cp", ["cp", "source_file", "destiantion_file"], [/* 22 vars */]) = 0 <0.000210> brk(0) = 0x2347000 <0.000006> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685af000 <0.000006> uname({sys="Linux", node="cafe4.pantip.com", ...}) = 0 <0.000006> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000008> open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000009> fstat(3, {st_mode=S_IFREG|0644, st_size=63180, ...}) = 0 <0.000006> mmap(NULL, 63180, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685b0000 <0.000010> close(3) = 0 <0.000007> open("/lib64/libacl.so.1", O_RDONLY) = 3 <0.000013> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\31`\0379\0\0\0"..., 832) 832 <0.000008> fstat(3, {st_mode=S_IFREG|0755, st_size=28008, ...}) = 0 <0.000020> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685c0000 <0.000008> mmap(0x391f600000, 2120992, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391f600000 <0.000010> mprotect(0x391f606000, 2093056, PROT_NONE) = 0 <0.000012> mmap(0x391f805000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x391f805000 <0.000007> close(3) = 0 <0.000005> open("/lib64/libselinux.so.1", O_RDONLY) = 3 <0.000007> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`E\240\0369\0\0\0"..., 832) 832 <0.000006> fstat(3, {st_mode=S_IFREG|0755, st_size=95464, ...}) = 0 <0.000005> mmap(0x391ea00000, 2192784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391ea00000 <0.000007> mprotect(0x391ea15000, 2097152, PROT_NONE) = 0 <0.000007> mmap(0x391ec15000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x391ec15000 <0.000011> mmap(0x391ec17000, 1424, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391ec17000 <0.000010> close(3) = 0 <0.000007> open("/lib64/libattr.so.1", O_RDONLY) = 3 <0.000012> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\17 \0369\0\0\0"..., 832) = 832 <0.000007> fstat(3, {st_mode=S_IFREG|0755, st_size=17888, ...}) = 0 <0.000005> mmap(0x391e200000, 2110728, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391e200000 <0.000007> mprotect(0x391e204000, 2093056, PROT_NONE) = 0 <0.000011> mmap(0x391e403000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x391e403000 <0.000012> close(3) = 0 <0.000007> open("/lib64/libc.so.6", O_RDONLY) = 3 <0.000012> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332!\0359\0\0\0"..., 832) = 832 <0.000009> fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0 <0.000005> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685c1000 <0.000009> mmap(0x391d200000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391d200000 <0.000009> mprotect(0x391d34d000, 2097152, PROT_NONE) = 0 <0.000009> mmap(0x391d54d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x391d54d000 <0.000011> mmap(0x391d552000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391d552000 <0.000014> close(3) = 0 <0.000005> open("/lib64/libdl.so.2", O_RDONLY) = 3 <0.000012> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\0359\0\0\0"..., 832) = 832 <0.000008> fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0 <0.000007> mmap(0x391d600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391d600000 <0.000009> mprotect(0x391d602000, 2097152, PROT_NONE) = 0 <0.000007> mmap(0x391d802000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x391d802000 <0.000007> close(3) = 0 <0.000005> open("/lib64/libsepol.so.1", O_RDONLY) = 3 <0.000009> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0=`\0369\0\0\0"..., 832) = 832 <0.000006> fstat(3, {st_mode=S_IFREG|0755, st_size=247496, ...}) = 0 <0.000005> mmap(0x391e600000, 2383136, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391e600000 <0.000006> mprotect(0x391e63b000, 2097152, PROT_NONE) = 0 <0.000007> mmap(0x391e83b000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x391e83b000 <0.000008> mmap(0x391e83c000, 40224, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391e83c000 <0.000007> close(3) = 0 <0.000005> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685c2000 <0.000005> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685c3000 <0.000005> arch_prctl(ARCH_SET_FS, 0x2b6d685c2f90) = 0 <0.000005> mprotect(0x391d54d000, 16384, PROT_READ) = 0 <0.000007> mprotect(0x391d802000, 4096, PROT_READ) = 0 <0.000006> mprotect(0x391d01b000, 4096, PROT_READ) = 0 <0.000005> munmap(0x2b6d685b0000, 63180) = 0 <0.000010> access("/etc/selinux/", F_OK) = 0 <0.000007> brk(0) = 0x2347000 <0.000005> brk(0x2368000) = 0x2368000 <0.000005> open("/etc/selinux/config", O_RDONLY) = 3 <0.000013> fstat(3, {st_mode=S_IFREG|0644, st_size=511, ...}) = 0 <0.000005> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685b0000 <0.000005> read(3, "# This file controls the state o"..., 4096) = 511 <0.000009> read(3, "", 4096) = 0 <0.000006> close(3) = 0 <0.000006> munmap(0x2b6d685b0000, 4096) = 0 <0.000008> open("/proc/mounts", O_RDONLY) = 3 <0.000033> fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000005> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 0x2b6d685b0000 <0.000005> read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 667 <0.000041> read(3, "", 4096) = 0 <0.000005> close(3) = 0 <0.000006> munmap(0x2b6d685b0000, 4096) = 0 <0.000007> open("/proc/filesystems", O_RDONLY) = 3 <0.000011> read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 4095) = 354 <0.000014> close(3) = 0 <0.000006> open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 <0.000010> fstat(3, {st_mode=S_IFREG|0644, st_size=56471264, ...}) = 0 <0.000005> mmap(NULL, 56471264, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685c4000 <0.000006> close(3) = 0 <0.000004> geteuid() = 0 <0.000006> stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000013> stat("source_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000009> stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000008> open("source_file", O_RDONLY) = 3 <0.000022> fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000005> open("destiantion_file", O_WRONLY|O_TRUNC) = 4 <8.159196> <------ 8 Sec ! During normal condition 0.000034 fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000008> fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000007> read(3, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 4096) = 1307 <0.000012> write(4, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 1307) = 1307 <0.003315> read(3, "", 4096) = 0 <0.000010> close(4) = 0 <0.000006> close(3) = 0 <0.000007> close(1) = 0 <0.000005> exit_group(0) = ? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20100728/39c0afcd/attachment-0001.html
Have you tried mounting with data=writeback ? On Jul 27, 2010, at 9:31 PM, wanchat padungrat <wanchat.p at pantip.com> wrote:> Dear all, > > Not realy sure whether this is bug or not, but we found that sometimes OCFS2 on our system do journaling a lot. > > (Please see screen shot below) > > As you can see, the IO was jumped from 111 w/s to 11,960 w/s , IO utilize jumped from 1.5% to 97% , %iowait jumped from 0.25% to 10.94%. > > It will not come into question, if this happen in short time period (eg. 1-2 sec.) , but sometimes we found this persist for half an hour or even more. > > Is this normal for OCFS2 or not ? How could I tune OCFS2 to avoid the above situation. > > We also suspect the relation of this symptom with another problem similar to http://oss.oracle.com/pipermail/ocfs2-users/2009-January/003250.html , because we found OCFS2 file system pause for a long period of time too ( 5-20 Sec. on open for write). Those scattered happen all day with more frequence during highload time. (Screen shot below) > > Our configuration is 3 web servers connect to HP SAN Storage using OCFS2 1.4.7. Amongst those 3 servers , there are 2 for reading data and 1 for writing data (traffic seperate at proxy layer) . The symptom above happen on data writing server. > > Thank you. > > Wanchat Padungrat > > PS. We keep our service by reinstall the whole data every month, without doing whole data copy (consumed 12 hours), the symptom getting worse an worse eg. journaling persist longer , file system pause longer (sometimes more than a minute !!) ... which affect the whole system , such as, too many waiting apache process, lavg going skyrocket(sometimes 400+) and cannot recover itself etc. . > > ? > > Screen shot > > ----------------- > > ----------------------------------------------------------------------------- > > Here is iostat+top during journaling process running. > > ----------------------------------------------------------------------------- > > avg-cpu: %user %nice %system %iowait %steal %idle > > 1.88 0.00 5.38 10.94 0.00 81.81 > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util > > sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdc 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.55 > > sdc1 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.60 > > sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-1 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.65 1.14 0.08 97.25 > > dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-3 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.66 1.14 0.08 97.30 > > ------------------------------------------------------------------------------------------- > > Tasks: 215 total, 2 running, 213 sleeping, 0 stopped, 0 zombie > > Cpu(s): 2.5%us, 3.6%sy, 0.0%ni, 81.9%id, 10.9%wa, 0.0%hi, 1.1%si, 0.0%st > > Mem: 12298636k total, 11783892k used, 514744k free, 1679156k buffers > > Swap: 2048276k total, 396k used, 2047880k free, 6149700k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 3416 root 10 -5 0 0 0 S 5.7 0.0 473:10.53 o2net > > 10034 httpd 16 0 192m 18m 2476 D 3.7 0.2 0:43.44 httpd > > 16119 httpd 16 0 190m 16m 2200 S 3.7 0.1 0:25.65 httpd > > 6660 root 10 -5 0 0 0 D 3.3 0.0 39:15.14 ocfs2cmt <--- Journaling > > 20840 httpd 16 0 191m 16m 2168 D 3.3 0.1 0:14.99 httpd > > 6659 root 10 -5 0 0 0 S 3.0 0.0 34:32.57 kjournald <--- Journaling > > 14282 httpd 16 0 190m 16m 2240 D 2.7 0.1 0:29.31 httpd > > 6656 root 11 -5 0 0 0 S 1.7 0.0 79:21.05 dlm_thread > > 17159 httpd 16 0 190m 16m 2240 D 1.7 0.1 0:24.06 httpd > > 14718 httpd 16 0 192m 18m 2408 D 1.3 0.2 0:28.30 httpd > > 6655 root 10 -5 0 0 0 S 1.0 0.0 12:02.41 ocfs2dc > > 10041 httpd 16 0 192m 18m 2252 R 1.0 0.2 0:41.68 httpd > > 14227 httpd 16 0 191m 17m 2248 D 1.0 0.1 0:29.96 httpd > > 14246 httpd 16 0 194m 20m 2236 D 1.0 0.2 0:30.14 httpd > > 19852 httpd 16 0 191m 16m 2160 D 1.0 0.1 0:16.72 httpd > > 25976 httpd 16 0 191m 16m 2148 D 1.0 0.1 0:05.45 httpd > > 6658 root 11 -5 0 0 0 S 0.7 0.0 86:50.46 dlm_wq > > 7387 httpd 16 0 201m 27m 2284 D 0.7 0.2 0:49.53 httpd > > 9843 httpd 16 0 191m 17m 2288 D 0.7 0.1 0:40.68 httpd > > ------------------------------------------------------------------------------------------- > > Here is iostat+top during journaling process sleep (normal run). > > ------------------------------------------------------------------------------------------- > > avg-cpu: %user %nice %system %iowait %steal %idle > > 3.00 0.00 2.31 0.25 0.00 94.44 > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util > > sda 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 > > sda1 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 > > sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdc 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 > > sdc1 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 > > sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-1 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.10 0.14 1.55 > > dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-3 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.11 0.14 1.55 > > ------------------------------------------------------------------------------------------- > > top - 15:44:49 up 12 days, 2:21, 1 user, load average: 0.72, 9.27, 15.66 > > Tasks: 215 total, 1 running, 214 sleeping, 0 stopped, 0 zombie > > Cpu(s): 2.2%us, 1.0%sy, 0.0%ni, 96.0%id, 0.2%wa, 0.0%hi, 0.5%si, 0.0%st > > Mem: 12298636k total, 11832348k used, 466288k free, 1688052k buffers > > Swap: 2048276k total, 396k used, 2047880k free, 6215116k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 26088 httpd 16 0 189m 15m 2152 S 4.0 0.1 0:11.31 httpd > > 3416 root 11 -5 0 0 0 S 3.3 0.0 474:07.19 o2net > > 29193 httpd 15 0 191m 17m 2152 S 3.0 0.1 0:05.16 httpd > > 9842 httpd 15 0 191m 17m 2252 S 2.7 0.1 0:47.80 httpd > > 14329 httpd 15 0 191m 16m 2240 S 2.7 0.1 0:35.21 httpd > > 13245 httpd 15 0 190m 16m 2272 S 1.7 0.1 0:37.53 httpd > > 14719 httpd 16 0 190m 16m 2240 S 1.3 0.1 0:31.97 httpd > > 6658 root 11 -5 0 0 0 D 1.0 0.0 86:59.26 dlm_wq > > 10039 httpd 16 0 190m 16m 2328 S 0.7 0.1 0:44.13 httpd > > 10041 httpd 16 0 191m 17m 2252 S 0.7 0.1 0:46.10 httpd > > 16120 httpd 15 0 190m 16m 2240 S 0.7 0.1 0:36.17 httpd > > 6656 root 10 -5 0 0 0 D 0.3 0.0 79:35.89 dlm_thread > > 9844 httpd 15 0 188m 14m 2260 S 0.3 0.1 0:47.96 httpd > > 9845 httpd 15 0 191m 17m 2292 S 0.3 0.1 0:46.30 httpd > > 14227 httpd 15 0 192m 18m 2248 S 0.3 0.2 0:35.18 httpd > > 14232 httpd 15 0 190m 16m 2244 S 0.3 0.1 0:33.24 httpd > > 14247 httpd 15 0 190m 16m 2164 S 0.3 0.1 0:35.52 httpd > > 14281 httpd 15 0 189m 15m 2272 S 0.3 0.1 0:36.51 httpd > > 14327 httpd 15 0 192m 18m 2196 S 0.3 0.2 0:35.16 httpd > > 14331 httpd 15 0 190m 16m 2240 S 0.3 0.1 0:33.99 httpd > > ? > > ------------------------------------------------------------------------------------------- > > Lag on open file for write shown by strace simple cp command, > > the symptom appear at bottom of the list > > ------------------------------------------------------------------------------------------ > > [root at cafe4 wanchat-san1]# strace -T cp source_file destiantion_file > > execve("/bin/cp", ["cp", "source_file", "destiantion_file"], [/* 22 vars */]) = 0 <0.000210> > > brk(0) = 0x2347000 <0.000006> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685af000 <0.000006> > > uname({sys="Linux", node="cafe4.pantip.com", ...}) = 0 <0.000006> > > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000008> > > open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000009> > > fstat(3, {st_mode=S_IFREG|0644, st_size=63180, ...}) = 0 <0.000006> > > mmap(NULL, 63180, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685b0000 <0.000010> > > close(3) = 0 <0.000007> > > open("/lib64/libacl.so.1", O_RDONLY) = 3 <0.000013> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\31`\0379\0\0\0"..., 832) = 832 <0.000008> > > fstat(3, {st_mode=S_IFREG|0755, st_size=28008, ...}) = 0 <0.000020> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685c0000 <0.000008> > > mmap(0x391f600000, 2120992, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391f600000 <0.000010> > > mprotect(0x391f606000, 2093056, PROT_NONE) = 0 <0.000012> > > mmap(0x391f805000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x391f805000 <0.000007> > > close(3) = 0 <0.000005> > > open("/lib64/libselinux.so.1", O_RDONLY) = 3 <0.000007> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`E\240\0369\0\0\0"..., 832) = 832 <0.000006> > > fstat(3, {st_mode=S_IFREG|0755, st_size=95464, ...}) = 0 <0.000005> > > mmap(0x391ea00000, 2192784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391ea00000 <0.000007> > > mprotect(0x391ea15000, 2097152, PROT_NONE) = 0 <0.000007> > > mmap(0x391ec15000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x391ec15000 <0.000011> > > mmap(0x391ec17000, 1424, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391ec17000 <0.000010> > > close(3) = 0 <0.000007> > > open("/lib64/libattr.so.1", O_RDONLY) = 3 <0.000012> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\17 \0369\0\0\0"..., 832) = 832 <0.000007> > > fstat(3, {st_mode=S_IFREG|0755, st_size=17888, ...}) = 0 <0.000005> > > mmap(0x391e200000, 2110728, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391e200000 <0.000007> > > mprotect(0x391e204000, 2093056, PROT_NONE) = 0 <0.000011> > > mmap(0x391e403000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x391e403000 <0.000012> > > close(3) = 0 <0.000007> > > open("/lib64/libc.so.6", O_RDONLY) = 3 <0.000012> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332!\0359\0\0\0"..., 832) = 832 <0.000009> > > fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685c1000 <0.000009> > > mmap(0x391d200000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391d200000 <0.000009> > > mprotect(0x391d34d000, 2097152, PROT_NONE) = 0 <0.000009> > > mmap(0x391d54d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x391d54d000 <0.000011> > > mmap(0x391d552000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391d552000 <0.000014> > > close(3) = 0 <0.000005> > > open("/lib64/libdl.so.2", O_RDONLY) = 3 <0.000012> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\0359\0\0\0"..., 832) = 832 <0.000008> > > fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0 <0.000007> > > mmap(0x391d600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391d600000 <0.000009> > > mprotect(0x391d602000, 2097152, PROT_NONE) = 0 <0.000007> > > mmap(0x391d802000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x391d802000 <0.000007> > > close(3) = 0 <0.000005> > > open("/lib64/libsepol.so.1", O_RDONLY) = 3 <0.000009> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0=`\0369\0\0\0"..., 832) = 832 <0.000006> > > fstat(3, {st_mode=S_IFREG|0755, st_size=247496, ...}) = 0 <0.000005> > > mmap(0x391e600000, 2383136, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391e600000 <0.000006> > > mprotect(0x391e63b000, 2097152, PROT_NONE) = 0 <0.000007> > > mmap(0x391e83b000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x391e83b000 <0.000008> > > mmap(0x391e83c000, 40224, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391e83c000 <0.000007> > > close(3) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685c2000 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685c3000 <0.000005> > > arch_prctl(ARCH_SET_FS, 0x2b6d685c2f90) = 0 <0.000005> > > mprotect(0x391d54d000, 16384, PROT_READ) = 0 <0.000007> > > mprotect(0x391d802000, 4096, PROT_READ) = 0 <0.000006> > > mprotect(0x391d01b000, 4096, PROT_READ) = 0 <0.000005> > > munmap(0x2b6d685b0000, 63180) = 0 <0.000010> > > access("/etc/selinux/", F_OK) = 0 <0.000007> > > brk(0) = 0x2347000 <0.000005> > > brk(0x2368000) = 0x2368000 <0.000005> > > open("/etc/selinux/config", O_RDONLY) = 3 <0.000013> > > fstat(3, {st_mode=S_IFREG|0644, st_size=511, ...}) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685b0000 <0.000005> > > read(3, "# This file controls the state o"..., 4096) = 511 <0.000009> > > read(3, "", 4096) = 0 <0.000006> > > close(3) = 0 <0.000006> > > munmap(0x2b6d685b0000, 4096) = 0 <0.000008> > > open("/proc/mounts", O_RDONLY) = 3 <0.000033> > > fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b6d685b0000 <0.000005> > > read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 667 <0.000041> > > read(3, "", 4096) = 0 <0.000005> > > close(3) = 0 <0.000006> > > munmap(0x2b6d685b0000, 4096) = 0 <0.000007> > > open("/proc/filesystems", O_RDONLY) = 3 <0.000011> > > read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 4095) = 354 <0.000014> > > close(3) = 0 <0.000006> > > open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 <0.000010> > > fstat(3, {st_mode=S_IFREG|0644, st_size=56471264, ...}) = 0 <0.000005> > > mmap(NULL, 56471264, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685c4000 <0.000006> > > close(3) = 0 <0.000004> > > geteuid() = 0 <0.000006> > > stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000013> > > stat("source_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000009> > > stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000008> > > open("source_file", O_RDONLY) = 3 <0.000022> > > fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000005> > > open("destiantion_file", O_WRONLY|O_TRUNC) = 4 <8.159196> <------ 8 Sec ! During normal condition 0.000034 > > fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000008> > > fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000007> > > read(3, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 4096) = 1307 <0.000012> > > write(4, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 1307) = 1307 <0.003315> > > read(3, "", 4096) = 0 <0.000010> > > close(4) = 0 <0.000006> > > close(3) = 0 <0.000007> > > close(1) = 0 <0.000005> > > exit_group(0) = ? > > _______________________________________________ > Ocfs2-users mailing list > Ocfs2-users at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/ocfs2-users-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20100727/d80a6345/attachment-0001.html
Dear Sunil, Thank you for your promptly reply, Yes, we do mount as data=writeback . following is our current mount option : type ocfs2 (rw,_netdev,noatime,commit=15,data=writeback,heartbeat=local) Wanchat P. 2010/7/28 Sunil Mushran <sunil.mushran at oracle.com>> Have you tried mounting with data=writeback ? > > > On Jul 27, 2010, at 9:31 PM, wanchat padungrat <wanchat.p at pantip.com> > wrote: > > Dear all, > > Not realy sure whether this is bug or not, but we found that sometimes > OCFS2 on our system do journaling a lot. > > (Please see screen shot below) > > As you can see, the IO was jumped from 111 w/s to 11,960 w/s , IO utilize > jumped from 1.5% to 97% , %iowait jumped from 0.25% to 10.94%. > > It will not come into question, if this happen in short time period (eg. > 1-2 sec.) , but sometimes we found this persist for half an hour or even > more. > > Is this normal for OCFS2 or not ? How could I tune OCFS2 to avoid the above > situation. > > We also suspect the relation of this symptom with another problem similar > to http://oss.oracle.com/pipermail/ocfs2-users/2009-January/003250.html , > because we found OCFS2 file system pause for a long period of time too ( > 5-20 Sec. on open for write). Those scattered happen all day with more > frequence during highload time. (Screen shot below) > > Our configuration is 3 web servers connect to HP SAN Storage using OCFS2 > 1.4.7. Amongst those 3 servers , there are 2 for reading data and 1 for > writing data (traffic seperate at proxy layer) . The symptom above happen on > data writing server. > > Thank you. > > Wanchat Padungrat > > PS. We keep our service by reinstall the whole data every month, without > doing whole data copy (consumed 12 hours), the symptom getting worse an > worse eg. journaling persist longer , file system pause longer (sometimes > more than a minute !!) ... which affect the whole system , such as, too many > waiting apache process, lavg going skyrocket(sometimes 400+) and cannot > recover itself etc. . > > Screen shot > > ----------------- > > > ----------------------------------------------------------------------------- > > Here is iostat+top during journaling process running. > > > ----------------------------------------------------------------------------- > > avg-cpu: %user %nice %system %iowait %steal %idle > > 1.88 0.00 5.38 10.94 0.00 81.81 > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm > %util > > sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdc 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.55 > > sdc1 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.60 > > sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-1 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.65 1.14 0.08 97.25 > > dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-3 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.66 1.14 0.08 97.30 > > > ------------------------------------------------------------------------------------------- > > Tasks: 215 total, 2 running, 213 sleeping, 0 stopped, 0 zombie > > Cpu(s): 2.5%us, 3.6%sy, 0.0%ni, 81.9%id, 10.9%wa, 0.0%hi, 1.1%si, 0.0%st > > Mem: 12298636k total, 11783892k used, 514744k free, 1679156k buffers > > Swap: 2048276k total, 396k used, 2047880k free, 6149700k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 3416 root 10 -5 0 0 0 S 5.7 0.0 473:10.53 o2net > > 10034 httpd 16 0 192m 18m 2476 D 3.7 0.2 0:43.44 httpd > > 16119 httpd 16 0 190m 16m 2200 S 3.7 0.1 0:25.65 httpd > > 6660 root 10 -5 0 0 0 D 3.3 0.0 39:15.14 ocfs2cmt <--- Journaling > > 20840 httpd 16 0 191m 16m 2168 D 3.3 0.1 0:14.99 httpd > > 6659 root 10 -5 0 0 0 S 3.0 0.0 34:32.57 kjournald <--- Journaling > > 14282 httpd 16 0 190m 16m 2240 D 2.7 0.1 0:29.31 httpd > > 6656 root 11 -5 0 0 0 S 1.7 0.0 79:21.05 dlm_thread > > 17159 httpd 16 0 190m 16m 2240 D 1.7 0.1 0:24.06 httpd > > 14718 httpd 16 0 192m 18m 2408 D 1.3 0.2 0:28.30 httpd > > 6655 root 10 -5 0 0 0 S 1.0 0.0 12:02.41 ocfs2dc > > 10041 httpd 16 0 192m 18m 2252 R 1.0 0.2 0:41.68 httpd > > 14227 httpd 16 0 191m 17m 2248 D 1.0 0.1 0:29.96 httpd > > 14246 httpd 16 0 194m 20m 2236 D 1.0 0.2 0:30.14 httpd > > 19852 httpd 16 0 191m 16m 2160 D 1.0 0.1 0:16.72 httpd > > 25976 httpd 16 0 191m 16m 2148 D 1.0 0.1 0:05.45 httpd > > 6658 root 11 -5 0 0 0 S 0.7 0.0 86:50.46 dlm_wq > > 7387 httpd 16 0 201m 27m 2284 D 0.7 0.2 0:49.53 httpd > > 9843 httpd 16 0 191m 17m 2288 D 0.7 0.1 0:40.68 httpd > > > ------------------------------------------------------------------------------------------- > > Here is iostat+top during journaling process sleep (normal run). > > > ------------------------------------------------------------------------------------------- > > avg-cpu: %user %nice %system %iowait %steal %idle > > 3.00 0.00 2.31 0.25 0.00 94.44 > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm > %util > > sda 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 > > sda1 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 > > sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdc 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 > > sdc1 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 > > sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-1 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.10 0.14 1.55 > > dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > dm-3 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.11 0.14 1.55 > > > ------------------------------------------------------------------------------------------- > > top - 15:44:49 up 12 days, 2:21, 1 user, load average: 0.72, 9.27, 15.66 > > Tasks: 215 total, 1 running, 214 sleeping, 0 stopped, 0 zombie > > Cpu(s): 2.2%us, 1.0%sy, 0.0%ni, 96.0%id, 0.2%wa, 0.0%hi, 0.5%si, 0.0%st > > Mem: 12298636k total, 11832348k used, 466288k free, 1688052k buffers > > Swap: 2048276k total, 396k used, 2047880k free, 6215116k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 26088 httpd 16 0 189m 15m 2152 S 4.0 0.1 0:11.31 httpd > > 3416 root 11 -5 0 0 0 S 3.3 0.0 474:07.19 o2net > > 29193 httpd 15 0 191m 17m 2152 S 3.0 0.1 0:05.16 httpd > > 9842 httpd 15 0 191m 17m 2252 S 2.7 0.1 0:47.80 httpd > > 14329 httpd 15 0 191m 16m 2240 S 2.7 0.1 0:35.21 httpd > > 13245 httpd 15 0 190m 16m 2272 S 1.7 0.1 0:37.53 httpd > > 14719 httpd 16 0 190m 16m 2240 S 1.3 0.1 0:31.97 httpd > > 6658 root 11 -5 0 0 0 D 1.0 0.0 86:59.26 dlm_wq > > 10039 httpd 16 0 190m 16m 2328 S 0.7 0.1 0:44.13 httpd > > 10041 httpd 16 0 191m 17m 2252 S 0.7 0.1 0:46.10 httpd > > 16120 httpd 15 0 190m 16m 2240 S 0.7 0.1 0:36.17 httpd > > 6656 root 10 -5 0 0 0 D 0.3 0.0 79:35.89 dlm_thread > > 9844 httpd 15 0 188m 14m 2260 S 0.3 0.1 0:47.96 httpd > > 9845 httpd 15 0 191m 17m 2292 S 0.3 0.1 0:46.30 httpd > > 14227 httpd 15 0 192m 18m 2248 S 0.3 0.2 0:35.18 httpd > > 14232 httpd 15 0 190m 16m 2244 S 0.3 0.1 0:33.24 httpd > > 14247 httpd 15 0 190m 16m 2164 S 0.3 0.1 0:35.52 httpd > > 14281 httpd 15 0 189m 15m 2272 S 0.3 0.1 0:36.51 httpd > > 14327 httpd 15 0 192m 18m 2196 S 0.3 0.2 0:35.16 httpd > > 14331 httpd 15 0 190m 16m 2240 S 0.3 0.1 0:33.99 httpd > > > ------------------------------------------------------------------------------------------- > > Lag on open file for write shown by strace simple cp command, > > the symptom appear at bottom of the list > > > ------------------------------------------------------------------------------------------ > > [root at cafe4 wanchat-san1]# strace -T cp source_file destiantion_file > > execve("/bin/cp", ["cp", "source_file", "destiantion_file"], [/* 22 vars > */]) = 0 <0.000210> > > brk(0) = 0x2347000 <0.000006> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685af000 <0.000006> > > uname({sys="Linux", node="cafe4.pantip.com", ...}) = 0 <0.000006> > > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) > <0.000008> > > open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000009> > > fstat(3, {st_mode=S_IFREG|0644, st_size=63180, ...}) = 0 <0.000006> > > mmap(NULL, 63180, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685b0000 <0.000010> > > close(3) = 0 <0.000007> > > open("/lib64/libacl.so.1", O_RDONLY) = 3 <0.000013> > > read(3, > "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\31`\0379\0\0\0"..., 832) > 832 <0.000008> > > fstat(3, {st_mode=S_IFREG|0755, st_size=28008, ...}) = 0 <0.000020> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685c0000 <0.000008> > > mmap(0x391f600000, 2120992, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0x391f600000 <0.000010> > > mprotect(0x391f606000, 2093056, PROT_NONE) = 0 <0.000012> > > mmap(0x391f805000, 4096, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x391f805000 <0.000007> > > close(3) = 0 <0.000005> > > open("/lib64/libselinux.so.1", O_RDONLY) = 3 <0.000007> > > read(3, > "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`E\240\0369\0\0\0"..., 832) > 832 <0.000006> > > fstat(3, {st_mode=S_IFREG|0755, st_size=95464, ...}) = 0 <0.000005> > > mmap(0x391ea00000, 2192784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0x391ea00000 <0.000007> > > mprotect(0x391ea15000, 2097152, PROT_NONE) = 0 <0.000007> > > mmap(0x391ec15000, 8192, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x391ec15000 <0.000011> > > mmap(0x391ec17000, 1424, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391ec17000 <0.000010> > > close(3) = 0 <0.000007> > > open("/lib64/libattr.so.1", O_RDONLY) = 3 <0.000012> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\17 > \0369\0\0\0"..., 832) = 832 <0.000007> > > fstat(3, {st_mode=S_IFREG|0755, st_size=17888, ...}) = 0 <0.000005> > > mmap(0x391e200000, 2110728, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0x391e200000 <0.000007> > > mprotect(0x391e204000, 2093056, PROT_NONE) = 0 <0.000011> > > mmap(0x391e403000, 4096, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x391e403000 <0.000012> > > close(3) = 0 <0.000007> > > open("/lib64/libc.so.6", O_RDONLY) = 3 <0.000012> > > read(3, > "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332!\0359\0\0\0"..., > 832) = 832 <0.000009> > > fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685c1000 <0.000009> > > mmap(0x391d200000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0x391d200000 <0.000009> > > mprotect(0x391d34d000, 2097152, PROT_NONE) = 0 <0.000009> > > mmap(0x391d54d000, 20480, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x391d54d000 <0.000011> > > mmap(0x391d552000, 16728, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391d552000 <0.000014> > > close(3) = 0 <0.000005> > > open("/lib64/libdl.so.2", O_RDONLY) = 3 <0.000012> > > read(3, > "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\0359\0\0\0"..., 832) > = 832 <0.000008> > > fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0 <0.000007> > > mmap(0x391d600000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0x391d600000 <0.000009> > > mprotect(0x391d602000, 2097152, PROT_NONE) = 0 <0.000007> > > mmap(0x391d802000, 8192, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x391d802000 <0.000007> > > close(3) = 0 <0.000005> > > open("/lib64/libsepol.so.1", O_RDONLY) = 3 <0.000009> > > read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0=`\0369\0\0\0"..., > 832) = 832 <0.000006> > > fstat(3, {st_mode=S_IFREG|0755, st_size=247496, ...}) = 0 <0.000005> > > mmap(0x391e600000, 2383136, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0x391e600000 <0.000006> > > mprotect(0x391e63b000, 2097152, PROT_NONE) = 0 <0.000007> > > mmap(0x391e83b000, 4096, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x391e83b000 <0.000008> > > mmap(0x391e83c000, 40224, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391e83c000 <0.000007> > > close(3) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685c2000 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685c3000 <0.000005> > > arch_prctl(ARCH_SET_FS, 0x2b6d685c2f90) = 0 <0.000005> > > mprotect(0x391d54d000, 16384, PROT_READ) = 0 <0.000007> > > mprotect(0x391d802000, 4096, PROT_READ) = 0 <0.000006> > > mprotect(0x391d01b000, 4096, PROT_READ) = 0 <0.000005> > > munmap(0x2b6d685b0000, 63180) = 0 <0.000010> > > access("/etc/selinux/", F_OK) = 0 <0.000007> > > brk(0) = 0x2347000 <0.000005> > > brk(0x2368000) = 0x2368000 <0.000005> > > open("/etc/selinux/config", O_RDONLY) = 3 <0.000013> > > fstat(3, {st_mode=S_IFREG|0644, st_size=511, ...}) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685b0000 <0.000005> > > read(3, "# This file controls the state o"..., 4096) = 511 <0.000009> > > read(3, "", 4096) = 0 <0.000006> > > close(3) = 0 <0.000006> > > munmap(0x2b6d685b0000, 4096) = 0 <0.000008> > > open("/proc/mounts", O_RDONLY) = 3 <0.000033> > > fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000005> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > 0x2b6d685b0000 <0.000005> > > read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 667 <0.000041> > > read(3, "", 4096) = 0 <0.000005> > > close(3) = 0 <0.000006> > > munmap(0x2b6d685b0000, 4096) = 0 <0.000007> > > open("/proc/filesystems", O_RDONLY) = 3 <0.000011> > > read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 4095) = 354 <0.000014> > > close(3) = 0 <0.000006> > > open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 <0.000010> > > fstat(3, {st_mode=S_IFREG|0644, st_size=56471264, ...}) = 0 <0.000005> > > mmap(NULL, 56471264, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685c4000 > <0.000006> > > close(3) = 0 <0.000004> > > geteuid() = 0 <0.000006> > > stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 > <0.000013> > > stat("source_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 > <0.000009> > > stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 > <0.000008> > > open("source_file", O_RDONLY) = 3 <0.000022> > > fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000005> > > open("destiantion_file", O_WRONLY|O_TRUNC) = 4 <8.159196> <------ 8 Sec ! > During normal condition 0.000034 > > fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000008> > > fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000007> > > read(3, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 4096) = 1307 <0.000012> > > write(4, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 1307) = 1307 <0.003315> > > read(3, "", 4096) = 0 <0.000010> > > close(4) = 0 <0.000006> > > close(3) = 0 <0.000007> > > close(1) = 0 <0.000005> > > exit_group(0) = ? > > _______________________________________________ > Ocfs2-users mailing list > Ocfs2-users at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/ocfs2-users > > > _______________________________________________ > Ocfs2-users mailing list > Ocfs2-users at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/ocfs2-users >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20100728/7a224c97/attachment-0001.html
Hi, (I am in the same team with Mr. Wanchat) Just want to note that we already format OCFS2 with -T mail option. As note below, data=writeback,noatime, and commit interval has been increase already. The weird thing about this problem is that, OCFS2 will work for like about 30 days, after that the I/O delay time will increase greatly, to the point that the system is unusable at all. This problem is quite serious for us. It disrupt the services every month. Right now the only cure we found is by formating another SAN and copy everything there (we did it with rsync). Right now the old SAN which contains the file system with trouble is still left intact. We can play around with it since it is not in production anymore. We can do some debugging on this. Just let us know what we should do. Somsak somsaks at gmail.com On Wed, Jul 28, 2010 at 12:33 PM, wanchat padungrat <wanchat.p at pantip.com>wrote:> Dear Sunil, > > Thank you for your promptly reply, > > Yes, we do mount as data=writeback . > > following is our current mount option : type ocfs2 > (rw,_netdev,noatime,commit=15,data=writeback,heartbeat=local) > > Wanchat P. > > > 2010/7/28 Sunil Mushran <sunil.mushran at oracle.com> > > Have you tried mounting with data=writeback ? >> >> >> On Jul 27, 2010, at 9:31 PM, wanchat padungrat <wanchat.p at pantip.com> >> wrote: >> >> Dear all, >> >> Not realy sure whether this is bug or not, but we found that sometimes >> OCFS2 on our system do journaling a lot. >> >> (Please see screen shot below) >> >> As you can see, the IO was jumped from 111 w/s to 11,960 w/s , IO utilize >> jumped from 1.5% to 97% , %iowait jumped from 0.25% to 10.94%. >> >> It will not come into question, if this happen in short time period (eg. >> 1-2 sec.) , but sometimes we found this persist for half an hour or even >> more. >> >> Is this normal for OCFS2 or not ? How could I tune OCFS2 to avoid the >> above situation. >> >> We also suspect the relation of this symptom with another problem similar >> to <http://oss.oracle.com/pipermail/ocfs2-users/2009-January/003250.html> >> http://oss.oracle.com/pipermail/ocfs2-users/2009-January/003250.html , >> because we found OCFS2 file system pause for a long period of time too ( >> 5-20 Sec. on open for write). Those scattered happen all day with more >> frequence during highload time. (Screen shot below) >> >> Our configuration is 3 web servers connect to HP SAN Storage using OCFS2 >> 1.4.7. Amongst those 3 servers , there are 2 for reading data and 1 for >> writing data (traffic seperate at proxy layer) . The symptom above happen on >> data writing server. >> >> Thank you. >> >> Wanchat Padungrat >> >> PS. We keep our service by reinstall the whole data every month, without >> doing whole data copy (consumed 12 hours), the symptom getting worse an >> worse eg. journaling persist longer , file system pause longer (sometimes >> more than a minute !!) ... which affect the whole system , such as, too many >> waiting apache process, lavg going skyrocket(sometimes 400+) and cannot >> recover itself etc. . >> >> Screen shot >> >> ----------------- >> >> >> ----------------------------------------------------------------------------- >> >> Here is iostat+top during journaling process running. >> >> >> ----------------------------------------------------------------------------- >> >> avg-cpu: %user %nice %system %iowait %steal %idle >> >> 1.88 0.00 5.38 10.94 0.00 81.81 >> >> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm >> %util >> >> sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdc 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.55 >> >> sdc1 0.00 5156.00 5.00 6804.50 19.00 47892.25 14.07 9.76 1.43 0.14 97.60 >> >> sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> dm-1 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.65 1.14 0.08 97.25 >> >> dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> dm-3 0.00 0.00 5.00 11960.00 19.00 47838.25 8.00 13.66 1.14 0.08 97.30 >> >> >> ------------------------------------------------------------------------------------------- >> >> Tasks: 215 total, 2 running, 213 sleeping, 0 stopped, 0 zombie >> >> Cpu(s): 2.5%us, 3.6%sy, 0.0%ni, 81.9%id, 10.9%wa, 0.0%hi, 1.1%si, 0.0%st >> >> Mem: 12298636k total, 11783892k used, 514744k free, 1679156k buffers >> >> Swap: 2048276k total, 396k used, 2047880k free, 6149700k cached >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> >> 3416 root 10 -5 0 0 0 S 5.7 0.0 473:10.53 o2net >> >> 10034 httpd 16 0 192m 18m 2476 D 3.7 0.2 0:43.44 httpd >> >> 16119 httpd 16 0 190m 16m 2200 S 3.7 0.1 0:25.65 httpd >> >> 6660 root 10 -5 0 0 0 D 3.3 0.0 39:15.14 ocfs2cmt <--- Journaling >> >> 20840 httpd 16 0 191m 16m 2168 D 3.3 0.1 0:14.99 httpd >> >> 6659 root 10 -5 0 0 0 S 3.0 0.0 34:32.57 kjournald <--- Journaling >> >> 14282 httpd 16 0 190m 16m 2240 D 2.7 0.1 0:29.31 httpd >> >> 6656 root 11 -5 0 0 0 S 1.7 0.0 79:21.05 dlm_thread >> >> 17159 httpd 16 0 190m 16m 2240 D 1.7 0.1 0:24.06 httpd >> >> 14718 httpd 16 0 192m 18m 2408 D 1.3 0.2 0:28.30 httpd >> >> 6655 root 10 -5 0 0 0 S 1.0 0.0 12:02.41 ocfs2dc >> >> 10041 httpd 16 0 192m 18m 2252 R 1.0 0.2 0:41.68 httpd >> >> 14227 httpd 16 0 191m 17m 2248 D 1.0 0.1 0:29.96 httpd >> >> 14246 httpd 16 0 194m 20m 2236 D 1.0 0.2 0:30.14 httpd >> >> 19852 httpd 16 0 191m 16m 2160 D 1.0 0.1 0:16.72 httpd >> >> 25976 httpd 16 0 191m 16m 2148 D 1.0 0.1 0:05.45 httpd >> >> 6658 root 11 -5 0 0 0 S 0.7 0.0 86:50.46 dlm_wq >> >> 7387 httpd 16 0 201m 27m 2284 D 0.7 0.2 0:49.53 httpd >> >> 9843 httpd 16 0 191m 17m 2288 D 0.7 0.1 0:40.68 httpd >> >> >> ------------------------------------------------------------------------------------------- >> >> Here is iostat+top during journaling process sleep (normal run). >> >> >> ------------------------------------------------------------------------------------------- >> >> avg-cpu: %user %nice %system %iowait %steal %idle >> >> 3.00 0.00 2.31 0.25 0.00 94.44 >> >> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm >> %util >> >> sda 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 >> >> sda1 0.00 26.00 0.00 10.50 0.00 146.00 27.81 0.13 12.05 1.38 1.45 >> >> sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdc 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 >> >> sdc1 0.00 80.50 3.00 31.00 11.00 444.25 26.78 0.05 1.41 0.46 1.55 >> >> sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sdd1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> sde1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> dm-1 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.10 0.14 1.55 >> >> dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >> >> dm-3 0.00 0.00 3.00 111.50 11.00 444.25 7.95 0.13 1.11 0.14 1.55 >> >> >> ------------------------------------------------------------------------------------------- >> >> top - 15:44:49 up 12 days, 2:21, 1 user, load average: 0.72, 9.27, 15.66 >> >> Tasks: 215 total, 1 running, 214 sleeping, 0 stopped, 0 zombie >> >> Cpu(s): 2.2%us, 1.0%sy, 0.0%ni, 96.0%id, 0.2%wa, 0.0%hi, 0.5%si, 0.0%st >> >> Mem: 12298636k total, 11832348k used, 466288k free, 1688052k buffers >> >> Swap: 2048276k total, 396k used, 2047880k free, 6215116k cached >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> >> 26088 httpd 16 0 189m 15m 2152 S 4.0 0.1 0:11.31 httpd >> >> 3416 root 11 -5 0 0 0 S 3.3 0.0 474:07.19 o2net >> >> 29193 httpd 15 0 191m 17m 2152 S 3.0 0.1 0:05.16 httpd >> >> 9842 httpd 15 0 191m 17m 2252 S 2.7 0.1 0:47.80 httpd >> >> 14329 httpd 15 0 191m 16m 2240 S 2.7 0.1 0:35.21 httpd >> >> 13245 httpd 15 0 190m 16m 2272 S 1.7 0.1 0:37.53 httpd >> >> 14719 httpd 16 0 190m 16m 2240 S 1.3 0.1 0:31.97 httpd >> >> 6658 root 11 -5 0 0 0 D 1.0 0.0 86:59.26 dlm_wq >> >> 10039 httpd 16 0 190m 16m 2328 S 0.7 0.1 0:44.13 httpd >> >> 10041 httpd 16 0 191m 17m 2252 S 0.7 0.1 0:46.10 httpd >> >> 16120 httpd 15 0 190m 16m 2240 S 0.7 0.1 0:36.17 httpd >> >> 6656 root 10 -5 0 0 0 D 0.3 0.0 79:35.89 dlm_thread >> >> 9844 httpd 15 0 188m 14m 2260 S 0.3 0.1 0:47.96 httpd >> >> 9845 httpd 15 0 191m 17m 2292 S 0.3 0.1 0:46.30 httpd >> >> 14227 httpd 15 0 192m 18m 2248 S 0.3 0.2 0:35.18 httpd >> >> 14232 httpd 15 0 190m 16m 2244 S 0.3 0.1 0:33.24 httpd >> >> 14247 httpd 15 0 190m 16m 2164 S 0.3 0.1 0:35.52 httpd >> >> 14281 httpd 15 0 189m 15m 2272 S 0.3 0.1 0:36.51 httpd >> >> 14327 httpd 15 0 192m 18m 2196 S 0.3 0.2 0:35.16 httpd >> >> 14331 httpd 15 0 190m 16m 2240 S 0.3 0.1 0:33.99 httpd >> >> >> ------------------------------------------------------------------------------------------- >> >> Lag on open file for write shown by strace simple cp command, >> >> the symptom appear at bottom of the list >> >> >> ------------------------------------------------------------------------------------------ >> >> [root at cafe4 wanchat-san1]# strace -T cp source_file destiantion_file >> >> execve("/bin/cp", ["cp", "source_file", "destiantion_file"], [/* 22 vars >> */]) = 0 <0.000210> >> >> brk(0) = 0x2347000 <0.000006> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685af000 <0.000006> >> >> uname({sys="Linux", node=" <http://cafe4.pantip.com/>cafe4.pantip.com", >> ...}) = 0 <0.000006> >> >> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) >> <0.000008> >> >> open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000009> >> >> fstat(3, {st_mode=S_IFREG|0644, st_size=63180, ...}) = 0 <0.000006> >> >> mmap(NULL, 63180, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685b0000 >> <0.000010> >> >> close(3) = 0 <0.000007> >> >> open("/lib64/libacl.so.1", O_RDONLY) = 3 <0.000013> >> >> read(3, >> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\31`\0379\0\0\0"..., 832) >> 832 <0.000008> >> >> fstat(3, {st_mode=S_IFREG|0755, st_size=28008, ...}) = 0 <0.000020> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685c0000 <0.000008> >> >> mmap(0x391f600000, 2120992, PROT_READ|PROT_EXEC, >> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391f600000 <0.000010> >> >> mprotect(0x391f606000, 2093056, PROT_NONE) = 0 <0.000012> >> >> mmap(0x391f805000, 4096, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x391f805000 <0.000007> >> >> close(3) = 0 <0.000005> >> >> open("/lib64/libselinux.so.1", O_RDONLY) = 3 <0.000007> >> >> read(3, >> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`E\240\0369\0\0\0"..., 832) >> 832 <0.000006> >> >> fstat(3, {st_mode=S_IFREG|0755, st_size=95464, ...}) = 0 <0.000005> >> >> mmap(0x391ea00000, 2192784, PROT_READ|PROT_EXEC, >> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391ea00000 <0.000007> >> >> mprotect(0x391ea15000, 2097152, PROT_NONE) = 0 <0.000007> >> >> mmap(0x391ec15000, 8192, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x391ec15000 <0.000011> >> >> mmap(0x391ec17000, 1424, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391ec17000 <0.000010> >> >> close(3) = 0 <0.000007> >> >> open("/lib64/libattr.so.1", O_RDONLY) = 3 <0.000012> >> >> read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\17 >> \0369\0\0\0"..., 832) = 832 <0.000007> >> >> fstat(3, {st_mode=S_IFREG|0755, st_size=17888, ...}) = 0 <0.000005> >> >> mmap(0x391e200000, 2110728, PROT_READ|PROT_EXEC, >> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391e200000 <0.000007> >> >> mprotect(0x391e204000, 2093056, PROT_NONE) = 0 <0.000011> >> >> mmap(0x391e403000, 4096, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x391e403000 <0.000012> >> >> close(3) = 0 <0.000007> >> >> open("/lib64/libc.so.6", O_RDONLY) = 3 <0.000012> >> >> read(3, >> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332!\0359\0\0\0"..., >> 832) = 832 <0.000009> >> >> fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0 <0.000005> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685c1000 <0.000009> >> >> mmap(0x391d200000, 3498328, PROT_READ|PROT_EXEC, >> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391d200000 <0.000009> >> >> mprotect(0x391d34d000, 2097152, PROT_NONE) = 0 <0.000009> >> >> mmap(0x391d54d000, 20480, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x391d54d000 <0.000011> >> >> mmap(0x391d552000, 16728, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391d552000 <0.000014> >> >> close(3) = 0 <0.000005> >> >> open("/lib64/libdl.so.2", O_RDONLY) = 3 <0.000012> >> >> read(3, >> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16`\0359\0\0\0"..., 832) >> = 832 <0.000008> >> >> fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0 <0.000007> >> >> mmap(0x391d600000, 2109696, PROT_READ|PROT_EXEC, >> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391d600000 <0.000009> >> >> mprotect(0x391d602000, 2097152, PROT_NONE) = 0 <0.000007> >> >> mmap(0x391d802000, 8192, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x391d802000 <0.000007> >> >> close(3) = 0 <0.000005> >> >> open("/lib64/libsepol.so.1", O_RDONLY) = 3 <0.000009> >> >> read(3, >> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0=`\0369\0\0\0"..., 832) >> 832 <0.000006> >> >> fstat(3, {st_mode=S_IFREG|0755, st_size=247496, ...}) = 0 <0.000005> >> >> mmap(0x391e600000, 2383136, PROT_READ|PROT_EXEC, >> MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x391e600000 <0.000006> >> >> mprotect(0x391e63b000, 2097152, PROT_NONE) = 0 <0.000007> >> >> mmap(0x391e83b000, 4096, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x391e83b000 <0.000008> >> >> mmap(0x391e83c000, 40224, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x391e83c000 <0.000007> >> >> close(3) = 0 <0.000005> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685c2000 <0.000005> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685c3000 <0.000005> >> >> arch_prctl(ARCH_SET_FS, 0x2b6d685c2f90) = 0 <0.000005> >> >> mprotect(0x391d54d000, 16384, PROT_READ) = 0 <0.000007> >> >> mprotect(0x391d802000, 4096, PROT_READ) = 0 <0.000006> >> >> mprotect(0x391d01b000, 4096, PROT_READ) = 0 <0.000005> >> >> munmap(0x2b6d685b0000, 63180) = 0 <0.000010> >> >> access("/etc/selinux/", F_OK) = 0 <0.000007> >> >> brk(0) = 0x2347000 <0.000005> >> >> brk(0x2368000) = 0x2368000 <0.000005> >> >> open("/etc/selinux/config", O_RDONLY) = 3 <0.000013> >> >> fstat(3, {st_mode=S_IFREG|0644, st_size=511, ...}) = 0 <0.000005> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685b0000 <0.000005> >> >> read(3, "# This file controls the state o"..., 4096) = 511 <0.000009> >> >> read(3, "", 4096) = 0 <0.000006> >> >> close(3) = 0 <0.000006> >> >> munmap(0x2b6d685b0000, 4096) = 0 <0.000008> >> >> open("/proc/mounts", O_RDONLY) = 3 <0.000033> >> >> fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000005> >> >> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> 0x2b6d685b0000 <0.000005> >> >> read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 667 <0.000041> >> >> read(3, "", 4096) = 0 <0.000005> >> >> close(3) = 0 <0.000006> >> >> munmap(0x2b6d685b0000, 4096) = 0 <0.000007> >> >> open("/proc/filesystems", O_RDONLY) = 3 <0.000011> >> >> read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 4095) = 354 <0.000014> >> >> close(3) = 0 <0.000006> >> >> open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 <0.000010> >> >> fstat(3, {st_mode=S_IFREG|0644, st_size=56471264, ...}) = 0 <0.000005> >> >> mmap(NULL, 56471264, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b6d685c4000 >> <0.000006> >> >> close(3) = 0 <0.000004> >> >> geteuid() = 0 <0.000006> >> >> stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 >> <0.000013> >> >> stat("source_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 >> <0.000009> >> >> stat("destiantion_file", {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 >> <0.000008> >> >> open("source_file", O_RDONLY) = 3 <0.000022> >> >> fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000005> >> >> open("destiantion_file", O_WRONLY|O_TRUNC) = 4 <8.159196> <------ 8 Sec ! >> During normal condition 0.000034 >> >> fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000008> >> >> fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0 <0.000007> >> >> read(3, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 4096) = 1307 <0.000012> >> >> write(4, "Linux 2.6.18-194.3.1.el5 (cafe4."..., 1307) = 1307 <0.003315> >> >> read(3, "", 4096) = 0 <0.000010> >> >> close(4) = 0 <0.000006> >> >> close(3) = 0 <0.000007> >> >> close(1) = 0 <0.000005> >> >> exit_group(0) = ? >> >> _______________________________________________ >> Ocfs2-users mailing list >> Ocfs2-users at oss.oracle.com >> http://oss.oracle.com/mailman/listinfo/ocfs2-users >> >> >> _______________________________________________ >> Ocfs2-users mailing list >> Ocfs2-users at oss.oracle.com >> http://oss.oracle.com/mailman/listinfo/ocfs2-users >> > > > _______________________________________________ > Ocfs2-users mailing list > Ocfs2-users at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/ocfs2-users >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20100729/8c1e3ab9/attachment-0001.html
Hi All, I have stumbled across (via Google) a post on this mailing list in relation to performance issues with OCFS2. A little overview of our setup: 3 x Dell Poweredge R200 servers, w/8GB RAM, Dual Gig NIC's running VSphere 4.1 (ESXi w/Enterprise License) 1 x Dell Poweredge MD3000i ISCSI SAN w/15x 1tb SATA drives in RAID6 Each ESXi server runs 2 Gentoo Virtual machines running kernel 2.6.34 with ocfs2-tools - workload consists of lighttpd, Apache & Squid, with caching from the SAN to the local vm disks & RAM. Our problem lies within performance of the OCFS2 volume (which is ~10TB) over the disks. The iowait is constantly high (30-40% per server), and even though there are plenty of inodes and physical disk free, we cannot explain the problem. dnetwww2 ~ # df -h Filesystem Size Used Avail Use% Mounted on rootfs 18G 6.5G 11G 39% / /dev/sda3 18G 6.5G 11G 39% / rc-svcdir 1.0M 72K 952K 8% /lib64/rc/init.d udev 10M 184K 9.9M 2% /dev shm 2.0G 0 2.0G 0% /dev/shm /dev/sdb1 247G 29G 206G 13% /cache /dev/ram0 190M 13M 177M 7% /home/core /dev/ram1 190M 60M 130M 32% /home/moddb /dev/ram2 190M 20M 170M 11% /home/desura /dev/mapper/360024e8000758ab1000007624c1525dc1 9.8T 2.1T 7.7T 22% /home/shared dnetwww2 ~ # cat /etc/fstab --snip-- /dev/mapper/360024e8000758ab1000007624c1525dc1 /home/shared ocfs2 commit=15,heartbeat=local,data=writeback,noatime,user_xattr 1 2 dnetwww2 ~ # multipath -ll 360024e8000758ab1000007624c1525dcdm-0 , [size=9.7T][features=1 queue_if_no_path][hwhandler=0] \_ round-robin 0 [prio=6][active] \_ #:#:#:# sdc 8:32 [active][ready] \_ #:#:#:# sde 8:64 [active][ready] \_ round-robin 0 [prio=0][enabled] \_ #:#:#:# sdd 8:48 [active][ghost] \_ #:#:#:# sdf 8:80 [active][ghost] Now if I mount an ext4 formatted lun/partition from the MD3000i (mapped via iscsi&multipath-tools) I can read/write to it at 125MB/s with no issues. The ocfs2 mounted volume struggles to sustain 25-30MB/s read/write. :-( We have spent countless hours working (troubleshooting/debugging) this now without result. We've even replaced both controllers, switches, network cards and so on in an attempt to rule out a specific hardware cause, but it seems to be ocfs2 related. I've noted there are a number of new ocfs2 patches in Linux 2.6.35 & the yet to be released 2.6.36 - would like to know if any of these resolve this issue before we are forced to ditch ocfs2 and go back to NFS. Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20100822/2040dbdb/attachment.html