Cyril FERAUDET
2007-Aug-30 01:37 UTC
[Ocfs2-users] Big load and many error with ocfs2 1.3.3
Hi all,

I have one 3.5T Xserve with one LUN shared between 7 Debian nodes:

# dmesg |grep -i ocfs
OCFS2 Node Manager 1.3.3
OCFS2 DLM 1.3.3
OCFS2 DLMFS 1.3.3
OCFS2 User DLM kernel interface loaded
OCFS2 1.3.3
ocfs2_dlm: Node 11 joins domain 7791C0F1366B4FAD99E46078BE5092A1
ocfs2_dlm: Nodes in domain ("7791C0F1366B4FAD99E46078BE5092A1"): 0 1 7 8 9 10 11

# uname -a
Linux img01 2.6.18-4-686 #1 SMP Wed May 9 23:03:12 UTC 2007 i686 GNU/Linux

All nodes mount the shared partition without problems, but after a few minutes of use I get these errors:

...cut...
(4191,0):ocfs2_populate_inode:236 ERROR: Invalid dinode: i_ino=2832705, i_blkno=2832705, signature = INODE01, flags = 0x0
(4191,0):ocfs2_read_locked_inode:393 ERROR: populate failed! i_blkno=2832705, i_ino=2832705
(4191,0):ocfs2_iget:131 ERROR: status = -116
(4191,0):ocfs2_iget:141 ERROR: status = -116
(4191,0):ocfs2_get_dentry:63 ERROR: status = -116
(4192,1):ocfs2_populate_inode:236 ERROR: Invalid dinode: i_ino=2832705, i_blkno=2832705, signature = INODE01, flags = 0x0
(4192,1):ocfs2_read_locked_inode:393 ERROR: populate failed! i_blkno=2832705, i_ino=2832705
(4192,1):ocfs2_iget:131 ERROR: status = -116
(4192,1):ocfs2_iget:141 ERROR: status = -116
(4192,1):ocfs2_get_dentry:63 ERROR: status = -116
(4191,0):ocfs2_get_dentry:69 ERROR: status = -116
(4186,1):ocfs2_populate_inode:236 ERROR: Invalid dinode: i_ino=8808616, i_blkno=8808616, signature = INODE01, flags = 0x0
(4186,1):ocfs2_read_locked_inode:393 ERROR: populate failed! i_blkno=8808616, i_ino=8808616
(4186,1):ocfs2_iget:131 ERROR: status = -116
(4186,1):ocfs2_iget:141 ERROR: status = -116
(4186,1):ocfs2_get_dentry:63 ERROR: status = -116
(4189,1):ocfs2_get_dentry:69 ERROR: status = -116
...cut...

Then every node shows a huge load, above 100, with more than 50% iowait.

I also tested kernel 2.6.22 (I don't remember the ocfs2 version). There are no ocfs2 errors, but every file opened for writing is truncated to zero size!

Do you have any idea what is wrong? Do you know of a stable version of ocfs2 built into a Debian kernel?

Thanks,
Cyril
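The `status = -116` lines in the log above are negated Linux errno values. A minimal sketch, assuming a Linux host (errno numbers differ between platforms) and a hypothetical helper name `decode_status`, that maps such a status to its symbolic name and message:

```python
import errno
import os


def decode_status(status):
    """Map a kernel-style negative status (e.g. -116) to (name, message).

    decode_status is a hypothetical helper, not part of ocfs2 or its tools.
    """
    code = abs(status)
    # errno.errorcode maps numeric codes to names for the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = decode_status(-116)
    print(name, "-", message)
```

On Linux, errno 116 is ESTALE, the stale file handle error.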
Sunil Mushran
2007-Aug-30 10:30 UTC
[Ocfs2-users] Big load and many error with ocfs2 1.3.3
Are you exporting the ocfs2 volume via NFS? If so, then we did fix this issue.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=592282cf2eaa33409c6511ddd3f3ecaa57daeaaa
It should be fixed in 2.6.22.

Use 2.6.22 + patches as listed in the following link:
http://oss.oracle.com/pipermail/ocfs2-users/2007-August/001935.html

Cyril FERAUDET wrote:
> A response to myself ...
>
> Thanks Sunil Mushran, I hadn't seen your response among the thousands of
> mails waiting for me on my return from holiday ...
>
> This patch seems to be in 2.6.20.
>
> Can this error cause my high load average?
>
> No other patch was included up to 2.6.22. Does anyone else have the same
> problem as me with 2.6.22 (truncated files)?
>
> Thanks,
>
> Cyril
Cyril FERAUDET
2007-Sep-14 05:54 UTC
[Ocfs2-users] Big load and many error with ocfs2 1.3.3
Hello Sunil,

Just back from holiday, I tested 2.6.22 with all of your patches.

The problems I had without the patches are gone, except that I suspect a locking problem.

I have a script that replaces a file on the shared volume while holding an flock (LOCK_EX), and another script that accesses the shared volume over NFS to copy this file to a local volume.

The file is copied every minute, and I regularly end up with an empty or truncated copy.

Do you know of any restriction on locking, or something like that?

Thanks for everything.

Cyril

On 30 Aug 07 at 22:29, Sunil Mushran wrote:
> Sorry I don't track the debian kernel. But typically they take
> the source from kernel.org and use it as is. But don't take my
> word for it.
>
> Cyril FERAUDET wrote:
>> Only one node currently exports the ocfs2 volume via NFS, running
>> 2.6.18-4 from the etch repository.
>> I have no problems as long as that node is on its own. When I add
>> another node I get the errors I reported.
>>
>> So if I understand correctly, the 2.6.22 provided by Debian does not
>> have all the patches you listed, and that would be the cause of my
>> truncated files (a bad day to fix it ;-) )?
>>
>> Can I apply the missing patches to the Debian package, or do I need
>> to use fresh sources?
>>
>> Thanks in advance.
>>
>> Cyril
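The writer described in this message replaces a file in place while a reader copies it every minute. A common way to keep readers from seeing a half-written file is to write to a temporary name and rename it over the target, which is a different technique from the flock-based script described here. A minimal sketch, assuming the temporary file lives in the same directory as the target and that rename is atomic on the filesystem in use (this thread does not confirm that for ocfs2 exported over NFS); `atomic_replace` is a hypothetical helper name:

```python
import os
import tempfile


def atomic_replace(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "report.dat")
    atomic_replace(target, b"first version\n")
    atomic_replace(target, b"second version\n")
    with open(target, "rb") as f:
        print(f.read())
```

Note that `tempfile.mkstemp` creates the file with mode 0600, so the permissions may need adjusting before the rename if other users must read the result.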
Cyril FERAUDET
2007-Sep-18 03:59 UTC
[Ocfs2-users] Big load and many error with ocfs2 1.3.3
Hello Sunil,

I have a lot of problems with the patches on 2.6.22.

When there is more than one ocfs2 node, I cannot mount the NFS-exported ocfs2 volume on more than 2 NFS clients. I get an "NFS stale handle" error ...

Without NFS, with all nodes as ocfs2 nodes, it seems to work fine. But I get more iowait without NFS than with one ocfs2 node and the others on NFS ...

About locking: do you know whether NFS needs the underlying filesystem to support locking in order to offer locking to NFS clients?

Cyril

On 17 Sept 07 at 19:15, Sunil Mushran wrote:
> I'm sorry I could not understand your email. Are you saying that while
> most of the problems were solved with the patches, one wasn't?
>
> As far as cluster aware flock/lockf goes, it is still not implemented in
> ocfs2.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/ocfs2.txt;h=ed55238023a9843f7781ac6cfe94d174ba772072;hb=HEAD
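flock is advisory: it only coordinates processes that all call it, and a plain reader is never blocked. One possible explanation for the empty copies, not confirmed anywhere in this thread, is that a writer which opens the file in truncating mode empties it before the lock is even taken. A minimal sketch on a single Linux host with a local filesystem (not ocfs2 or NFS); the file name and helper names are hypothetical:

```python
import fcntl
import os
import tempfile


def size_seen_by_reader(path):
    """Size observed by a reader that opens the file without taking any lock."""
    with open(path, "rb") as f:
        return len(f.read())


def truncating_writer_demo(path):
    """Return the size an unlocked reader sees while the writer holds LOCK_EX."""
    with open(path, "wb") as f:
        f.write(b"old contents\n")

    # Opening with "wb" truncates the file immediately, before flock is called.
    with open(path, "wb") as writer:
        fcntl.flock(writer, fcntl.LOCK_EX)
        seen = size_seen_by_reader(path)  # not blocked: the lock is advisory
        writer.write(b"new contents\n")
        fcntl.flock(writer, fcntl.LOCK_UN)
    return seen


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    print("bytes seen by unlocked reader:", truncating_writer_demo(demo_path))
```

Even a reader that takes LOCK_SH can hit the window between the truncating open and the flock call. Opening without truncation and calling truncate only after the lock is held closes that window on a single host.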