Hi guys!
I can't find an answer on Google, so my last hope is this mailing list.

Story.
I have two servers with identical arrays; the servers are connected by DRBD. I use OCFS2 as the file system, and NFSv4 to access the OCFS2 volume. I have no idea why, but the allocated descriptors in /proc/sys/fs/file-nr increase every time the volume is accessed, so after some time the allocated descriptors exceed the maximum and all processes fail with "Too many open files" (or something like that). I don't see any error messages in the log files... Any ideas? I haven't slept for two days.

Thank you all in advance.

Configs:
cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

resource nfs {

        protocol C;

        handlers {
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
                local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        }

        startup {
                become-primary-on both;
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
        }

        net {
                cram-hmac-alg sha1;
                shared-secret "password";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
                rr-conflict disconnect;
        }

        syncer {
                rate 500M;
                verify-alg sha1;
                al-extents 257;
        }

        on st01 {
                device /dev/drbd0;
                disk /dev/sdb;
                address 192.168.3.151:7788;
                meta-disk internal;
        }

        on st02 {
                device /dev/drbd0;
                disk /dev/sdb;
                address 192.168.3.152:7788;
                meta-disk internal;
        }
}

---
cat /etc/ocfs2/cluster.conf
#/etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 192.168.1.151
        number = 1
        name = st01
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.152
        number = 2
        name = st02
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2
---
cat /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients.  See exports(5).
/snfs           192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check,fsid=0)
/snfs/projects  192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
/snfs/configs   192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
/snfs/variables 192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
/snfs/backups   192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
---
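To narrow a symptom like this down, a minimal check is to watch the file-nr counters while the export is being exercised. This is only a sketch: the procfs path is standard, but the watch interval is arbitrary and the lsof call assumes /snfs (taken from the exports above) is the mount point on the server.

    # Watch the three file-nr counters (allocated, free-allocated, max)
    # while clients access the volume over NFS:
    watch -n 2 cat /proc/sys/fs/file-nr

    # Count files currently open on the exported filesystem
    # (lsof lists all open files on a filesystem when given its mount point):
    lsof -n /snfs | wc -l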
Maybe it is a problem with NFSv4, as OCFS2 officially supports only NFSv2 and NFSv3. Try exporting the volume with NFSv3 only. Also note that DRBD isn't officially supported either.

Regards,
Sérgio

On Thu, 2 Jun 2011 09:37:37 +0400, "Vasyl S. Kostroma" <admin at v-sf.info> wrote:
> I use OCFS2 as the file system, and NFSv4 to access the OCFS2 volume.
> I have no idea why, but the allocated descriptors in
> /proc/sys/fs/file-nr increase every time the volume is accessed, so
> after some time the allocated descriptors exceed the maximum and all
> processes fail with "Too many open files" (or something like that).
[...]

-- 
Sérgio Surkamp | Gerente de Rede
sergio at gruposinternet.com.br
Grupos Internet S.A.
R. Lauro Linhares, 2123 Torre B - Sala 201
Trindade - Florianópolis - SC
+55 48 3234-4109
http://www.gruposinternet.com.br
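One way to follow this suggestion and disable NFSv4 is sketched below, assuming a Debian-style nfs-kernel-server; the config file, the RPCNFSDCOUNT trick, and the client mount point are assumptions, so check how your own init script invokes rpc.nfsd before relying on it.

    # Server side: tell rpc.nfsd not to offer NFSv4 (see rpc.nfsd(8)),
    # e.g. run 8 threads by hand:
    rpc.nfsd --no-nfs-version 4 8

    # On Debian-era systems the init script passes RPCNFSDCOUNT from
    # /etc/default/nfs-kernel-server straight to rpc.nfsd, so extra
    # flags can be appended there (verify this against your script):
    RPCNFSDCOUNT="8 --no-nfs-version 4"

    # Client side: mount with NFSv3 explicitly. Note that v3 clients use
    # the real export path, not the fsid=0 pseudo-root used by v4:
    mount -t nfs -o vers=3 st01:/snfs/projects /mnt/projects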
That's the number of files open on the system. So this looks like an app problem. Some app has many files open.

On 06/01/2011 10:37 PM, Vasyl S. Kostroma wrote:
> I have no idea why, but the allocated descriptors in
> /proc/sys/fs/file-nr increase every time the volume is accessed, so
> after some time the allocated descriptors exceed the maximum and all
> processes fail with "Too many open files" (or something like that).
> I don't see any error messages in the log files... Any ideas?
[...]
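To identify which application is holding the descriptors, a rough per-process count can be taken straight from /proc without any extra tools. This is a sketch: run it as root so all processes' fd directories are readable, and the head/cut limits are arbitrary.

    # Count open descriptors per process and show the busiest ones:
    for p in /proc/[0-9]*; do
        printf '%6s %s\n' "$(ls "$p/fd" 2>/dev/null | wc -l)" \
            "$(tr '\0' ' ' < "$p/cmdline" 2>/dev/null | cut -c1-60)"
    done | sort -rn | head -15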
Thomas.Zimolong at bmi.bund.de
2011-Jun-06 09:58 UTC
[Ocfs2-users] Problems with descriptors.
Hi!

A bit late, but I was out of the office for a few days...

> I have no idea why, but the allocated descriptors in
> /proc/sys/fs/file-nr increase every time the volume is accessed, so
> after some time the allocated descriptors exceed the maximum and all
> processes fail with "Too many open files" (or something like that).
> I don't see any error messages in the log files... Any ideas?

I think you have to distinguish two things: file handles and open files (file descriptors).

The difference between file handles and open files is that a process can have more open files than file handles. That's because things like the current working directory, memory-mapped library files and executable text files (i.e. the program itself) are counted as open files but don't use file handles. On the other hand, file handles rarely get unallocated; they rather become unused and are reused by some new process. As far as I understood, the difference is that a file handle is part of an additional layer of the C standard I/O library routines. See more on file handles and file descriptors at <http://en.wikipedia.org/wiki/File_descriptor>.

Concerning the file handle problem: /proc/sys/fs/file-nr shows file handle statistics as three values. The first value is the number of allocated file handles, the second is the number of free allocated(!) file handles (always 0 with kernel 2.6), and the third is the system-wide maximum of allocatable file handles. The maximum is configured in /etc/sysctl.conf via "fs.file-max" and can be controlled with sysctl (man 8 sysctl) or via /proc/sys/fs/file-max.

Concerning "too many open files": you might have a problem with the maximum number of allowed open files per user. Look at "lsof -u <user> | wc -l" to see how many open files all processes of <user> have, and at "ulimit -n", which shows the number of open files allowed for the user. Only root can increase the limits (ok, the user can move between hard and soft), so for a user other than root you have to configure this in /etc/security/limits.conf. See the bash manpage for the builtin "ulimit".

Hope this sheds some light on your problem...

Greetz
Thomas
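Putting those knobs together, a minimal sketch of checking and raising the limits might look like the following; the numeric values and the user name are placeholders, and raising fs.file-max only treats the symptom if an application really is leaking descriptors.

    # System-wide handle statistics and ceiling:
    cat /proc/sys/fs/file-nr              # allocated / free-allocated / max
    sysctl -w fs.file-max=262144          # raise the ceiling at runtime (example value)
    echo 'fs.file-max = 262144' >> /etc/sysctl.conf   # make it persistent

    # Per-user open-file limits in /etc/security/limits.conf (example values):
    #<domain>   <type>   <item>   <value>
    someuser    soft     nofile   8192
    someuser    hard     nofile   16384

    # Check from the user's shell after logging in again:
    ulimit -n     # soft limit
    ulimit -Hn    # hard limit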