Hi Tobias Wilken,
Could you please send us the complete server/client log files, the volume
files, and a backtrace of the core file?
(For the required compilation flags and how to generate the backtrace, please
check http://www.gluster.com/community/documentation/index.php/GlusterFS_Troubleshooting)
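For example, a minimal sketch of the two steps, assuming the default
/usr/local install prefix (which your backtrace suggests) and a core file at
/path/to/core -- both are placeholders, and the wiki page above is the
authoritative reference:

# rebuild with debugging symbols so the frames in the backtrace are usable
CFLAGS="-g -O0" ./configure && make && make install
# after the next crash, dump the full backtrace from the core file
gdb --batch -ex "bt full" /usr/local/sbin/glusterfs /path/to/core > backtrace.txt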
You can keep track of this issue here -
http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1004
--
----
Cheers,
Lakshmipathi.G
FOSS Programmer.
----- Original Message -----
From: "Tobias Wilken" <tw at cloudcontrol.de>
To: gluster-users at gluster.org
Sent: Monday, June 14, 2010 9:29:33 PM
Subject: [Gluster-users] Load goes up every 3-5 days
Hey all,
I have a curious problem with a simple glusterfs installation. After about
3-5 days the load of the glusterfs process climbs to 100% CPU usage and it
takes about an hour to return to normal operation. The actions on the
glusterfs mount point are mainly creating, reading, and writing bzr
repositories, but I can't associate any particular action with the problem.
It occurs on operations that I have done many times before without problems.
The configuration files are created by:
glusterfs-volgen --name repstore1 --raid 1 hostname1:/data/share
hostname2:/data/share
The only changes are that the io-cache cache-size is reduced to 256MB,
user/password authentication is added, and of course the hostnames are
adapted. I'm running two glusterfs server daemons in replication, and the
volume is also mounted on these two nodes; two additional nodes mount the
volume as clients. The version in use is glusterfs 3.0.4, compiled without
any special flags.
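(Roughly, the two changes look like the following fragments of the
volgen-generated files, written from memory rather than copied verbatim; the
user name, password, and subvolume names below are placeholders, not my real
values:)

# client volfile: io-cache translator with the reduced cache size
volume iocache
  type performance/io-cache
  option cache-size 256MB
  subvolumes readahead        # placeholder; keep whatever volgen generated here
end-volume

# server volfile: username/password authentication for the exported brick
volume server
  type protocol/server
  option transport-type tcp
  option auth.login./data/share.allow repuser     # placeholder user
  option auth.login.repuser.password secret       # placeholder password
  subvolumes /data/share                          # placeholder brick name
end-volume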
The only peculiarity is that the nodes are Amazon EC2 instances, but I don't
think that should make a difference.
Over the last week I tried very hard to reproduce the problem by putting the
nodes under CPU load, consuming memory, stressing the filesystem with reads
and writes, cutting the network connection, and many permutations of those
:-), but I couldn't reproduce it. Today a simple bzr export operation
"crashed" it again.
Any idea how I can reproduce such a problem for further debugging? Any
other ideas? Maybe some pity? :-)
Best regards
Tobias Wilken
P.S. Here are the logs from today's "crash"; Host1 is the node on which the
load of the glusterfs process goes up.
Host1:
/var/log/glusterfs/data-share.log
[2010-06-14 08:52:07] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse:
14750285: GETATTR 140270343725136 (fuse_loc_fill() failed)
[2010-06-14 08:52:07] W [fuse-bridge.c:1529:fuse_rename_cbk] glusterfs-fuse:
14750288:
/applications/moodletest/repository/.bzr/branch/lock/jodfk6p0iu.tmp ->
/applications/moodletest/repository/.bzr/branch/lock/held => -1 (Directory
not empty)
[2010-06-14 08:52:07] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse:
14750295: GETATTR 140270343725136 (fuse_loc_fill() failed)
[2010-06-14 08:52:07] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse:
14750298: GETATTR 140270343725136 (fuse_loc_fill() failed)
[2010-06-14 08:52:07] W [fuse-bridge.c:1529:fuse_rename_cbk] glusterfs-fuse:
14750300:
/applications/moodletest/repository/.bzr/branch/lock/jhogp2wpi2.tmp ->
/applications/moodletest/repository/.bzr/branch/lock/held => -1 (Directory
not empty)
[2010-06-14 08:52:07] W [fuse-bridge.c:1529:fuse_rename_cbk] glusterfs-fuse:
14750303:
/applications/moodletest/repository/.bzr/branch/lock/18ytmffmsv.tmp ->
/applications/moodletest/repository/.bzr/branch/lock/held => -1 (Directory
not empty)
[2010-06-14 08:52:07] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse:
14750308: GETATTR 140270343725136 (fuse_loc_fill() failed)
[2010-06-14 08:52:07] W [fuse-bridge.c:1529:fuse_rename_cbk] glusterfs-fuse:
14750311:
/applications/moodletest/repository/.bzr/branch/lock/lzrpezkqno.tmp ->
/applications/moodletest/repository/.bzr/branch/lock/held => -1 (Directory
not empty)
[2010-06-14 08:52:47] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse:
14756735: FSTAT() ERR => -1 (File descriptor in bad state)
/var/log/glusterfs/glusterfsd.log
[2010-06-14 08:52:15] N [server-protocol.c:6788:notify] server-tcp:
10.227.26.95:1017 disconnected
[2010-06-14 08:52:15] N [server-protocol.c:6788:notify] server-tcp:
10.227.26.95:1016 disconnected
[2010-06-14 08:52:15] N [server-helpers.c:842:server_connection_destroy]
server-tcp: destroyed connection of
ip-10-227-26-95-6105-2010/06/10-01:28:54:815100-hostname2-1
Host2:
/var/log/glusterfs/data-share.log
[2010-06-14 08:52:15] W [fuse-bridge.c:1848:fuse_readv_cbk] glusterfs-fuse:
16468676: READ => -1 (File descriptor in bad state)
pending frames:
patchset: v3.0.4
signal received: 6
time of crash: 2010-06-14 08:52:15
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.4
/lib/libc.so.6(+0x33af0)[0x7ff48e9a3af0]
/lib/libc.so.6(gsignal+0x35)[0x7ff48e9a3a75]
/lib/libc.so.6(abort+0x180)[0x7ff48e9a75c0]
/lib/libc.so.6(+0x6d4fb)[0x7ff48e9dd4fb]
/lib/libc.so.6(+0x775b6)[0x7ff48e9e75b6]
/lib/libc.so.6(cfree+0x73)[0x7ff48e9ede53]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/quick-read.so(qr_readv+0x252)[0x7ff48d4e3842]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/stat-prefetch.so(sp_readv+0x142)[0x7ff48d2d2082]
/usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so(+0x5fa7)[0x7ff48d0b7fa7]
/usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so(+0x4b54)[0x7ff48d0b6b54]
/lib/libpthread.so.0(+0x69ca)[0x7ff48ecf89ca]
/lib/libc.so.6(clone+0x6d)[0x7ff48ea566cd]
---------
/var/log/glusterfs/glusterfsd.log
[2010-06-14 08:52:15] N [server-protocol.c:6788:notify] server-tcp:
10.227.26.95:1019 disconnected
[2010-06-14 08:52:15] N [server-protocol.c:6788:notify] server-tcp:
10.227.26.95:1018 disconnected
[2010-06-14 08:52:15] N [server-helpers.c:842:server_connection_destroy]
server-tcp: destroyed connection of
ip-10-227-26-95-6105-2010/06/10-01:28:54:815100-hostname1-1
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users