Cedric Lagneau
2011-May-03 08:27 UTC
[Gluster-users] Issue with geo-replication and nfs auth
Hi,

I have some issues with geo-replication (since 3.2.0) and NFS auth (since the initial release).

Geo-replication
---------------
System: Debian 6.0 amd64
GlusterFS: 3.2.0

MASTER (volume) => SLAVE (directory)

For some volumes it works, but for others I can't enable geo-replication and get this error with a faulty status:

[2011-05-03 09:57:40.315774] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError

Command line:
gluster volume geo-replication test slave.mydomain.com:/data/test/ start

Under /etc/glusterd I don't see any difference between the files for these volumes.

# gluster volume geo-replication status
MASTER    SLAVE                                               STATUS
--------------------------------------------------------------------------------
test      ssh://root@slave.mydomain.com:file:///data/test     faulty
test2     ssh://root@slave.mydomain.com:file:///data/test2    OK

NFS auth allow
--------------
Even if I set nfs.rpc-auth-allow on a volume to restrict access to some IPs, I can always mount via NFS. How is it supposed to work?

Sample:

# gluster volume set test nfs.rpc-auth-allow 10.0.0.10
# gluster volume info test
Options Reconfigured:
nfs.rpc-auth-allow: 10.0.0.10

My client, with IP 192.168.10.25, can still mount the test volume over NFS with:
mount -t nfs -o vers=3 glusterserveur:/test /mnt/test

Thanks for your help,

Best regards,

-- 
Cédric Lagneau
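A note on the traceback: in Python, pickle.load raises EOFError when the input stream ends before a complete object has been read. The sketch below only reproduces that behaviour with in-memory streams. The recv function mirrors the single line shown in the traceback; the rest of repce.py is not reproduced here, and the describe helper and the sample message are made up for illustration.

```python
import io
import pickle


def recv(inf):
    """Read one pickled message from a binary stream (same call as in the traceback)."""
    return pickle.load(inf)


def describe(stream):
    """Hypothetical helper: return the decoded message, or a note if the stream ended first."""
    try:
        return recv(stream)
    except EOFError:
        return "peer closed the stream before sending a complete message"


# A stream holding one complete message decodes normally.
complete = io.BytesIO(pickle.dumps(("rid-1", None, "ok")))
print(describe(complete))

# An empty stream stands in for a remote end that exited or never started.
print(describe(io.BytesIO(b"")))
```

Read this way, the EOFError says that the other end of the channel went away, not why it did, which is consistent with the request for debug logs from both sides.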
Hi Cedric,

Regarding the geo-replication state: the log essentially means that the client-server communication between the geo-rep master and slave has gone down, which can happen for various reasons. We could narrow down the exact cause if you run the session again at debug log level and send us the log files of both master and slave.

To run it at debug level, execute the following command:
# gluster volume geo-replication test ssh://root@slave.mydomain.com:file:///data/test config log-level DEBUG

To locate the master's log file, execute the following command:
# gluster volume geo-replication test ssh://root@slave.mydomain.com:file:///data/test config log-file

To locate the slave's log file, execute this command on the slave:
# gluster volume geo-replication ssh://root@slave.mydomain.com:file:///data/test config log-file

You will get a template that includes ${session_owner}. To get the session owner of the geo-replication session, execute the following command on the MASTER:
# gluster volume geo-replication test ssh://root@slave.mydomain.com:file:///data/test config session-owner

Regarding nfs.rpc-auth-allow: it is a bug that will be addressed in the next minor release. You can follow its status at http://bugs.gluster.com/show_bug.cgi?id=2866

Regards,
Kaushik BV
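For reference, the behaviour the original poster expected from nfs.rpc-auth-allow can be sketched as a simple allow-list check. This is only an illustration of the intended effect, not Gluster's implementation: the function name is hypothetical, and treating the option value as a comma-separated list of addresses or networks is an assumption of this sketch.

```python
import ipaddress


def is_allowed(client_ip, allow_value):
    """Hypothetical check: is client_ip covered by any entry of the allow list?"""
    client = ipaddress.ip_address(client_ip)
    for entry in allow_value.split(","):
        # strict=False lets a plain host address be read as a single-host network.
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if client in network:
            return True
    return False


# Values taken from the thread: only 10.0.0.10 is listed, the client is 192.168.10.25.
print(is_allowed("10.0.0.10", "10.0.0.10"))
print(is_allowed("192.168.10.25", "10.0.0.10"))
```

With the values from the thread the second check fails, so a mount from 192.168.10.25 would be expected to be refused once the bug is fixed.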
[repost for the ML after subscription, pls. reply to this one]

Hi,

On Tue, May 3, 2011 at 4:25 PM, Kaushik BV <kaushikbv at gluster.com> wrote:
> to locate the slave log-file do the following:
> execute this command on the slave domain:
> #gluster volume geo-replication ssh://root@slave.mydomain.com:file:///data/test config log-file

Sorry, this command is bogus. The proper command would be:

# gluster volume geo-replication file:///data/test config log-file

or for short:

# gluster volume geo-replication /data/test config log-file

(because on the slave you are already at the other end of the ssh tunnel, so you should strip the ssh part off the slave URL).

Also, to ease debugging, I suggest setting the log level to DEBUG on the slave side too (it is independent of the master-side log level):

# gluster volume geo-replication /data/test config log-level DEBUG

Csaba
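The relationship between the three spellings of the slave URL above can be sketched in a few lines. This is plain string handling to show what "strip the ssh part" means; both function names are hypothetical and this is not how gluster itself parses URLs.

```python
def slave_side_url(master_side_url):
    """Drop the leading ssh://user@host: part, keeping what the slave sees locally."""
    prefix = "ssh://"
    if not master_side_url.startswith(prefix):
        return master_side_url
    rest = master_side_url[len(prefix):]
    # The first colon after the host separates the ssh part from the local URL.
    _host, sep, local = rest.partition(":")
    return local if sep else master_side_url


def short_form(file_url):
    """Turn file:///path into the plain /path shorthand."""
    prefix = "file://"
    return file_url[len(prefix):] if file_url.startswith(prefix) else file_url


master_view = "ssh://root@slave.mydomain.com:file:///data/test"
slave_view = slave_side_url(master_view)
print(slave_view)              # file:///data/test
print(short_form(slave_view))  # /data/test
```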
Cedric Lagneau
2011-May-12 09:40 UTC
[Gluster-users] Issue with geo-replication and nfs auth
My initial problem on the testing platform is not solved: the glusterd geo-replication commands stop working after about one day.

On the master:

# cat ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.log
[2011-05-12 10:50:53.451495] I [monitor(monitor):19:set_state] Monitor: new state: starting...
[2011-05-12 10:50:53.465759] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-05-12 10:50:53.466232] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-12 10:50:53.596132] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test2 -> ssh://slave.mydomain.com:/data/test2
[2011-05-12 10:50:53.641566] D [repce:131:push] RepceClient: call 1879:140148091115264:1305190253.64 __repce_version__() ...
[2011-05-12 10:50:53.751271] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError
[2011-05-12 10:50:53.759484] D [monitor(monitor):57:monitor] Monitor: worker got connected in 0 sec, waiting 59 more to make sure it's fine
[2011-05-12 10:51:53.535005] I [monitor(monitor):19:set_state] Monitor: new state: faulty

There is no test2-gluster.log.

On the slave: no log (even in debug mode) and no "/usr/bin/python /usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py" process. tcpdump on the slave shows some ssh traffic with the master server when I start geo-replication.

glusterd strace on the master while starting a geo-replication session that ends up faulty:

Process 28439 attached - interrupt to quit epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 261, 4294967295) = 1 accept(8, {sa_family=AF_INET, sin_port=htons(1020), sin_addr=inet_addr("127.0.0.1")}, [16]) = 15 fcntl(15, F_GETFL) = 0x2 (flags O_RDWR) fcntl(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(15, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(15, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(15, SOL_TCP, TCP_KEEPIDLE, [10], 4) = 0 setsockopt(15, SOL_TCP, TCP_KEEPINTVL, [2], 4) = 0 getsockname(15, {sa_family=AF_INET, sin_port=htons(24007), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 epoll_ctl(3, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLPRI, {u32=15, u64=38654705679}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=15, u64=38654705679}}}, 261, 4294967295) = 1 readv(15, [{"\200\0\0\330", 4}], 1) = 4 readv(15, [{"\0\0\0\1\0\0\0\0", 8}], 1) = 8 readv(15, [{"\0\0\0\2\0\22\345\277\0\0\0\1\0\0\0\30\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(15, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 184}], 
1) = 184 gettimeofday({1305191963, 681734}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(7, "[2011-05-12 11:19:23.681734] vol"..., 105) = 105 gettimeofday({1305191963, 682178}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:19:23.682178] I ["..., 138) = 138 gettimeofday({1305191963, 682468}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:19:23.682468] I ["..., 110) = 110 gettimeofday({1305191963, 682751}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(7, "[2011-05-12 11:19:23.682751] vol"..., 117) = 117 gettimeofday({1305191963, 684106}, NULL) = 0 stat("/etc/glusterd/vols/test2", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 pipe2([82, 83], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f61412239d0) = 2367 close(83) = 0 fcntl(82, F_SETFD, 0) = 0 fstat(82, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f61410db000 read(82, "ssh://root at slave.mydomain.com:file:/"..., 4096) = 45 close(82) = 0 wait4(2367, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2367 --- SIGCHLD (Child exited) @ 0 (0) --- munmap(0x7f61410db000, 4096) = 0 gettimeofday({1305191963, 813500}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:19:23.813500] I ["..., 119) = 119 gettimeofday({1305191963, 814001}, NULL) = 0 gettimeofday({1305191963, 814156}, NULL) = 0 pipe2([82, 83], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f61412239d0) = 2370 close(83) = 0 fcntl(82, F_SETFD, 0) = 0 fstat(82, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f61410db000 read(82, "ssh://root at 
slave.mydomain.com:file:/"..., 4096) = 45 close(82) = 0 wait4(2370, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2370 --- SIGCHLD (Child exited) @ 0 (0) --- munmap(0x7f61410db000, 4096) = 0 mkdir("/etc/glusterd/vols/test2", 0777) = -1 EEXIST (File exists) open("/etc/glusterd/vols/test2/info.tmp", O_RDWR|O_CREAT|O_TRUNC, 0644) = 82 write(82, "type=2\n", 7) = 7 write(82, "count=2\n", 8) = 8 write(82, "status=1\n", 9) = 9 write(82, "sub_count=2\n", 12) = 12 write(82, "version=44\n", 11) = 11 write(82, "transport-type=0\n", 17) = 17 write(82, "volume-id=19d2cbaf-79b3-4ccf-97c"..., 47) = 47 write(82, "geo-replication.indexing=on\n", 28) = 28 write(82, "slave1=c746cb97-91e4-489c-81e0-7"..., 89) = 89 write(82, "brick-0=master.mydomain.com:-data-3\n", 31) = 31 mkdir("/etc/glusterd/vols/test2/bricks", 0777) = -1 EEXIST (File exists) open("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-3.tmp", O_RDWR|O_CREAT|O_TRUNC, 0644) = 83 write(83, "hostname=master.mydomain.com\n", 24) = 24 write(83, "path=/data/3\n", 13) = 13 write(83, "listen-port=24011\n", 18) = 18 rename("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-3.tmp", "/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-3") = 0 close(83) = 0 write(82, "brick-1=master.mydomain.com:-data-4\n", 31) = 31 mkdir("/etc/glusterd/vols/test2/bricks", 0777) = -1 EEXIST (File exists) open("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-4.tmp", O_RDWR|O_CREAT|O_TRUNC, 0644) = 83 write(83, "hostname=master.mydomain.com\n", 24) = 24 write(83, "path=/data/4\n", 13) = 13 write(83, "listen-port=24012\n", 18) = 18 rename("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-4.tmp", "/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-4") = 0 close(83) = 0 rename("/etc/glusterd/vols/test2/info.tmp", "/etc/glusterd/vols/test2/info") = 0 close(82) = 0 open("/etc/glusterd/vols/test2/cksum", O_RDWR|O_CREAT|O_TRUNC|O_APPEND, 0644) = 82 open("/tmp/test2.CHeNh1", O_RDWR|O_CREAT|O_EXCL, 0600) = 
83 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7f614023a1e0}, {0x404cf0, [INT], SA_RESTORER|SA_RESTART, 0x7f614023a1e0}, 8) = 0 rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7f614023a1e0}, {SIG_DFL, [], SA_RESTORER, 0x7f614023a1e0}, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [HUP USR1 USR2 TERM], 8) = 0 clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff8701a5d8) = 2373 wait4(2373, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2373 rt_sigaction(SIGINT, {0x404cf0, [INT], SA_RESTORER|SA_RESTART, 0x7f614023a1e0}, NULL, 8) = 0 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7f614023a1e0}, NULL, 8) = 0 rt_sigprocmask(SIG_SETMASK, [HUP USR1 USR2 TERM], NULL, 8) = 0 open("/tmp/test2.CHeNh1", O_RDWR) = 84 lseek(84, 0, SEEK_SET) = 0 read(84, "brick-0=master.mydomain.com:-data-3\nb"..., 1024) = 290 read(84, "", 1024) = 0 lseek(84, 0, SEEK_SET) = 0 close(84) = 0 write(82, "info=186676944\n", 15) = 15 lseek(82, 0, SEEK_SET) = 0 read(82, "info=186676944\n", 1024) = 15 read(82, "", 1024) = 0 lseek(82, 0, SEEK_SET) = 0 close(82) = 0 close(83) = 0 unlink("/tmp/test2.CHeNh1") = 0 pipe2([82, 83], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f61412239d0) = 2374 close(83) = 0 fcntl(82, F_SETFD, 0) = 0 fstat(82, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f61410db000 read(82, "/etc/glusterd/geo-replication/te"..., 4096) = 105 close(82) = 0 wait4(2374, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2374 --- SIGCHLD (Child exited) @ 0 (0) --- munmap(0x7f61410db000, 4096) = 0 open("/etc/glusterd/geo-replication/test2/ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.pid", O_RDWR) = -1 ENOENT (No such file or directory) fcntl(-1, F_GETLK, {type=F_RDLCK, whence=SEEK_CUR, start=0, len=0, pid=0}) = -1 EBADF (Bad file descriptor) close(4294967295) = -1 EBADF (Bad file descriptor) 
mkdir("/etc/glusterd/geo-replication/test2", 0777) = -1 EEXIST (File exists) stat("/etc/glusterd/geo-replication/test2", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 mkdir("/var/log/glusterfs/geo-replication/test2", 0777) = -1 EEXIST (File exists) stat("/var/log/glusterfs/geo-replication/test2", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f61412239d0) = 2377 wait4(2377, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2377 --- SIGCHLD (Child exited) @ 0 (0) --- gettimeofday({1305191964, 330323}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:19:24.330323] I ["..., 120) = 120 gettimeofday({1305191964, 330948}, NULL) = 0 gettimeofday({1305191964, 331080}, NULL) = 0 gettimeofday({1305191964, 331158}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:19:24.331158] I ["..., 111) = 111 writev(15, [{"\200\0\1\34", 4}, {"\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\5test"..., 260}], 3) = 288 gettimeofday({1305191964, 332321}, NULL) = 0 epoll_wait(3, {{EPOLLIN, {u32=15, u64=38654705679}}}, 261, 4294967295) = 1 readv(15, [{"\330\0\0\200", 4}], 1) = 0 gettimeofday({1305191964, 335478}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:19:24.335478] W ["..., 192) = 192 epoll_ctl(3, EPOLL_CTL_DEL, 15, NULL) = 0 close(15) = 0 epoll_wait(3, <unfinished ...> Process 28439 detached

If I restart glusterd (/etc/init.d/glusterd restart) it works again: I can start and stop geo-replication on the volume, and logs are created and look fine on both master and slave.
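A reading aid for the trace above: the 4-byte readv and writev chunks on fd 15 look like ONC RPC record markers (RFC 5531), where the top bit flags the last fragment and the remaining 31 bits give the fragment length. That glusterd frames its RPC this way is an assumption of this sketch, not something stated in the thread, although the byte counts in the trace agree with it. The function name is hypothetical.

```python
import struct


def decode_record_marker(header):
    """Split a 4-byte big-endian record marker into (last_fragment, length)."""
    (value,) = struct.unpack(">I", header)
    last_fragment = bool(value & 0x80000000)
    length = value & 0x7FFFFFFF
    return last_fragment, length


# "\200\0\0\330" in strace's octal notation is 0x80 0x00 0x00 0xd8.
request = decode_record_marker(b"\x80\x00\x00\xd8")
# "\200\0\1\34" is 0x80 0x00 0x01 0x1c.
reply = decode_record_marker(b"\x80\x00\x01\x1c")

print(request, 8 + 24 + 184)  # length matches the three reads that follow
print(reply, 24 + 260)        # length matches the rest of the 288-byte writev
```

Under the same assumption, the later readv on that socket returning 0 is an end-of-file: the local client closed its connection after receiving the reply.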
glusterd strace : Process 2458 attached - interrupt to quit epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 261, 4294967295) = 1 accept(8, {sa_family=AF_INET, sin_port=htons(1022), sin_addr=inet_addr("127.0.0.1")}, [16]) = 16 fcntl(16, F_GETFL) = 0x2 (flags O_RDWR) fcntl(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(16, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(16, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(16, SOL_TCP, TCP_KEEPIDLE, [10], 4) = 0 setsockopt(16, SOL_TCP, TCP_KEEPINTVL, [2], 4) = 0 getsockname(16, {sa_family=AF_INET, sin_port=htons(24007), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 epoll_ctl(3, EPOLL_CTL_ADD, 16, {EPOLLIN|EPOLLPRI, {u32=16, u64=38654705680}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=16, u64=38654705680}}}, 261, 4294967295) = 1 readv(16, [{"\200\0\0\330", 4}], 1) = 4 readv(16, [{"\0\0\0\1\0\0\0\0", 8}], 1) = 8 readv(16, [{"\0\0\0\2\0\22\345\277\0\0\0\1\0\0\0\30\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(16, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 184}], 1) = 184 gettimeofday({1305192323, 967813}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(7, "[2011-05-12 11:25:23.967813] vol"..., 105) = 105 gettimeofday({1305192323, 968349}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:23.968349] I ["..., 138) = 138 gettimeofday({1305192323, 968673}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:23.968673] I ["..., 110) = 110 gettimeofday({1305192323, 969031}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(7, "[2011-05-12 11:25:23.969031] vol"..., 117) = 117 gettimeofday({1305192323, 969433}, NULL) = 0 stat("/etc/glusterd/vols/test2", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 pipe2([17, 18], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f778b77f9d0) = 2528 
close(18) = 0 fcntl(17, F_SETFD, 0) = 0 fstat(17, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7789285000 read(17, "ssh://root at slave.mydomain.com:file:/"..., 4096) = 45 close(17) = 0 wait4(2528, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2528 --- SIGCHLD (Child exited) @ 0 (0) --- munmap(0x7f7789285000, 4096) = 0 gettimeofday({1305192324, 88378}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:24.88378] I [g"..., 118) = 118 gettimeofday({1305192324, 88931}, NULL) = 0 gettimeofday({1305192324, 89060}, NULL) = 0 pipe2([17, 18], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f778b77f9d0) = 2531 close(18) = 0 fcntl(17, F_SETFD, 0) = 0 fstat(17, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7789285000 read(17, "ssh://root at slave.mydomain.com:file:/"..., 4096) = 45 close(17) = 0 wait4(2531, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2531 --- SIGCHLD (Child exited) @ 0 (0) --- munmap(0x7f7789285000, 4096) = 0 mkdir("/etc/glusterd/vols/test2", 0777) = -1 EEXIST (File exists) open("/etc/glusterd/vols/test2/info.tmp", O_RDWR|O_CREAT|O_TRUNC, 0644) = 17 write(17, "type=2\n", 7) = 7 write(17, "count=2\n", 8) = 8 write(17, "status=1\n", 9) = 9 write(17, "sub_count=2\n", 12) = 12 write(17, "version=46\n", 11) = 11 write(17, "transport-type=0\n", 17) = 17 write(17, "volume-id=19d2cbaf-79b3-4ccf-97c"..., 47) = 47 write(17, "geo-replication.indexing=on\n", 28) = 28 write(17, "slave1=c746cb97-91e4-489c-81e0-7"..., 89) = 89 write(17, "brick-0=master.mydomain.com:-data-3\n", 31) = 31 mkdir("/etc/glusterd/vols/test2/bricks", 0777) = -1 EEXIST (File exists) open("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-3.tmp", O_RDWR|O_CREAT|O_TRUNC, 0644) = 18 write(18, 
"hostname=master.mydomain.com\n", 24) = 24 write(18, "path=/data/3\n", 13) = 13 write(18, "listen-port=24011\n", 18) = 18 rename("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-3.tmp", "/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-3") = 0 close(18) = 0 write(17, "brick-1=master.mydomain.com:-data-4\n", 31) = 31 mkdir("/etc/glusterd/vols/test2/bricks", 0777) = -1 EEXIST (File exists) open("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-4.tmp", O_RDWR|O_CREAT|O_TRUNC, 0644) = 18 write(18, "hostname=master.mydomain.com\n", 24) = 24 write(18, "path=/data/4\n", 13) = 13 write(18, "listen-port=24012\n", 18) = 18 rename("/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-4.tmp", "/etc/glusterd/vols/test2/bricks/master.mydomain.com:-data-4") = 0 close(18) = 0 rename("/etc/glusterd/vols/test2/info.tmp", "/etc/glusterd/vols/test2/info") = 0 close(17) = 0 open("/etc/glusterd/vols/test2/cksum", O_RDWR|O_CREAT|O_TRUNC|O_APPEND, 0644) = 17 open("/tmp/test2.suny1Z", O_RDWR|O_CREAT|O_EXCL, 0600) = 18 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7f778a7961e0}, {0x404cf0, [INT], SA_RESTORER|SA_RESTART, 0x7f778a7961e0}, 8) = 0 rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7f778a7961e0}, {SIG_DFL, [], SA_RESTORER, 0x7f778a7961e0}, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [HUP USR1 USR2 TERM], 8) = 0 clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fffd34d4e48) = 2534 wait4(2534, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2534 rt_sigaction(SIGINT, {0x404cf0, [INT], SA_RESTORER|SA_RESTART, 0x7f778a7961e0}, NULL, 8) = 0 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7f778a7961e0}, NULL, 8) = 0 rt_sigprocmask(SIG_SETMASK, [HUP USR1 USR2 TERM], NULL, 8) = 0 open("/tmp/test2.suny1Z", O_RDWR) = 19 lseek(19, 0, SEEK_SET) = 0 read(19, "brick-0=master.mydomain.com:-data-3\nb"..., 1024) = 290 read(19, "", 1024) = 0 lseek(19, 0, SEEK_SET) = 0 close(19) = 0 write(17, "info=186675920\n", 15) = 15 lseek(17, 0, 
SEEK_SET) = 0 read(17, "info=186675920\n", 1024) = 15 read(17, "", 1024) = 0 lseek(17, 0, SEEK_SET) = 0 close(17) = 0 close(18) = 0 unlink("/tmp/test2.suny1Z") = 0 pipe2([17, 18], O_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f778b77f9d0) = 2535 close(18) = 0 fcntl(17, F_SETFD, 0) = 0 fstat(17, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7789285000 read(17, "/etc/glusterd/geo-replication/te"..., 4096) = 105 close(17) = 0 wait4(2535, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2535 --- SIGCHLD (Child exited) @ 0 (0) --- munmap(0x7f7789285000, 4096) = 0 open("/etc/glusterd/geo-replication/test2/ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.pid", O_RDWR) = -1 ENOENT (No such file or directory) fcntl(-1, F_GETLK, {type=F_RDLCK, whence=SEEK_CUR, start=0, len=0, pid=0}) = -1 EBADF (Bad file descriptor) close(4294967295) = -1 EBADF (Bad file descriptor) mkdir("/etc/glusterd/geo-replication/test2", 0777) = -1 EEXIST (File exists) stat("/etc/glusterd/geo-replication/test2", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 mkdir("/var/log/glusterfs/geo-replication/test2", 0777) = -1 EEXIST (File exists) stat("/var/log/glusterfs/geo-replication/test2", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f778b77f9d0) = 2538 wait4(2538, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2538 --- SIGCHLD (Child exited) @ 0 (0) --- gettimeofday({1305192324, 603405}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:24.603405] I ["..., 120) = 120 gettimeofday({1305192324, 604013}, NULL) = 0 gettimeofday({1305192324, 604090}, NULL) = 0 gettimeofday({1305192324, 604209}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 
11:25:24.604209] I ["..., 111) = 111 writev(16, [{"\200\0\1\34", 4}, {"\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\5test"..., 260}], 3) = 288 gettimeofday({1305192324, 605385}, NULL) = 0 epoll_wait(3, {{EPOLLIN, {u32=16, u64=38654705680}}}, 261, 4294967295) = 1 readv(16, [{"\330\0\0\200", 4}], 1) = 0 gettimeofday({1305192324, 623948}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:24.623948] W ["..., 192) = 192 epoll_ctl(3, EPOLL_CTL_DEL, 16, NULL) = 0 close(16) = 0 epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 261, 4294967295) = 1 accept(8, {sa_family=AF_INET, sin_port=htons(1020), sin_addr=inet_addr("127.0.0.1")}, [16]) = 16 fcntl(16, F_GETFL) = 0x2 (flags O_RDWR) fcntl(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(16, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(16, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(16, SOL_TCP, TCP_KEEPIDLE, [10], 4) = 0 setsockopt(16, SOL_TCP, TCP_KEEPINTVL, [2], 4) = 0 getsockname(16, {sa_family=AF_INET, sin_port=htons(24007), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 epoll_ctl(3, EPOLL_CTL_ADD, 16, {EPOLLIN|EPOLLPRI, {u32=16, u64=38654705680}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=16, u64=38654705680}}}, 261, 4294967295) = 1 readv(16, [{"\200\0\0\220", 4}], 1) = 4 readv(16, [{"\0\0\0\1\0\0\0\0", 8}], 1) = 8 readv(16, [{"\0\0\0\2\0\333\264\251\0\0\0\1\0\0\0\2\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(16, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 112}], 1) = 112 stat("/etc/glusterd/vols/test2/test2.vol", 0x7fffd34dd540) = -1 ENOENT (No such file or directory) stat("/etc/glusterd/vols/test2/test2-fuse.vol", {st_mode=S_IFREG|0644, st_size=1090, ...}) = 0 stat("/etc/glusterd/vols/test2/test2-fuse.vol", {st_mode=S_IFREG|0644, st_size=1090, ...}) = 0 open("/etc/glusterd/vols/test2/test2-fuse.vol", O_RDONLY) = 17 read(17, "volume test2-client-0\n type p"..., 1090) = 1090 close(17) = 0 
writev(16, [{"\200\0\4h", 4}, {"\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\4B\0\0\0\0\0\0\4Bvolume test2-client-"..., 1104}], 3) = 1132 epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 261, 4294967295) = 1 accept(8, {sa_family=AF_INET, sin_port=htons(1017), sin_addr=inet_addr("master.mydomain.com")}, [16]) = 17 fcntl(17, F_GETFL) = 0x2 (flags O_RDWR) fcntl(17, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(17, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(17, SOL_TCP, TCP_KEEPIDLE, [10], 4) = 0 setsockopt(17, SOL_TCP, TCP_KEEPINTVL, [2], 4) = 0 getsockname(17, {sa_family=AF_INET, sin_port=htons(24007), sin_addr=inet_addr("master.mydomain.com")}, [16]) = 0 epoll_ctl(3, EPOLL_CTL_ADD, 17, {EPOLLIN|EPOLLPRI, {u32=17, u64=42949672977}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=8, u64=8}}}, 261, 4294967295) = 1 accept(8, {sa_family=AF_INET, sin_port=htons(1016), sin_addr=inet_addr("master.mydomain.com")}, [16]) = 18 fcntl(18, F_GETFL) = 0x2 (flags O_RDWR) fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0 setsockopt(18, SOL_TCP, TCP_NODELAY, [1], 4) = 0 setsockopt(18, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(18, SOL_TCP, TCP_KEEPIDLE, [10], 4) = 0 setsockopt(18, SOL_TCP, TCP_KEEPINTVL, [2], 4) = 0 getsockname(18, {sa_family=AF_INET, sin_port=htons(24007), sin_addr=inet_addr("master.mydomain.com")}, [16]) = 0 epoll_ctl(3, EPOLL_CTL_ADD, 18, {EPOLLIN|EPOLLPRI, {u32=18, u64=47244640274}}) = 0 epoll_wait(3, {{EPOLLIN, {u32=17, u64=42949672977}}}, 261, 4294967295) = 1 readv(17, [{"\200\0\0\210", 4}], 1) = 4 readv(17, [{"\0\0\0\1\0\0\0\0", 8}], 1) = 8 readv(17, [{"\0\0\0\2\7[\270m\0\0\0\1\0\0\0\1\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(17, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104}], 1) = 104 writev(17, [{"\200\0\1\34", 4}, {"\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\26\0\0\0\1\0\0\0\7GF-DUMP\0"..., 260}], 3) = 288 epoll_wait(3, {{EPOLLIN, 
{u32=18, u64=47244640274}}}, 261, 4294967295) = 1 readv(18, [{"\200\0\0\210", 4}], 1) = 4 readv(18, [{"\0\0\0\1\0\0\0\0", 8}], 1) = 8 readv(18, [{"\0\0\0\2\7[\270m\0\0\0\1\0\0\0\1\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(18, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 104}], 1) = 104 writev(18, [{"\200\0\1\34", 4}, {"\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\26\0\0\0\1\0\0\0\7GF-DUMP\0"..., 260}], 3) = 288 epoll_wait(3, {{EPOLLIN, {u32=17, u64=42949672977}}}, 261, 4294967295) = 1 readv(17, [{"\200\0\0\214", 4}], 1) = 4 readv(17, [{"\0\0\0\2\0\0\0\0", 8}], 1) = 8 readv(17, [{"\0\0\0\2\2\10\256\300\0\0\0\1\0\0\0\1\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(17, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 108}], 1) = 108 writev(17, [{"\200\0\0(", 4}, {"\0\0\0\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0]\313", 16}], 3) = 44 epoll_wait(3, {{EPOLLIN, {u32=18, u64=47244640274}}}, 261, 4294967295) = 1 readv(18, [{"\200\0\0\214", 4}], 1) = 4 readv(18, [{"\0\0\0\2\0\0\0\0", 8}], 1) = 8 readv(18, [{"\0\0\0\2\2\10\256\300\0\0\0\1\0\0\0\1\0\0\0\5\0\0\0X", 24}], 1) = 24 readv(18, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 108}], 1) = 108 writev(18, [{"\200\0\0(", 4}, {"\0\0\0\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0]\314", 16}], 3) = 44 epoll_wait(3, {{EPOLLIN, {u32=17, u64=42949672977}}}, 261, 4294967295) = 1 readv(17, [{"\214\0\0\200", 4}], 1) = 0 gettimeofday({1305192326, 937372}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:26.937372] W ["..., 197) = 197 epoll_ctl(3, EPOLL_CTL_DEL, 17, NULL) = 0 epoll_ctl(3, EPOLL_CTL_MOD, 18, {EPOLLIN|EPOLLPRI, {u32=18, u64=42949672978}}) = 0 close(17) = 0 epoll_wait(3, {{EPOLLIN, {u32=18, u64=42949672978}}}, 261, 4294967295) = 1 readv(18, [{"\214\0\0\200", 4}], 1) = 0 
gettimeofday({1305192326, 938265}, NULL) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 write(4, "[2011-05-12 11:25:26.938265] W ["..., 197) = 197 epoll_ctl(3, EPOLL_CTL_DEL, 18, NULL) = 0 close(18) = 0 epoll_wait(3, <unfinished ...> Process 2458 detached

Maybe it's not related to GlusterFS. I'll try the same configuration on other servers.

Note that in these logs I have replaced my master and slave IPs with [master|slave].mydomain.com.

-- 
Cédric Lagneau
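For anyone hunting for the per-session files: the master log and pid file names quoted in this message look like the slave URL with every reserved character percent-encoded. The sketch below reproduces that name with Python's standard library. Whether gsyncd builds the name with this exact call is an assumption; only the resulting string is taken from the message, and the function name is hypothetical.

```python
from urllib.parse import quote, unquote


def session_file_name(slave_url, suffix):
    """Percent-encode the whole slave URL (including ':' '/' '@') and add a suffix."""
    return quote(slave_url, safe="") + suffix


slave_url = "ssh://root@slave.mydomain.com:file:///data/test2"
log_name = session_file_name(slave_url, ".log")
print(log_name)

# Decoding the name gives back the slave URL, which helps when reading directory listings.
print(unquote(log_name[: -len(".log")]))
```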