Engelmann Florian
2015-Nov-10 11:57 UTC
[Gluster-users] concurrent "gluster volume status" crashes the command (v3.4 and v3.7)
Dear list,

running "gluster volume status" concurrently on all 3 GlusterFS nodes (which actually are LXC containers) somehow crashes the command. Two nodes reply "Another transaction is in progress. Please try again after sometime." and on the 3rd node the command hangs forever. Stopping the hanging command and running it again also results in "Another transaction is in progress. Please try again after sometime." on that machine. (A rough reproduction sketch follows in the P.S. below.)

The strace output ends like this:

[...]
connect(7, {sa_family=AF_LOCAL, sun_path="/var/run/gluster/quotad.socket"}, 110) = -1 ENOENT (No such file or directory)
fcntl(7, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLONESHOT, {u32=1, u64=4294967297}}) = 0
pipe([8, 9]) = 0
fcntl(9, F_SETFD, FD_CLOEXEC) = 0
pipe([10, 11]) = 0
fcntl(10, F_GETFL) = 0 (flags O_RDONLY)
fstat(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f67780e5000
lseek(10, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f67780d9a50) = 28493
close(-1) = -1 EBADF (Bad file descriptor)
close(11) = 0
close(-1) = -1 EBADF (Bad file descriptor)
close(9) = 0
read(8, "", 4) = 0
close(8) = 0
read(10, "gsyncd.py 0.0.1\n", 4096) = 16
wait4(28493, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 28493
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=28493, si_status=0, si_utime=5, si_stime=1} ---
close(10) = 0
munmap(0x7f67780e5000, 4096) = 0
close(-1) = -1 EBADF (Bad file descriptor)
close(-2) = -1 EBADF (Bad file descriptor)
close(-1) = -1 EBADF (Bad file descriptor)
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6773545000
mprotect(0x7f6773545000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f6773d44f70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6773d459d0, tls=0x7f6773d45700, child_tidptr=0x7f6773d459d0) = 28496
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6772d44000
mprotect(0x7f6772d44000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f6773543f70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f67735449d0, tls=0x7f6773544700, child_tidptr=0x7f67735449d0) = 28497
futex(0x7f67735449d0, FUTEX_WAIT, 28497, NULLAnother transaction is in progress. Please try again after sometime. <unfinished ...>
+++ exited with 1 +++

I had to stop all volumes and restart glusterd to solve that problem.

Host OS: Ubuntu 14.04 LTS
LXC OS: Ubuntu 14.04 LTS

We first hit this issue with 3.4.2 (official Ubuntu packages) and upgraded to 3.7.5 (Launchpad) to check whether the problem still exists. It does.

Any ideas?

Thank you for your help,
Florian
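
P.S.: For reference, this is roughly how we trigger it. A minimal sketch, assuming SSH access to the three peers; the hostnames gluster1 to gluster3 are placeholders, and issuing the command by hand on all nodes at the same time behaves the same way:

#!/bin/sh
# Fire "gluster volume status" on all three peers at (roughly) the same time.
# Two of the nodes then fail with "Another transaction is in progress." and
# the third one hangs. Hostnames are placeholders for our three LXC nodes.
for host in gluster1 gluster2 gluster3; do
    ssh "$host" 'gluster volume status' &
done
wait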