Ravishankar N
2020-Apr-16 16:18 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 16/04/20 8:04 pm, Erik Jacobson wrote:
> Quick update just on how this got set.
>
> gluster volume set cm_shared performance.parallel-readdir on
>
> Is something we did turn on, thinking it might make our NFS services
> faster and not knowing about it possibly being negative.
>
> Below is a diff of the nfs volume file ON vs OFF. So I will simply turn
> this OFF and do a test run.

Yes, that should do it. I am not sure if performance.parallel-readdir was intentionally made to have an effect on gnfs volfiles. Usually, for other performance xlators, `gluster volume set` only changes the fuse volfile.
Erik Jacobson
2020-Apr-16 17:03 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
So in my test runs since making that change, we have a different odd behavior now. As you recall, this is with your patch -- still not split-brain -- and now with performance.parallel-readdir off.

The NFS server grinds to a halt after a few test runs. It does not core dump.

All that shows up in the log is:
"pending frames:" with nothing after it and no date stamp.

I will start looking for interesting break points I guess.

The glusterfs for nfs is still alive:

root 30541 1 42 09:57 ? 00:51:06 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket

[root@leader3 ~]# strace -f -p 30541
strace: Process 30541 attached with 40 threads
[pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30579] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30578] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30577] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30576] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30575] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30574] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30573] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30572] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30571] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30570] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30569] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30568] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30567] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30566] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30565] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30564] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30563] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30562] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30561] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30560] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30559] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30558] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30557] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30556] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30555] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30554] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30553] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30552] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30551] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30550] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 30549] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30548] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=243775} <unfinished ...>
[pid 30546] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 30545] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 30544] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30543] rt_sigtimedwait([HUP INT USR1 USR2 TERM], <unfinished ...>
[pid 30542] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30541] futex(0x7f890c3a39d0, FUTEX_WAIT, 30548, NULL <unfinished ...>
[pid 30547] <... select resumed> ) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}^Cstrace: Process 30541 detached
strace: Process 30542 detached
strace: Process 30543 detached
strace: Process 30544 detached
strace: Process 30545 detached
strace: Process 30546 detached
strace: Process 30547 detached
 <detached ...>
strace: Process 30548 detached
strace: Process 30549 detached
strace: Process 30550 detached
strace: Process 30551 detached
strace: Process 30552 detached
strace: Process 30553 detached
strace: Process 30554 detached
strace: Process 30555 detached
strace: Process 30556 detached
strace: Process 30557 detached
strace: Process 30558 detached
strace: Process 30559 detached
strace: Process 30560 detached
strace: Process 30561 detached
strace: Process 30562 detached
strace: Process 30563 detached
strace: Process 30564 detached
strace: Process 30565 detached
strace: Process 30566 detached
strace: Process 30567 detached
strace: Process 30568 detached
strace: Process 30569 detached
strace: Process 30570 detached
strace: Process 30571 detached
strace: Process 30572 detached
strace: Process 30573 detached
strace: Process 30574 detached
strace: Process 30575 detached
strace: Process 30576 detached
strace: Process 30577 detached
strace: Process 30578 detached
strace: Process 30579 detached
strace: Process 30580 detached

> On 16/04/20 8:04 pm, Erik Jacobson wrote:
> > Quick update just on how this got set.
> >
> > gluster volume set cm_shared performance.parallel-readdir on
> >
> > Is something we did turn on, thinking it might make our NFS services
> > faster and not knowing about it possibly being negative.
> >
> > Below is a diff of the nfs volume file ON vs OFF. So I will simply turn
> > this OFF and do a test run.
> Yes,that should do it. I am not sure if performance.parallel-readdir was
> intentionally made to have an effect on gnfs volfiles. Usually, for other
> performance xlators, `gluster volume set` only changes the fuse volfile.
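
In the strace output above, 30 of the 40 threads are parked in futex() on the same address (0x7f8904035f60), which often points at threads queued behind a single contended lock, though strace alone cannot confirm that. A minimal sketch for tallying the first observed state of each thread from a saved strace capture might look like the following. The function name tally_waits and the SAMPLE excerpt are illustrative only and not part of any Gluster or strace tooling; the regular expression assumes the "[pid N] syscall(0xADDR, ..." layout shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
# Group 3 (the first pointer argument) is None when the call does not
# start with a hex address, e.g. select(0, ...).
LINE_RE = re.compile(r"\[pid\s+(\d+)\]\s+(\w+)\((0x[0-9a-f]+)?")


def tally_waits(strace_text):
    """Count threads by (syscall, first pointer argument).

    Only the first matching line per pid is counted, so the result
    reflects where each thread was sitting when strace attached.
    """
    seen = set()
    counts = Counter()
    for line in strace_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # e.g. "<... select resumed>" or "strace: Process ..." lines
        pid, syscall, addr = match.groups()
        if pid in seen:
            continue
        seen.add(pid)
        counts[(syscall, addr)] += 1
    return counts


# Short excerpt in the same format as the capture above (illustrative input).
SAMPLE = """\
[pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30579] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30566] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=243775} <unfinished ...>
[pid 30547] <... select resumed> ) = 0 (Timeout)
[pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
"""

if __name__ == "__main__":
    for (syscall, addr), n in tally_waits(SAMPLE).most_common():
        print(f"{n:3d} thread(s) in {syscall}({addr or '...'})")
```

A grouping like this only narrows down where to look. Finding which thread holds the lock would still need a debugger backtrace of all threads in the hung glusterfs process.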