Amar Tumballi
2020-Apr-17 05:05 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
This thread has been one of the largest efforts to stabilize the
systems in recent times. Thanks for your patience and the number of
retries you did, Erik! We surely need to get to the glitch you found
with the 7.4 version, as with every higher version, we expect more
stability!

Regards,
Amar

On Fri, Apr 17, 2020 at 2:46 AM Erik Jacobson <erik.jacobson at hpe.com> wrote:
> I have some news.
>
> After many, many, many trials, reboots of gluster servers, and reboots
> of nodes, in what should have reproduced the issue several times, I
> think we're stable.
>
> It appears this glusterfs nfs daemon hang only happens in glusterfs74
> and not 72.
>
> So....
> 1) Your split-brain patch
> 2) performance.parallel-readdir off
> 3) glusterfs72
>
> I declare it stable. I can't make it fail: split-brain, hang, nor seg
> fault with one leader down.
>
> I'm working on putting this into a SW update.
>
> We are going to test whether performance.parallel-readdir off impacts
> booting at scale, but we don't have a system to try it on at this time.
>
> THANK YOU!
>
> I may have access to the 57-node test system if there is something
> you'd like me to try with regard to why glusterfs74 is unstable in
> this situation. Just let me know.
>
> Erik
>
> On Thu, Apr 16, 2020 at 12:03:33PM -0500, Erik Jacobson wrote:
> > So in my test runs since making that change, we have a different odd
> > behavior now. As you recall, this is with your patch -- still no
> > split-brain -- and now with performance.parallel-readdir off.
> >
> > The NFS server grinds to a halt after a few test runs. It does not
> > core dump.
> >
> > All that shows up in the log is:
> >
> > "pending frames:" with nothing after it and no date stamp.
> >
> > I will start looking for interesting break points I guess.
> >
> > The glusterfs process for nfs is still alive:
> >
> > root     30541     1 42 09:57 ?        00:51:06 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9ddb5561058ff543.socket
> >
> > [root at leader3 ~]# strace -f -p 30541
> > strace: Process 30541 attached with 40 threads
> > [pid 30580] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30579] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30578] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30577] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30576] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30575] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30574] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30573] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30572] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30571] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30570] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30569] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30568] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30567] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30566] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30565] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30564] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30563] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30562] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30561] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30560] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30559] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30558] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30557] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30556] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30555] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30554] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30553] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30552] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30551] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30550] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid 30549] futex(0x7f8904035f60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30548] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=243775} <unfinished ...>
> > [pid 30546] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid 30545] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid 30544] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30543] rt_sigtimedwait([HUP INT USR1 USR2 TERM], <unfinished ...>
> > [pid 30542] futex(0x7f88b8000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> > [pid 30541] futex(0x7f890c3a39d0, FUTEX_WAIT, 30548, NULL <unfinished ...>
> > [pid 30547] <... select resumed> ) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > [pid 30547] select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}^Cstrace: Process 30541 detached
> > strace: Process 30542 detached
> > strace: Process 30543 detached
> > strace: Process 30544 detached
> > strace: Process 30545 detached
> > strace: Process 30546 detached
> > strace: Process 30547 detached
> > <detached ...>
> > strace: Process 30548 detached
> > strace: Process 30549 detached
> > strace: Process 30550 detached
> > strace: Process 30551 detached
> > strace: Process 30552 detached
> > strace: Process 30553 detached
> > strace: Process 30554 detached
> > strace: Process 30555 detached
> > strace: Process 30556 detached
> > strace: Process 30557 detached
> > strace: Process 30558 detached
> > strace: Process 30559 detached
> > strace: Process 30560 detached
> > strace: Process 30561 detached
> > strace: Process 30562 detached
> > strace: Process 30563 detached
> > strace: Process 30564 detached
> > strace: Process 30565 detached
> > strace: Process 30566 detached
> > strace: Process 30567 detached
> > strace: Process 30568 detached
> > strace: Process 30569 detached
> > strace: Process 30570 detached
> > strace: Process 30571 detached
> > strace: Process 30572 detached
> > strace: Process 30573 detached
> > strace: Process 30574 detached
> > strace: Process 30575 detached
> > strace: Process 30576 detached
> > strace: Process 30577 detached
> > strace: Process 30578 detached
> > strace: Process 30579 detached
> > strace: Process 30580 detached
> >
> > > On 16/04/20 8:04 pm, Erik Jacobson wrote:
> > > > Quick update just on how this got set.
> > > >
> > > > gluster volume set cm_shared performance.parallel-readdir on
> > > >
> > > > Is something we did turn on, thinking it might make our NFS
> > > > services faster and not knowing about it possibly being negative.
> > > >
> > > > Below is a diff of the nfs volume file ON vs OFF. So I will simply
> > > > turn this OFF and do a test run.
> > > Yes, that should do it. I am not sure if performance.parallel-readdir
> > > was intentionally made to have an effect on gnfs volfiles. Usually,
> > > for other performance xlators, `gluster volume set` only changes the
> > > fuse volfile.
>
> ________
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
https://kadalu.io
Container Storage made easy!
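An aside for anyone debugging a similar hang: strace output in which nearly every thread is parked in futex() usually points at a userspace lock the process can never acquire, and two follow-up commands tend to reveal more than strace alone. This is a minimal sketch, not something run in this thread; PID 30541 is taken from Erik's output above, and the statedump path is glusterfs's usual default:

    # SIGUSR1 asks a glusterfs process to write a statedump (including
    # pending call frames); it normally lands under /var/run/gluster/.
    kill -USR1 30541

    # Dump backtraces of all threads to see which locks they block on.
    # Useful symbol names require the matching debuginfo packages.
    gdb --batch -p 30541 -ex 'thread apply all bt' > /tmp/gnfs-threads.txt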
Ravishankar N
2020-Apr-17 05:15 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 17/04/20 10:35 am, Amar Tumballi wrote:
> This thread has been one of the largest efforts to stabilize the
> systems in recent times.
>
> Thanks for your patience and the number of retries you did, Erik!

Thanks indeed! Once https://review.gluster.org/#/c/glusterfs/+/24316/
gets merged on master, I will backport it to the release branches.

> We surely need to get to the glitch you found with the 7.4 version,
> as with every higher version, we expect more stability!

True, maybe we should start a separate thread...

Regards,
Ravi

> [Remainder of quoted message snipped; identical to the previous
> message in this thread.]
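Ravi's note in the first message, that performance.parallel-readdir shows up in the gnfs volfile, can be verified directly by diffing the generated volfile across the toggle, which is essentially the check Erik describes. A minimal sketch, assuming the stock glusterd volfile path (/var/lib/glusterd/nfs/nfs-server.vol) and Erik's volume name cm_shared:

    # Save the gnfs volfile generated while the option is on.
    cp /var/lib/glusterd/nfs/nfs-server.vol /tmp/nfs-server.vol.on

    # glusterd regenerates volfiles on every 'volume set'.
    gluster volume set cm_shared performance.parallel-readdir off

    # Any difference should be the parallel-readdir (readdir-ahead)
    # xlator entries disappearing from the graph.
    diff /tmp/nfs-server.vol.on /var/lib/glusterd/nfs/nfs-server.vol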