Also, could you print local->fop please? -Krutika On Fri, Aug 5, 2016 at 10:46 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:> Were the images being renamed (specifically to a pathname that already > exists) while they were being written to? > > -Krutika > > On Thu, Aug 4, 2016 at 1:14 PM, Mahdi Adnan <mahdi.adnan at outlook.com> > wrote: > >> Hi, >> >> Kindly check the following link for all 7 bricks logs; >> >> https://db.tt/YP5qTGXk >> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Thu, 4 Aug 2016 13:00:43 +0530 >> >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> Could you also attach the brick logs please? >> >> -Krutika >> >> On Thu, Aug 4, 2016 at 12:48 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> appreciate your help, >> >> (gdb) frame 2 >> #2 0x00007f195deb1787 in shard_common_inode_write_do >> (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716 >> 3716 anon_fd = fd_anonymous >> (local->inode_list[i]); >> (gdb) p local->inode_list[0] >> $4 = (inode_t *) 0x7f195c532b18 >> (gdb) p local->inode_list[1] >> $5 = (inode_t *) 0x0 >> (gdb) >> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Thu, 4 Aug 2016 12:43:10 +0530 >> >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> OK. >> Could you also print the values of the following variables from the >> original core: >> i. i >> ii. local->inode_list[0] >> iii. local->inode_list[1] >> >> -Krutika >> >> On Wed, Aug 3, 2016 at 9:01 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> Unfortunately no, but i can setup a test bench and see if it gets the >> same results. >> >> -- >> >> Respectfully >> *Mahdi A. 
Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Wed, 3 Aug 2016 20:59:50 +0530 >> >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> Do you have a test case that consistently recreates this problem? >> >> -Krutika >> >> On Wed, Aug 3, 2016 at 8:32 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> So i have updated to 3.7.14 and i still have the same issue with NFS. >> based on what i have provided so far from logs and dumps do you think >> it's an NFS issue ? should i switch to nfs-ganesha ? >> the problem is, the current setup is used in a production environment, >> and switching the mount point of +50 VMs from native nfs to nfs-ganesha is >> not going to be smooth and without downtime, so i really appreciate your >> thoughts on this matter. >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: mahdi.adnan at outlook.com >> To: kdhananj at redhat.com >> Date: Tue, 2 Aug 2016 08:44:16 +0300 >> >> CC: gluster-users at gluster.org >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> >> Hi, >> >> The NFS just crashed again, latest bt; >> >> (gdb) bt >> #0 0x00007f0b71a9f210 in pthread_spin_lock () from /lib64/libpthread.so.0 >> #1 0x00007f0b72c6fcd5 in fd_anonymous (inode=0x0) at fd.c:804 >> #2 0x00007f0b64ca5787 in shard_common_inode_write_do >> (frame=0x7f0b707c062c, this=0x7f0b6002ac10) at shard.c:3716 >> #3 0x00007f0b64ca5a53 in shard_common_inode_write_post_lookup_shards_handler >> (frame=<optimized out>, this=<optimized out>) at shard.c:3769 >> #4 0x00007f0b64c9eff5 in shard_common_lookup_shards_cbk >> (frame=0x7f0b707c062c, cookie=<optimized out>, this=0x7f0b6002ac10, >> op_ret=0, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f0b51407640, >> xdata=0x7f0b72f57648, postparent=0x7f0b514076b0) at shard.c:1601 >> #5 0x00007f0b64efe141 in dht_lookup_cbk 
(frame=0x7f0b7075fcdc, >> cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, >> inode=0x7f0b5f1d1f58, >> stbuf=0x7f0b51407640, xattr=0x7f0b72f57648, >> postparent=0x7f0b514076b0) at dht-common.c:2174 >> #6 0x00007f0b651871f3 in afr_lookup_done (frame=frame at entry=0x7f0b7079a4c8, >> this=this at entry=0x7f0b60023ba0) at afr-common.c:1825 >> #7 0x00007f0b65187b84 in afr_lookup_metadata_heal_check >> (frame=frame at entry=0x7f0b7079a4c8, this=0x7f0b60023ba0, this at entry >> =0xca0bd88259f5a800) >> at afr-common.c:2068 >> #8 0x00007f0b6518834f in afr_lookup_entry_heal (frame=frame at entry >> =0x7f0b7079a4c8, this=0xca0bd88259f5a800, this at entry=0x7f0b60023ba0) at >> afr-common.c:2157 >> #9 0x00007f0b6518867d in afr_lookup_cbk (frame=0x7f0b7079a4c8, >> cookie=<optimized out>, this=0x7f0b60023ba0, op_ret=<optimized out>, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f0b564e9940, >> xdata=0x7f0b72f708c8, postparent=0x7f0b564e99b0) at afr-common.c:2205 >> #10 0x00007f0b653d6e42 in client3_3_lookup_cbk (req=<optimized out>, >> iov=<optimized out>, count=<optimized out>, myframe=0x7f0b7076354c) >> at client-rpc-fops.c:2981 >> #11 0x00007f0b72a00a30 in rpc_clnt_handle_reply (clnt=clnt at entry >> =0x7f0b603393c0, pollin=pollin at entry=0x7f0b50c1c2d0) at rpc-clnt.c:764 >> #12 0x00007f0b72a00cef in rpc_clnt_notify (trans=<optimized out>, >> mydata=0x7f0b603393f0, event=<optimized out>, data=0x7f0b50c1c2d0) at >> rpc-clnt.c:925 >> #13 0x00007f0b729fc7c3 in rpc_transport_notify (this=this at entry >> =0x7f0b60349040, event=event at entry=RPC_TRANSPORT_MSG_RECEIVED, >> data=data at entry=0x7f0b50c1c2d0) >> at rpc-transport.c:546 >> #14 0x00007f0b678c39a4 in socket_event_poll_in (this=this at entry >> =0x7f0b60349040) at socket.c:2353 >> #15 0x00007f0b678c65e4 in socket_event_handler (fd=fd at entry=29, >> idx=idx at entry=17, data=0x7f0b60349040, poll_in=1, poll_out=0, >> poll_err=0) at socket.c:2466 >> #16 0x00007f0b72ca0f7a in 
event_dispatch_epoll_handler >> (event=0x7f0b564e9e80, event_pool=0x7f0b7349bf20) at event-epoll.c:575 >> #17 event_dispatch_epoll_worker (data=0x7f0b60152d40) at event-epoll.c:678 >> #18 0x00007f0b71a9adc5 in start_thread () from /lib64/libpthread.so.0 >> #19 0x00007f0b713dfced in clone () from /lib64/libc.so.6 >> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> ------------------------------ >> From: mahdi.adnan at outlook.com >> To: kdhananj at redhat.com >> Date: Mon, 1 Aug 2016 16:31:50 +0300 >> CC: gluster-users at gluster.org >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> >> Many thanks, >> >> here's the results; >> >> >> (gdb) p cur_block >> $15 = 4088 >> (gdb) p last_block >> $16 = 4088 >> (gdb) p local->first_block >> $17 = 4087 >> (gdb) p odirect >> $18 = _gf_false >> (gdb) p fd->flags >> $19 = 2 >> (gdb) p local->call_count >> $20 = 2 >> >> >> If you need more core dumps, i have several files i can upload. >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Mon, 1 Aug 2016 18:39:27 +0530 >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> Sorry I didn't make myself clear. The reason I asked YOU to do it is >> because i tried it on my system and im not getting the backtrace (it's all >> question marks). >> >> Attach the core to gdb. >> At the gdb prompt, go to frame 2 by typing >> (gdb) f 2 >> >> There, for each of the variables i asked you to get the values of, type p >> followed by the variable name. >> For instance, to get the value of the variable 'odirect', do this: >> >> (gdb) p odirect >> >> and gdb will print its value for you in response. >> >> -Krutika >> >> On Mon, Aug 1, 2016 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> How to get the results of the below variables ? i cant get the results >> from gdb. 
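[Editorial aside: the interactive steps Krutika describes can also be collected in one shot with a gdb command file, which avoids typing each `p` by hand. Filenames below are hypothetical; substitute the actual glusterfs binary and core paths.]

```gdb
# vars.gdb -- run as: gdb /usr/sbin/glusterfs /path/to/core -batch -x vars.gdb
frame 2
print cur_block
print last_block
print local->first_block
print odirect
print fd->flags
print local->call_count
print local->inode_list[0]
print local->inode_list[1]
print local->fop
```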
>> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Mon, 1 Aug 2016 15:51:38 +0530 >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> >> Could you also print and share the values of the following variables from >> the backtrace please: >> >> i. cur_block >> ii. last_block >> iii. local->first_block >> iv. odirect >> v. fd->flags >> vi. local->call_count >> >> -Krutika >> >> On Sat, Jul 30, 2016 at 5:04 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> I really appreciate if someone can help me fix my nfs crash, its >> happening a lot and it's causing lots of issues to my VMs; >> the problem is every few hours the native nfs crash and the volume become >> unavailable from the affected node unless i restart glusterd. >> the volume is used by vmware esxi as a datastore for it's VMs with the >> following options; >> >> >> OS: CentOS 7.2 >> Gluster: 3.7.13 >> >> Volume Name: vlm01 >> Type: Distributed-Replicate >> Volume ID: eacd8248-dca3-4530-9aed-7714a5a114f2 >> Status: Started >> Number of Bricks: 7 x 3 = 21 >> Transport-type: tcp >> Bricks: >> Brick1: gfs01:/bricks/b01/vlm01 >> Brick2: gfs02:/bricks/b01/vlm01 >> Brick3: gfs03:/bricks/b01/vlm01 >> Brick4: gfs01:/bricks/b02/vlm01 >> Brick5: gfs02:/bricks/b02/vlm01 >> Brick6: gfs03:/bricks/b02/vlm01 >> Brick7: gfs01:/bricks/b03/vlm01 >> Brick8: gfs02:/bricks/b03/vlm01 >> Brick9: gfs03:/bricks/b03/vlm01 >> Brick10: gfs01:/bricks/b04/vlm01 >> Brick11: gfs02:/bricks/b04/vlm01 >> Brick12: gfs03:/bricks/b04/vlm01 >> Brick13: gfs01:/bricks/b05/vlm01 >> Brick14: gfs02:/bricks/b05/vlm01 >> Brick15: gfs03:/bricks/b05/vlm01 >> Brick16: gfs01:/bricks/b06/vlm01 >> Brick17: gfs02:/bricks/b06/vlm01 >> Brick18: gfs03:/bricks/b06/vlm01 >> Brick19: gfs01:/bricks/b07/vlm01 >> Brick20: gfs02:/bricks/b07/vlm01 >> Brick21: gfs03:/bricks/b07/vlm01 >> Options 
Reconfigured: >> performance.readdir-ahead: off >> performance.quick-read: off >> performance.read-ahead: off >> performance.io-cache: off >> performance.stat-prefetch: off >> cluster.eager-lock: enable >> network.remote-dio: enable >> cluster.quorum-type: auto >> cluster.server-quorum-type: server >> performance.strict-write-ordering: on >> performance.write-behind: off >> cluster.data-self-heal-algorithm: full >> cluster.self-heal-window-size: 128 >> features.shard-block-size: 16MB >> features.shard: on >> auth.allow: 192.168.221.50,192.168.221.51,192.168.221.52,192.168.221.56, >> 192.168.208.130,192.168.208.131,192.168.208.132,192.168.208. >> 89,192.168.208.85,192.168.208.208.86 >> network.ping-timeout: 10 >> >> >> latest bt; >> >> >> (gdb) bt >> #0 0x00007f196acab210 in pthread_spin_lock () from /lib64/libpthread.so.0 >> #1 0x00007f196be7bcd5 in fd_anonymous (inode=0x0) at fd.c:804 >> #2 0x00007f195deb1787 in shard_common_inode_write_do >> (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716 >> #3 0x00007f195deb1a53 in shard_common_inode_write_post_lookup_shards_handler >> (frame=<optimized out>, this=<optimized out>) at shard.c:3769 >> #4 0x00007f195deaaff5 in shard_common_lookup_shards_cbk >> (frame=0x7f19699f1164, cookie=<optimized out>, this=0x7f195802ac10, >> op_ret=0, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f194970bc40, >> xdata=0x7f196c15451c, postparent=0x7f194970bcb0) at shard.c:1601 >> #5 0x00007f195e10a141 in dht_lookup_cbk (frame=0x7f196998e7d4, >> cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, >> inode=0x7f195c532b18, >> stbuf=0x7f194970bc40, xattr=0x7f196c15451c, >> postparent=0x7f194970bcb0) at dht-common.c:2174 >> #6 0x00007f195e3931f3 in afr_lookup_done (frame=frame at entry=0x7f196997f8a4, >> this=this at entry=0x7f1958022a20) at afr-common.c:1825 >> #7 0x00007f195e393b84 in afr_lookup_metadata_heal_check >> (frame=frame at entry=0x7f196997f8a4, this=0x7f1958022a20, this at entry >> 
=0xe3a929e0b67fa500) >> at afr-common.c:2068 >> #8 0x00007f195e39434f in afr_lookup_entry_heal (frame=frame at entry >> =0x7f196997f8a4, this=0xe3a929e0b67fa500, this at entry=0x7f1958022a20) at >> afr-common.c:2157 >> #9 0x00007f195e39467d in afr_lookup_cbk (frame=0x7f196997f8a4, >> cookie=<optimized out>, this=0x7f1958022a20, op_ret=<optimized out>, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f195effa940, >> xdata=0x7f196c1853b0, postparent=0x7f195effa9b0) at afr-common.c:2205 >> #10 0x00007f195e5e2e42 in client3_3_lookup_cbk (req=<optimized out>, >> iov=<optimized out>, count=<optimized out>, myframe=0x7f196999952c) >> at client-rpc-fops.c:2981 >> #11 0x00007f196bc0ca30 in rpc_clnt_handle_reply (clnt=clnt at entry >> =0x7f19583adaf0, pollin=pollin at entry=0x7f195907f930) at rpc-clnt.c:764 >> #12 0x00007f196bc0ccef in rpc_clnt_notify (trans=<optimized out>, >> mydata=0x7f19583adb20, event=<optimized out>, data=0x7f195907f930) at >> rpc-clnt.c:925 >> #13 0x00007f196bc087c3 in rpc_transport_notify (this=this at entry >> =0x7f19583bd770, event=event at entry=RPC_TRANSPORT_MSG_RECEIVED, >> data=data at entry=0x7f195907f930) >> at rpc-transport.c:546 >> #14 0x00007f1960acf9a4 in socket_event_poll_in (this=this at entry >> =0x7f19583bd770) at socket.c:2353 >> #15 0x00007f1960ad25e4 in socket_event_handler (fd=fd at entry=25, >> idx=idx at entry=14, data=0x7f19583bd770, poll_in=1, poll_out=0, >> poll_err=0) at socket.c:2466 >> #16 0x00007f196beacf7a in event_dispatch_epoll_handler >> (event=0x7f195effae80, event_pool=0x7f196dbf5f20) at event-epoll.c:575 >> #17 event_dispatch_epoll_worker (data=0x7f196dc41e10) at event-epoll.c:678 >> #18 0x00007f196aca6dc5 in start_thread () from /lib64/libpthread.so.0 >> #19 0x00007f196a5ebced in clone () from /lib64/libc.so.6 >> >> >> >> >> nfs logs and the core dump can be found in the dropbox link below; >> https://db.tt/rZrC9d7f >> >> >> thanks in advance. >> >> Respectfully >> *Mahdi A. 
Mahdi* >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://www.gluster.org/mailman/listinfo/gluster-users
Hi,

Yes, i got some messages regarding an existing file name in the renaming process while the VMs are online. and here's the output;

(gdb) frame 2
#2  0x00007f195deb1787 in shard_common_inode_write_do (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716
3716            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p local->fop
$1 = GF_FOP_WRITE
(gdb)

--

Respectfully
Mahdi A. Mahdi

From: kdhananj at redhat.com
Date: Fri, 5 Aug 2016 10:48:36 +0530
Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
To: mahdi.adnan at outlook.com
CC: gluster-users at gluster.org

Also, could you print local->fop please?

-Krutika