Atin Mukherjee
2015-Mar-20 04:57 UTC
[Gluster-users] Gluster volume brick keeps going offline
I see there is a crash in the brick log:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-03-19 06:00:35
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.5.0
/lib/x86_64-linux-gnu/libc.so.6(+0x321e0)[0x7f027c7031e0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/locks.so(__get_entrylk_count+0x40)[0x7f0277fc5d70]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/locks.so(get_entrylk_count+0x4d)[0x7f0277fc5ddd]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/locks.so(pl_entrylk_xattr_fill+0x19)[0x7f0277fc2df9]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/locks.so(pl_lookup_cbk+0x1d0)[0x7f0277fc3390]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/access-control.so(posix_acl_lookup_cbk+0x12b)[0x7f02781d91fb]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/storage/posix.so(posix_lookup+0x331)[0x7f02788046c1]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_lookup+0x70)[0x7f027d6d1270]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/access-control.so(posix_acl_lookup+0x1b5)[0x7f02781d72f5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/features/locks.so(pl_lookup+0x211)[0x7f0277fbd391]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/performance/io-threads.so(iot_lookup_wrapper+0x140)[0x7f0277da82d0]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x126)[0x7f027d6e5f16]
/usr/lib/x86_64-linux-gnu/glusterfs/3.5.0/xlator/performance/io-threads.so(iot_worker+0x13e)[0x7f0277da86be]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f027ce5bb50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f027c7ad70d]

Pranith/Ravi, could you help Kaamesh with this?

Also, on the glusterd side I see some RPC-related failures (probably a corruption). Which gluster version are you using? Are there any surprising logs on the other node?
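In the meantime, a quick way to confirm on gfs2 whether the brick process is actually dying (rather than just losing its connection to glusterd) is something along these lines. The brick log path is an assumption based on GlusterFS's default brick-log naming under /var/log/glusterfs/bricks/, and 13461 is the gfs2 brick PID from the volume status output quoted below:

  # on gfs2: is the brick process for /export/sda/brick still alive?
  sudo gluster volume status gfsvolume
  ps -p 13461 -o pid,etime,cmd

  # look for crash reports in the brick log
  sudo grep -A25 'signal received' /var/log/glusterfs/bricks/export-sda-brick.log

If the grep keeps turning up new "signal received: 11" blocks, the brick is crashing repeatedly rather than being taken down by glusterd.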
~Atin

On 03/19/2015 12:58 PM, Kaamesh Kamalaaharan wrote:
> Sorry, forgot to include the attachment.
>
> Thank You Kindly,
> Kaamesh
> Bioinformatician
> Novocraft Technologies Sdn Bhd
> C-23A-05, 3 Two Square, Section 19, 46300 Petaling Jaya
> Selangor Darul Ehsan
> Malaysia
> Mobile: +60176562635
> Ph: +60379600541
> Fax: +60379600540
>
> On Thu, Mar 19, 2015 at 2:40 PM, Kaamesh Kamalaaharan <kaamesh at novocraft.com> wrote:
>
>> Hi Atin, Thanks for the reply. I'm not sure which logs are relevant, so I'll
>> just attach them all in a gz file.
>>
>> I ran a sudo gluster volume start gfsvolume force at 2015-03-19 05:49.
>> I hope this helps.
>>
>> Thank You Kindly,
>> Kaamesh
>>
>> On Sun, Mar 15, 2015 at 11:41 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> Could you attach the logs for the analysis?
>>>
>>> ~Atin
>>>
>>> On 03/13/2015 03:29 PM, Kaamesh Kamalaaharan wrote:
>>>> Hi guys. I've been using gluster for a while now and, despite a few hiccups,
>>>> I find it a great system to use. One of my more persistent hiccups is an
>>>> issue with one brick going offline.
>>>>
>>>> My setup is a 2-brick, 2-node setup. My main brick is gfs1, which has not
>>>> given me any problems. gfs2, however, keeps going offline. Following
>>>> http://www.gluster.org/pipermail/gluster-users/2014-June/017583.html
>>>> temporarily fixed the error, but the brick goes offline within the hour.
>>>>
>>>> This is what I get from my volume status command:
>>>>
>>>> sudo gluster volume status
>>>>>
>>>>> Status of volume: gfsvolume
>>>>> Gluster process                          Port    Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick gfs1:/export/sda/brick             49153   Y       9760
>>>>> Brick gfs2:/export/sda/brick             N/A     N       13461
>>>>> NFS Server on localhost                  2049    Y       13473
>>>>> Self-heal Daemon on localhost            N/A     Y       13480
>>>>> NFS Server on gfs1                       2049    Y       16166
>>>>> Self-heal Daemon on gfs1                 N/A     Y       16173
>>>>>
>>>>> Task Status of Volume gfsvolume
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>
>>>> Doing sudo gluster volume start gfsvolume force gives me this:
>>>>
>>>> sudo gluster volume status
>>>>>
>>>>> Status of volume: gfsvolume
>>>>> Gluster process                          Port    Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick gfs1:/export/sda/brick             49153   Y       9760
>>>>> Brick gfs2:/export/sda/brick             49153   Y       13461
>>>>> NFS Server on localhost                  2049    Y       13473
>>>>> Self-heal Daemon on localhost            N/A     Y       13480
>>>>> NFS Server on gfs1                       2049    Y       16166
>>>>> Self-heal Daemon on gfs1                 N/A     Y       16173
>>>>>
>>>>> Task Status of Volume gfsvolume
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>
>>>> Half an hour later and my brick goes down again.
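A side note on the above: since the brick drops again within the hour, it may help to capture exactly when the brick process disappears so that the time can be matched against the brick log (the log timestamps are UTC). A rough watch loop, using the gfs2 brick PID shown in the status output quoted above:

  # run on gfs2; 13461 is the gfs2 brick PID from 'gluster volume status'
  while ps -p 13461 > /dev/null; do sleep 10; done
  date -u   # prints the UTC time at which the brick process went away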
>>>>
>>>> This is my glustershd.log. I snipped it because the rest of the log is a
>>>> repeat of the same error:
>>>>
>>>>> [2015-03-13 02:09:41.951556] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.0 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/deac2f873d0ac5b6c3e84b23c4790172.socket --xlator-option *replicate*.node-uuid=adbb7505-3342-4c6d-be3d-75938633612c)
>>>>> [2015-03-13 02:09:41.954173] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
>>>>> [2015-03-13 02:09:41.954236] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
>>>>> [2015-03-13 02:09:41.954421] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
>>>>> [2015-03-13 02:09:41.954443] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
>>>>> [2015-03-13 02:09:41.956731] I [graph.c:254:gf_add_cmdline_options] 0-gfsvolume-replicate-0: adding option 'node-uuid' for volume 'gfsvolume-replicate-0' with value 'adbb7505-3342-4c6d-be3d-75938633612c'
>>>>> [2015-03-13 02:09:41.960210] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-gfsvolume-client-1: setting frame-timeout to 90
>>>>> [2015-03-13 02:09:41.960288] I [socket.c:3561:socket_init] 0-gfsvolume-client-1: SSL support is NOT enabled
>>>>> [2015-03-13 02:09:41.960301] I [socket.c:3576:socket_init] 0-gfsvolume-client-1: using system polling thread
>>>>> [2015-03-13 02:09:41.961095] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-gfsvolume-client-0: setting frame-timeout to 90
>>>>> [2015-03-13 02:09:41.961134] I [socket.c:3561:socket_init] 0-gfsvolume-client-0: SSL support is NOT enabled
>>>>> [2015-03-13 02:09:41.961145] I [socket.c:3576:socket_init] 0-gfsvolume-client-0: using system polling thread
>>>>> [2015-03-13 02:09:41.961173] I [client.c:2273:notify] 0-gfsvolume-client-0: parent translators are ready, attempting connect on transport
>>>>> [2015-03-13 02:09:41.961412] I [client.c:2273:notify] 0-gfsvolume-client-1: parent translators are ready, attempting connect on transport
>>>>> Final graph:
>>>>> +------------------------------------------------------------------------------+
>>>>>   1: volume gfsvolume-client-0
>>>>>   2:     type protocol/client
>>>>>   3:     option remote-host gfs1
>>>>>   4:     option remote-subvolume /export/sda/brick
>>>>>   5:     option transport-type socket
>>>>>   6:     option frame-timeout 90
>>>>>   7:     option ping-timeout 30
>>>>>   8: end-volume
>>>>>   9:
>>>>>  10: volume gfsvolume-client-1
>>>>>  11:     type protocol/client
>>>>>  12:     option remote-host gfs2
>>>>>  13:     option remote-subvolume /export/sda/brick
>>>>>  14:     option transport-type socket
>>>>>  15:     option frame-timeout 90
>>>>>  16:     option ping-timeout 30
>>>>>  17: end-volume
>>>>>  18:
>>>>>  19: volume gfsvolume-replicate-0
>>>>>  20:     type cluster/replicate
>>>>>  21:     option node-uuid adbb7505-3342-4c6d-be3d-75938633612c
>>>>>  22:     option background-self-heal-count 0
>>>>>  23:     option metadata-self-heal on
>>>>>  24:     option data-self-heal on
>>>>>  25:     option entry-self-heal on
>>>>>  26:     option self-heal-daemon on
>>>>>  27:     option data-self-heal-algorithm diff
>>>>>  28:     option quorum-type fixed
>>>>>  29:     option quorum-count 1
>>>>>  30:     option iam-self-heal-daemon yes
>>>>>  31:     subvolumes gfsvolume-client-0 gfsvolume-client-1
>>>>>  32: end-volume
>>>>>  33:
>>>>>  34: volume glustershd
>>>>>  35:     type debug/io-stats
>>>>>  36:     subvolumes gfsvolume-replicate-0
>>>>>  37: end-volume
>>>>> +------------------------------------------------------------------------------+
>>>>> [2015-03-13 02:09:41.961871] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>>>>> [2015-03-13 02:09:41.962129] I [client-handshake.c:1659:select_server_supported_programs] 0-gfsvolume-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>> [2015-03-13 02:09:41.962344] I [client-handshake.c:1456:client_setvolume_cbk] 0-gfsvolume-client-1: Connected to 172.20.20.22:49153, attached to remote volume '/export/sda/brick'.
>>>>> [2015-03-13 02:09:41.962363] I [client-handshake.c:1468:client_setvolume_cbk] 0-gfsvolume-client-1: Server and Client lk-version numbers are not same, reopening the fds
>>>>> [2015-03-13 02:09:41.962416] I [afr-common.c:3922:afr_notify] 0-gfsvolume-replicate-0: Subvolume 'gfsvolume-client-1' came back up; going online.
>>>>> [2015-03-13 02:09:41.962487] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gfsvolume-client-1: Server lk version = 1
>>>>> [2015-03-13 02:09:41.963109] E [afr-self-heald.c:1479:afr_find_child_position] 0-gfsvolume-replicate-0: getxattr failed on gfsvolume-client-0 - (Transport endpoint is not connected)
>>>>> [2015-03-13 02:09:41.963502] I [afr-self-heald.c:1687:afr_dir_exclusive_crawl] 0-gfsvolume-replicate-0: Another crawl is in progress for gfsvolume-client-1
>>>>> [2015-03-13 02:09:41.967478] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9>.
>>>>> [2015-03-13 02:09:41.968550] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>.
>>>>> [2015-03-13 02:09:41.969663] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84>.
>>>>> [2015-03-13 02:09:41.974345] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9>.
>>>>> [2015-03-13 02:09:41.975657] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>.
>>>>> [2015-03-13 02:09:41.977020] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84>.
>>>>> [2015-03-13 02:09:44.307219] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-0: changing port to 49153 (from 0)
>>>>> [2015-03-13 02:09:44.307748] I [client-handshake.c:1659:select_server_supported_programs] 0-gfsvolume-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>> [2015-03-13 02:09:44.448377] I [client-handshake.c:1456:client_setvolume_cbk] 0-gfsvolume-client-0: Connected to 172.20.20.21:49153, attached to remote volume '/export/sda/brick'.
>>>>> [2015-03-13 02:09:44.448418] I [client-handshake.c:1468:client_setvolume_cbk] 0-gfsvolume-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>>> [2015-03-13 02:09:44.448713] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gfsvolume-client-0: Server lk version = 1
>>>>> [2015-03-13 02:09:44.515112] I [afr-self-heal-common.c:2859:afr_log_self_heal_completion_status] 0-gfsvolume-replicate-0: foreground data self heal is successfully completed, data self heal from gfsvolume-client-0 to sinks gfsvolume-client-1, with 892928 bytes on gfsvolume-client-0, 892928 bytes on gfsvolume-client-1, data - Pending matrix: [ [ 0 155762 ] [ 0 0 ] ] on <gfid:123536cc-c34b-43d7-b0c6-cf80eefa8322>
>>>>> [2015-03-13 02:09:44.809988] I [afr-self-heal-common.c:2859:afr_log_self_heal_completion_status] 0-gfsvolume-replicate-0: foreground data self heal is successfully completed, data self heal from gfsvolume-client-0 to sinks gfsvolume-client-1, with 15998976 bytes on gfsvolume-client-0, 15998976 bytes on gfsvolume-client-1, data - Pending matrix: [ [ 0 36506 ] [ 0 0 ] ] on <gfid:b6dc0e74-31bf-469a-b629-ee51ab4cf729>
>>>>> [2015-03-13 02:09:44.946050] W [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote operation failed: Stale NFS file handle
>>>>> [2015-03-13 02:09:44.946097] I [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk] 0-gfsvolume-replicate-0: readlink of <gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9>/PB2_corrected.fastq on gfsvolume-client-1 failed (Stale NFS file handle)
>>>>> [2015-03-13 02:09:44.951370] I [afr-self-heal-entry.c:2321:afr_sh_entry_fix] 0-gfsvolume-replicate-0: <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>: Performing conservative merge
>>>>> [2015-03-13 02:09:45.149995] W [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote operation failed: Stale NFS file handle
>>>>> [2015-03-13 02:09:45.150036] I [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk] 0-gfsvolume-replicate-0: readlink of <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>/Rscript on gfsvolume-client-1 failed (Stale NFS file handle)
>>>>> [2015-03-13 02:09:45.214253] W [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote operation failed: Stale NFS file handle
>>>>> [2015-03-13 02:09:45.214295] I [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk] 0-gfsvolume-replicate-0: readlink of <gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84>/ananas_d_tmp on gfsvolume-client-1 failed (Stale NFS file handle)
>>>>> [2015-03-13 02:13:27.324856] W [socket.c:522:__socket_rwv] 0-gfsvolume-client-1: readv on 172.20.20.22:49153 failed (No data available)
>>>>> [2015-03-13 02:13:27.324961] I [client.c:2208:client_rpc_notify] 0-gfsvolume-client-1: disconnected from 172.20.20.22:49153. Client process will keep trying to connect to glusterd until brick's port is available
>>>>> [2015-03-13 02:13:37.981531] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>>>>> [2015-03-13 02:13:37.981781] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)
>>>>> [2015-03-13 02:13:41.982125] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>>>>> [2015-03-13 02:13:41.982353] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)
>>>>> [2015-03-13 02:13:45.982693] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>>>>> [2015-03-13 02:13:45.982926] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)
>>>>> [2015-03-13 02:13:49.983309] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>>>>
>>>> Any help would be greatly appreciated.
>>>> Thank You Kindly,
>>>> Kaamesh
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users

-- 
~Atin
Kaamesh Kamalaaharan
2015-Mar-20 07:37 UTC
[Gluster-users] Gluster volume brick keeps going offline
Hi Atin,

Thank you so much for your continual assistance. I am using gluster 3.6.2 on both servers and on some of the clients. I have attached the gluster1 logs for your reference. The gluster1 log files are empty and the log.1 files are the ones that have data. I couldn't attach all the files as they exceed the 25 MB limit. Please let me know if there are any other files I could attach to help you understand this better.

Thank You Kindly,
Kaamesh
Bioinformatician
Novocraft Technologies Sdn Bhd
C-23A-05, 3 Two Square, Section 19, 46300 Petaling Jaya
Selangor Darul Ehsan
Malaysia
Mobile: +60176562635
Ph: +60379600541
Fax: +60379600540

On Fri, Mar 20, 2015 at 12:57 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> I see there is a crash in the brick log. [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150320/2b6200ff/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: export-sda-brick.log
Type: text/x-log
Size: 78289 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150320/2b6200ff/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd.log.1
Type: application/octet-stream
Size: 6014029 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150320/2b6200ff/attachment-0001.obj>
Atin Mukherjee
2015-Mar-20 08:29 UTC
[Gluster-users] Gluster volume brick keeps going offline
On 03/20/2015 01:07 PM, Kaamesh Kamalaaharan wrote:
> Hi Atin,
> Thank you so much for your continual assistance. I am using gluster 3.6.2
> on both servers and on some of the clients. I have attached the gluster1
> logs for your reference. The gluster1 log files are empty and the log.1
> files are the ones that have data. I couldn't attach all the files as they
> exceed the 25 MB limit. Please let me know if there are any other files I
> could attach to help you understand this better.

The glusterd log indicates that the version is 3.5.0, as per this:

[2015-03-16 01:05:09.829478] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.5.0 (/usr/sbin/glusterd -p /var/run/glusterd.pid)

Could you re-confirm?
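For example, something along these lines on both gfs1 and gfs2 should show what is actually installed and running (the dpkg query assumes Debian/Ubuntu packaging, which the /usr/lib/x86_64-linux-gnu paths in the backtrace suggest):

  gluster --version | head -n1       # CLI package version
  glusterfs --version | head -n1     # version of the glusterfs/glusterfsd binaries
  dpkg -l | grep -i glusterfs        # installed glusterfs packages

Note that if the packages were upgraded to 3.6.2 without restarting glusterd and the brick processes, the daemons still running would be the old 3.5.0 binaries, which would explain log entries like the one above.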
~Atin

> [...]

-- 
~Atin