On 10/07/2015 09:34 AM, Atin Mukherjee wrote:
>
> On 10/06/2015 08:15 PM, Gene Liverman wrote:
>> Sorry for the delay... the joys of multiple proverbial fires at once.
>> In /var/log/messages I found this for our most recent crash:
>>
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> pending frames:
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> patchset: git://git.gluster.com/glusterfs.git
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> signal received: 6
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: time
>> of crash:
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> 2015-10-03 04:26:21
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> configuration details:
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: argp 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> backtrace 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: dlfcn 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> libpthread 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> llistxattr 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: setfsid 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> spinlock 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: epoll.h 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: xattr.h 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> st_atim.tv_nsec 1
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]:
>> package-string: glusterfs 3.7.4
>> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: ---------
>>
>>
>> I have posted etc-glusterfs-glusterd.vol.log
>> to http://pastebin.com/Pzq1j5J3. I also put the core file and an
>> sosreport on my web server for you but don't want to leave them there
>> for long, so I'd appreciate it if you'd let me know once you get them.
>> They are at the following URLs:
>> http://www.westga.edu/~gliverma/tmp-files/core.36992
> Could you get the backtrace and share it with us using the following
> commands:
>
> $ gdb glusterd <core file path>
> $ bt
Also the output of "t a a bt" (thread apply all bt) in gdb might help.
>>
>> http://www.westga.edu/~gliverma/tmp-files/sosreport-gliverman.gluster-crashing-20151006101239.tar.xz
>>
>> http://www.westga.edu/~gliverma/tmp-files/sosreport-gliverman.gluster-crashing-20151006101239.tar.xz.md5
>>
>>
>>
>>
>> Thanks again for the help!
>> *Gene Liverman*
>> Systems Integration Architect
>> Information Technology Services
>> University of West Georgia
>> gliverma at westga.edu
>>
>> ITS: Making Technology Work for You!
>>
>>
>>
>>
>> On Fri, Oct 2, 2015 at 11:18 AM, Gaurav Garg <ggarg at redhat.com> wrote:
>>
>> >> Pulling those logs now but how do I generate the core file you
>> >> are asking for?
>>
>> When a crash occurs, a core file is generated automatically, subject
>> to your *ulimit* settings. You can find the core file in your root
>> or current working directory, or wherever you have set your core
>> dump location. The core file tells you where exactly the crash
>> happened.
>> You can find the right core file by matching its time against the
>> crash time in the glusterd logs: search for the "crash" keyword. You
>> can also paste the few lines just above the latest "crash" keyword
>> in the glusterd logs.
>>
>> Just for your curiosity, if you are willing to look at where it
>> crashed, you can debug it with #gdb -c <location of core file> glusterd
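>>
>> For example (a minimal sketch; the paths and core-file name below
>> are illustrative, so adjust them to your system):
>>
>> $ ulimit -c                          # prints 0 if core dumps are disabled
>> $ ulimit -c unlimited                # enable core dumps in the current shell
>> $ cat /proc/sys/kernel/core_pattern  # where the kernel writes core files
>> $ grep -n crash /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail
>> $ gdb -c /path/to/core.36992 glusterd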
>>
>> Thank you...
>>
>> Regards,
>> Gaurav
>>
>> ----- Original Message -----
>> From: "Gene Liverman" <gliverma at westga.edu
<mailto:gliverma at westga.edu>>
>> To: "Gaurav Garg" <ggarg at redhat.com
<mailto:ggarg at redhat.com>>
>> Cc: "gluster-users" <gluster-users at gluster.org
>> <mailto:gluster-users at gluster.org>>
>> Sent: Friday, October 2, 2015 8:28:49 PM
>> Subject: Re: [Gluster-users] glusterd crashing
>>
>> Pulling those logs now but how do I generate the core file you are
>> asking for?
>>
>>
>>
>>
>>
>> --
>> *Gene Liverman*
>> Systems Integration Architect
>> Information Technology Services
>> University of West Georgia
>> gliverma at westga.edu
>> 678.839.5492
>>
>> ITS: Making Technology Work for You!
>>
>>
>>
>>
>> On Fri, Oct 2, 2015 at 2:25 AM, Gaurav Garg <ggarg at redhat.com> wrote:
>>
>> > Hi Gene,
>> >
>> > You have pasted the glustershd log, but we asked for the glusterd
>> > log; glusterd and glustershd are different processes, and with this
>> > information we can't find out why your glusterd crashed. Could you
>> > paste the *glusterd* logs
>> > (/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log*) in
>> > pastebin (not in this mail thread) and give the pastebin link in
>> > this mail thread? Can you also attach the core file, or paste a
>> > backtrace of that core dump?
>> > It would be great if you could also give us an sosreport from the
>> > node where the crash happened.
>> >
>> > Thanx,
>> >
>> > ~Gaurav
>> >
>> > ----- Original Message -----
>> > From: "Gene Liverman" <gliverma at westga.edu
>> <mailto:gliverma at westga.edu>>
>> > To: "gluster-users" <gluster-users at gluster.org
>> <mailto:gluster-users at gluster.org>>
>> > Sent: Friday, October 2, 2015 4:47:00 AM
>> > Subject: Re: [Gluster-users] glusterd crashing
>> >
>> > Sorry for the delay. Here is what's installed:
>> > # rpm -qa | grep gluster
>> > glusterfs-geo-replication-3.7.4-2.el6.x86_64
>> > glusterfs-client-xlators-3.7.4-2.el6.x86_64
>> > glusterfs-3.7.4-2.el6.x86_64
>> > glusterfs-libs-3.7.4-2.el6.x86_64
>> > glusterfs-api-3.7.4-2.el6.x86_64
>> > glusterfs-fuse-3.7.4-2.el6.x86_64
>> > glusterfs-server-3.7.4-2.el6.x86_64
>> > glusterfs-cli-3.7.4-2.el6.x86_64
>> >
>> > The cmd_history.log file is attached.
>> > In gluster.log I have filtered out a bunch of lines like the one
>> > below to make it more readable. I had a node down for multiple days
>> > due to maintenance, and another one went down due to a hardware
>> > failure during that time too.
>> > [2015-10-01 00:16:09.643631] W [MSGID: 114031]
>> > [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-gv0-client-0: remote
>> > operation failed. Path: <gfid:31f17f8c-6c96-4440-88c0-f813b3c8d364>
>> > (31f17f8c-6c96-4440-88c0-f813b3c8d364) [No such file or directory]
>> >
>> > I also filtered out a boatload of self-heal lines like these two:
>> > [2015-10-01 15:14:14.851015] I [MSGID: 108026]
>> > [afr-self-heal-metadata.c:56:__afr_selfheal_metadata_do] 0-gv0-replicate-0:
>> > performing metadata selfheal on f78a47db-a359-430d-a655-1d217eb848c3
>> > [2015-10-01 15:14:14.856392] I [MSGID: 108026]
>> > [afr-self-heal-common.c:651:afr_log_selfheal] 0-gv0-replicate-0: Completed
>> > metadata selfheal on f78a47db-a359-430d-a655-1d217eb848c3. source=0 sinks=1
>> >
>> >
>> > [root at eapps-gluster01 glusterfs]# cat glustershd.log | grep -v 'remote
>> > operation failed' | grep -v 'self-heal'
>> > [2015-09-27 08:46:56.893125] E [rpc-clnt.c:201:call_bail] 0-glusterfs:
>> > bailing out frame type(GlusterFS Handshake) op(GETSPEC(2)) xid = 0x6
>> > sent = 2015-09-27 08:16:51.742731. timeout = 1800 for 127.0.0.1:24007
>> > [2015-09-28 12:54:17.524924] W [socket.c:588:__socket_rwv] 0-glusterfs:
>> > readv on 127.0.0.1:24007 failed (Connection reset by peer)
>> > [2015-09-28 12:54:27.844374] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> > 0-glusterfs: No change in volfile, continuing
>> > [2015-09-28 12:57:03.485027] W [socket.c:588:__socket_rwv] 0-gv0-client-2:
>> > readv on 160.10.31.227:24007 failed (Connection reset by peer)
>> > [2015-09-28 12:57:05.872973] E [socket.c:2278:socket_connect_finish]
>> > 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection refused)
>> > [2015-09-28 12:57:38.490578] W [socket.c:588:__socket_rwv] 0-glusterfs:
>> > readv on 127.0.0.1:24007 failed (No data available)
>> > [2015-09-28 12:57:49.054475] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> > 0-glusterfs: No change in volfile, continuing
>> > [2015-09-28 13:01:12.062960] W [glusterfsd.c:1219:cleanup_and_exit]
>> > (-->/lib64/libpthread.so.0() [0x3c65e07a51]
>> > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> > -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
>> > signum (15), shutting down
>> > [2015-09-28 13:01:12.981945] I [MSGID: 100030] [glusterfsd.c:2301:main]
>> > 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.4
>> > (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> > /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> > /var/log/glusterfs/glustershd.log -S
>> > /var/run/gluster/9a9819e90404187e84e67b01614bbe10.socket --xlator-option
>> > *replicate*.node-uuid=416d712a-06fc-4b3c-a92f-8c82145626ff)
>> > [2015-09-28 13:01:13.009171] I [MSGID: 101190]
>> > [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> > with index 1
>> > [2015-09-28 13:01:13.092483] I [graph.c:269:gf_add_cmdline_options]
>> > 0-gv0-replicate-0: adding option 'node-uuid' for volume 'gv0-replicate-0'
>> > with value '416d712a-06fc-4b3c-a92f-8c82145626ff'
>> > [2015-09-28 13:01:13.100856] I [MSGID: 101190]
>> > [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> > with index 2
>> > [2015-09-28 13:01:13.103995] I [MSGID: 114020] [client.c:2118:notify]
>> > 0-gv0-client-0: parent translators are ready, attempting connect on
>> > transport
>> > [2015-09-28 13:01:13.114745] I [MSGID: 114020] [client.c:2118:notify]
>> > 0-gv0-client-1: parent translators are ready, attempting connect on
>> > transport
>> > [2015-09-28 13:01:13.115725] I [rpc-clnt.c:1851:rpc_clnt_reconfig]
>> > 0-gv0-client-0: changing port to 49152 (from 0)
>> > [2015-09-28 13:01:13.125619] I [MSGID: 114020] [client.c:2118:notify]
>> > 0-gv0-client-2: parent translators are ready, attempting connect on
>> > transport
>> > [2015-09-28 13:01:13.132316] E [socket.c:2278:socket_connect_finish]
>> > 0-gv0-client-1: connection to 160.10.31.64:24007 failed (Connection refused)
>> > [2015-09-28 13:01:13.132650] I [MSGID: 114057]
>> > [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-0:
>> > Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> > [2015-09-28 13:01:13.133322] I [MSGID: 114046]
>> > [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-0: Connected to
>> > gv0-client-0, attached to remote volume '/export/sdb1/gv0'.
>> > [2015-09-28 13:01:13.133365] I [MSGID: 114047]
>> > [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-0: Server and
>> > Client lk-version numbers are not same, reopening the fds
>> > [2015-09-28 13:01:13.133782] I [MSGID: 108005]
>> > [afr-common.c:3998:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0'
>> > came back up; going online.
>> > [2015-09-28 13:01:13.133863] I [MSGID: 114035]
>> > [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-0: Server
>> > lk version = 1
>> > Final graph:
>> >
>> > +------------------------------------------------------------------------------+
>> > 1: volume gv0-client-0
>> > 2: type protocol/client
>> > 3: option clnt-lk-version 1
>> > 4: option volfile-checksum 0
>> > 5: option volfile-key gluster/glustershd
>> > 6: option client-version 3.7.4
>> > 7: option process-uuid eapps-gluster01-65147-2015/09/28-13:01:12:970131-gv0-client-0-0-0
>> > 8: option fops-version 1298437
>> > 9: option ping-timeout 42
>> > 10: option remote-host eapps-gluster01.uwg.westga.edu
>> > 11: option remote-subvolume /export/sdb1/gv0
>> > 12: option transport-type socket
>> > 13: option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>> > 14: option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>> > 15: end-volume
>> > 16:
>> > 17: volume gv0-client-1
>> > 18: type protocol/client
>> > 19: option ping-timeout 42
>> > 20: option remote-host eapps-gluster02.uwg.westga.edu
>> > 21: option remote-subvolume /export/sdb1/gv0
>> > 22: option transport-type socket
>> > 23: option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>> > 24: option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>> > 25: end-volume
>> > 26:
>> > 27: volume gv0-client-2
>> > 28: type protocol/client
>> > 29: option ping-timeout 42
>> > 30: option remote-host eapps-gluster03.uwg.westga.edu
>> > 31: option remote-subvolume /export/sdb1/gv0
>> > 32: option transport-type socket
>> > 33: option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>> > 34: option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>> > 35: end-volume
>> > 36:
>> > 37: volume gv0-replicate-0
>> > 38: type cluster/replicate
>> > 39: option node-uuid 416d712a-06fc-4b3c-a92f-8c82145626ff
>> > 46: subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>> > 47: end-volume
>> > 48:
>> > 49: volume glustershd
>> > 50: type debug/io-stats
>> > 51: subvolumes gv0-replicate-0
>> > 52: end-volume
>> > 53:
>> >
>> >
>> > +------------------------------------------------------------------------------+
>> > [2015-09-28 13:01:13.154898] E [MSGID: 114058]
>> > [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-2: failed
>> > to get the port number for remote subvolume. Please run 'gluster volume
>> > status' on server to see if brick process is running.
>> > [2015-09-28 13:01:13.155031] I [MSGID: 114018]
>> > [client.c:2042:client_rpc_notify] 0-gv0-client-2: disconnected from
>> > gv0-client-2. Client process will keep trying to connect to glusterd until
>> > brick's port is available
>> > [2015-09-28 13:01:13.155080] W [MSGID: 108001]
>> > [afr-common.c:4081:afr_notify] 0-gv0-replicate-0: Client-quorum is not met
>> > [2015-09-29 08:11:24.728797] I [MSGID: 100011]
>> > [glusterfsd.c:1291:reincarnate] 0-glusterfsd: Fetching the volume file from
>> > server...
>> > [2015-09-29 08:11:24.763338] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> > 0-glusterfs: No change in volfile, continuing
>> > [2015-09-29 12:50:41.915254] E [rpc-clnt.c:201:call_bail] 0-gv0-client-2:
>> > bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0xd91f sent = 2015-09-29
>> > 12:20:36.092734. timeout = 1800 for 160.10.31.227:24007
>> > [2015-09-29 12:50:41.923550] W [MSGID: 114032]
>> > [client-handshake.c:1623:client_dump_version_cbk] 0-gv0-client-2: received
>> > RPC status error [Transport endpoint is not connected]
>> > [2015-09-30 23:54:36.547979] W [socket.c:588:__socket_rwv] 0-glusterfs:
>> > readv on 127.0.0.1:24007 failed (No data available)
>> > [2015-09-30 23:54:46.812870] E [socket.c:2278:socket_connect_finish]
>> > 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
>> > [2015-10-01 00:14:20.997081] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> > 0-glusterfs: No change in volfile, continuing
>> > [2015-10-01 00:15:36.770579] W [socket.c:588:__socket_rwv] 0-gv0-client-2:
>> > readv on 160.10.31.227:24007 failed (Connection reset by peer)
>> > [2015-10-01 00:15:37.906708] E [socket.c:2278:socket_connect_finish]
>> > 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection refused)
>> > [2015-10-01 00:15:53.008130] W [glusterfsd.c:1219:cleanup_and_exit]
>> > (-->/lib64/libpthread.so.0() [0x3b91807a51]
>> > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> > -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
>> > signum (15), shutting down
>> > [2015-10-01 00:15:53.008697] I [timer.c:48:gf_timer_call_after]
>> > (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3e2) [0x3b9480f992]
>> > -->/usr/lib64/libgfrpc.so.0(__save_frame+0x76) [0x3b9480f046]
>> > -->/usr/lib64/libglusterfs.so.0(gf_timer_call_after+0x1b1) [0x3b93447881] )
>> > 0-timer: ctx cleanup started
>> > [2015-10-01 00:15:53.994698] I [MSGID: 100030] [glusterfsd.c:2301:main]
>> > 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.4
>> > (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> > /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> > /var/log/glusterfs/glustershd.log -S
>> > /var/run/gluster/9a9819e90404187e84e67b01614bbe10.socket --xlator-option
>> > *replicate*.node-uuid=416d712a-06fc-4b3c-a92f-8c82145626ff)
>> > [2015-10-01 00:15:54.020401] I [MSGID: 101190]
>> > [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> > with index 1
>> > [2015-10-01 00:15:54.086777] I [graph.c:269:gf_add_cmdline_options]
>> > 0-gv0-replicate-0: adding option 'node-uuid' for volume 'gv0-replicate-0'
>> > with value '416d712a-06fc-4b3c-a92f-8c82145626ff'
>> > [2015-10-01 00:15:54.093004] I [MSGID: 101190]
>> > [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> > with index 2
>> > [2015-10-01 00:15:54.098144] I [MSGID: 114020] [client.c:2118:notify]
>> > 0-gv0-client-0: parent translators are ready, attempting connect on
>> > transport
>> > [2015-10-01 00:15:54.107432] I [MSGID: 114020] [client.c:2118:notify]
>> > 0-gv0-client-1: parent translators are ready, attempting connect on
>> > transport
>> > [2015-10-01 00:15:54.115962] I [MSGID: 114020] [client.c:2118:notify]
>> > 0-gv0-client-2: parent translators are ready, attempting connect on
>> > transport
>> > [2015-10-01 00:15:54.120474] E [socket.c:2278:socket_connect_finish]
>> > 0-gv0-client-1: connection to 160.10.31.64:24007 failed (Connection refused)
>> > [2015-10-01 00:15:54.120639] I [rpc-clnt.c:1851:rpc_clnt_reconfig]
>> > 0-gv0-client-0: changing port to 49152 (from 0)
>> > Final graph:
>> >
>> > +------------------------------------------------------------------------------+
>> > 1: volume gv0-client-0
>> > 2: type protocol/client
>> > 3: option ping-timeout 42
>> > 4: option remote-host eapps-gluster01.uwg.westga.edu
>> > 5: option remote-subvolume /export/sdb1/gv0
>> > 6: option transport-type socket
>> > 7: option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>> > 8: option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>> > 9: end-volume
>> > 10:
>> > 11: volume gv0-client-1
>> > 12: type protocol/client
>> > 13: option ping-timeout 42
>> > 14: option remote-host eapps-gluster02.uwg.westga.edu
>> > 15: option remote-subvolume /export/sdb1/gv0
>> > 16: option transport-type socket
>> > 17: option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>> > 18: option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>> > 19: end-volume
>> > 20:
>> > 21: volume gv0-client-2
>> > 22: type protocol/client
>> > 23: option ping-timeout 42
>> > 24: option remote-host eapps-gluster03.uwg.westga.edu
>> > 25: option remote-subvolume /export/sdb1/gv0
>> > 26: option transport-type socket
>> > 27: option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>> > 28: option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>> > 29: end-volume
>> > 30:
>> > 31: volume gv0-replicate-0
>> > 32: type cluster/replicate
>> > 33: option node-uuid 416d712a-06fc-4b3c-a92f-8c82145626ff
>> > 40: subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>> > 41: end-volume
>> > 42:
>> > 43: volume glustershd
>> > 44: type debug/io-stats
>> > 45: subvolumes gv0-replicate-0
>> > 46: end-volume
>> > 47:
>> >
>> >
>> > +------------------------------------------------------------------------------+
>> > [2015-10-01 00:15:54.135650] I [MSGID: 114057]
>> > [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-0:
>> > Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> > [2015-10-01 00:15:54.136223] I [MSGID: 114046]
>> > [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-0: Connected to
>> > gv0-client-0, attached to remote volume '/export/sdb1/gv0'.
>> > [2015-10-01 00:15:54.136262] I [MSGID: 114047]
>> > [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-0: Server and
>> > Client lk-version numbers are not same, reopening the fds
>> > [2015-10-01 00:15:54.136410] I [MSGID: 108005]
>> > [afr-common.c:3998:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0'
>> > came back up; going online.
>> > [2015-10-01 00:15:54.136500] I [MSGID: 114035]
>> > [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-0: Server
>> > lk version = 1
>> > [2015-10-01 00:15:54.401702] E [MSGID: 114058]
>> > [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-2: failed
>> > to get the port number for remote subvolume. Please run 'gluster volume
>> > status' on server to see if brick process is running.
>> > [2015-10-01 00:15:54.401834] I [MSGID: 114018]
>> > [client.c:2042:client_rpc_notify] 0-gv0-client-2: disconnected from
>> > gv0-client-2. Client process will keep trying to connect to glusterd until
>> > brick's port is available
>> > [2015-10-01 00:15:54.401878] W [MSGID: 108001]
>> > [afr-common.c:4081:afr_notify] 0-gv0-replicate-0: Client-quorum is not met
>> > [2015-10-01 03:57:52.755426] E [socket.c:2278:socket_connect_finish]
>> > 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection refused)
>> > [2015-10-01 13:50:49.000708] E [socket.c:2278:socket_connect_finish]
>> > 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection timed out)
>> > [2015-10-01 14:36:40.481673] E [MSGID: 114058]
>> > [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: failed
>> > to get the port number for remote subvolume. Please run 'gluster volume
>> > status' on server to see if brick process is running.
>> > [2015-10-01 14:36:40.481833] I [MSGID: 114018]
>> > [client.c:2042:client_rpc_notify] 0-gv0-client-1: disconnected from
>> > gv0-client-1. Client process will keep trying to connect to glusterd until
>> > brick's port is available
>> > [2015-10-01 14:36:41.982037] I [rpc-clnt.c:1851:rpc_clnt_reconfig]
>> > 0-gv0-client-1: changing port to 49152 (from 0)
>> > [2015-10-01 14:36:41.993478] I [MSGID: 114057]
>> > [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-1:
>> > Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> > [2015-10-01 14:36:41.994568] I [MSGID: 114046]
>> > [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected to
>> > gv0-client-1, attached to remote volume '/export/sdb1/gv0'.
>> > [2015-10-01 14:36:41.994647] I [MSGID: 114047]
>> > [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server and
>> > Client lk-version numbers are not same, reopening the fds
>> > [2015-10-01 14:36:41.994899] I [MSGID: 108002]
>> > [afr-common.c:4077:afr_notify] 0-gv0-replicate-0: Client-quorum is met
>> > [2015-10-01 14:36:42.002275] I [MSGID: 114035]
>> > [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: Server
>> > lk version = 1
>> >
>> >
>> >
>> >
>> > Thanks,
>> > Gene Liverman
>> > Systems Integration Architect
>> > Information Technology Services
>> > University of West Georgia
>> > gliverma at westga.edu
>> >
>> > ITS: Making Technology Work for You!
>> >
>> >
>> >
>> > On Wed, Sep 30, 2015 at 10:54 PM, Gaurav Garg <ggarg at redhat.com> wrote:
>> >
>> >
>> > Hi Gene,
>> >
>> > Could you paste or attach the core file, the glusterd log file, and
>> > cmd_history.log so we can find the actual RCA of the crash? What
>> > steps did you perform before this crash?
>> >
>> > >> How can I troubleshoot this?
>> >
>> > If you want to troubleshoot it yourself, you can look into the
>> > glusterd log file and the core file.
>> >
>> > Thank you..
>> >
>> > Regards,
>> > Gaurav
>> >
>> > ----- Original Message -----
>> > From: "Gene Liverman" < gliverma at westga.edu
>> <mailto:gliverma at westga.edu> >
>> > To: gluster-users at gluster.org <mailto:gluster-users at
gluster.org>
>> > Sent: Thursday, October 1, 2015 7:59:47 AM
>> > Subject: [Gluster-users] glusterd crashing
>> >
>> > In the last few days I've started having issues with my glusterd
>> > service crashing. When it goes down it seems to do so on all nodes
>> > in my replicated volume. How can I troubleshoot this? I'm on a mix
>> > of CentOS 6 and RHEL 6.
>> > Thanks!
>> >
>> >
>> >
>> > Gene Liverman
>> > Systems Integration Architect
>> > Information Technology Services
>> > University of West Georgia
>> > gliverma at westga.edu
>> >
>> >
>> > Sent from Outlook on my iPhone
>> >
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users at gluster.org
>> > http://www.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users at gluster.org
>> > http://www.gluster.org/mailman/listinfo/gluster-users
>> >
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>