This looks to me like heap corruption detected by glibc. Which distribution
are you running Gluster on?
-Atin
On Oct 7, 2015 9:12 PM, "Gene Liverman" <gliverma at westga.edu>
wrote:
> Both of the requested trace commands are below:
>
> Core was generated by `/usr/sbin/glusterd
> --pid-file=/var/run/glusterd.pid'.
> Program terminated with signal 6, Aborted.
> #0 0x0000003b91432625 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
>
>
> (gdb) bt
> #0 0x0000003b91432625 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x0000003b91433e05 in abort () at abort.c:92
> #2 0x0000003b91470537 in __libc_message (do_abort=2, fmt=0x3b915588c0
> "*** glibc detected *** %s: %s: 0x%s ***\n") at
> ../sysdeps/unix/sysv/linux/libc_fatal.c:198
> #3 0x0000003b91475f4e in malloc_printerr (action=3, str=0x3b9155687d
> "corrupted double-linked list", ptr=<value optimized out>, ar_ptr=<value
> optimized out>) at malloc.c:6350
> #4 0x0000003b914763d3 in malloc_consolidate (av=0x7fee90000020) at
> malloc.c:5216
> #5 0x0000003b91479c28 in _int_malloc (av=0x7fee90000020, bytes=<value
> optimized out>) at malloc.c:4415
> #6 0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>,
> elem_size=<value optimized out>) at malloc.c:4093
> #7 0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>,
> size=<value optimized out>, type=59, typestr=0x7fee9ed2d708
> "gf_common_mt_rpc_trans_t") at mem-pool.c:117
> #8 0x00007fee9ed2830b in socket_server_event_handler (fd=<value optimized
> out>, idx=<value optimized out>, data=0xf3eca0, poll_in=1, poll_out=<value
> optimized out>, poll_err=<value optimized out>) at socket.c:2622
> #9 0x0000003b9348b0a0 in event_dispatch_epoll_handler (data=0xf408b0) at
> event-epoll.c:575
> #10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
> #11 0x0000003b91807a51 in start_thread (arg=0x7fee9db3b700) at
> pthread_create.c:301
> #12 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
>
>
>
> (gdb) t a a bt
>
> Thread 9 (Thread 0x7fee9e53c700 (LWP 37122)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
> #1 0x00007fee9fffcf93 in hooks_worker (args=<value optimized out>) at
> glusterd-hooks.c:534
> #2 0x0000003b91807a51 in start_thread (arg=0x7fee9e53c700) at
> pthread_create.c:301
> #3 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 8 (Thread 0x7feea0c99700 (LWP 36996)):
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
> #1 0x0000003b9346cbdb in syncenv_task (proc=0xefa8c0) at syncop.c:607
> #2 0x0000003b93472cb0 in syncenv_processor (thdata=0xefa8c0) at
> syncop.c:699
> #3 0x0000003b91807a51 in start_thread (arg=0x7feea0c99700) at
> pthread_create.c:301
> #4 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 7 (Thread 0x7feea209b700 (LWP 36994)):
> #0 do_sigwait (set=<value optimized out>, sig=0x7feea209ae5c) at
> ../sysdeps/unix/sysv/linux/sigwait.c:65
> #1 __sigwait (set=<value optimized out>, sig=0x7feea209ae5c) at
> ../sysdeps/unix/sysv/linux/sigwait.c:100
> #2 0x0000000000405dfb in glusterfs_sigwaiter (arg=<value optimized out>)
> at glusterfsd.c:1989
> #3 0x0000003b91807a51 in start_thread (arg=0x7feea209b700) at
> pthread_create.c:301
> #4 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 6 (Thread 0x7feea2a9c700 (LWP 36993)):
> #0 0x0000003b9180efbd in nanosleep () at
> ../sysdeps/unix/syscall-template.S:82
> #1 0x0000003b934473ea in gf_timer_proc (ctx=0xecc010) at timer.c:205
> #2 0x0000003b91807a51 in start_thread (arg=0x7feea2a9c700) at
> pthread_create.c:301
> #3 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 5 (Thread 0x7feea9e04740 (LWP 36992)):
> #0 0x0000003b918082ad in pthread_join (threadid=140662814254848,
> thread_return=0x0) at pthread_join.c:89
> #1 0x0000003b9348ab4d in event_dispatch_epoll (event_pool=0xeeb5b0) at
> event-epoll.c:762
> #2 0x0000000000407b24 in main (argc=2, argv=0x7fff5294adc8) at
> glusterfsd.c:2333
>
> Thread 4 (Thread 0x7feea169a700 (LWP 36995)):
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
> #1 0x0000003b9346cbdb in syncenv_task (proc=0xefa500) at syncop.c:607
> #2 0x0000003b93472cb0 in syncenv_processor (thdata=0xefa500) at
> syncop.c:699
> #3 0x0000003b91807a51 in start_thread (arg=0x7feea169a700) at
> pthread_create.c:301
> #4 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 3 (Thread 0x7fee9d13a700 (LWP 37124)):
> #0 0x0000003b914e8f33 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:82
> #1 0x0000003b9348aed1 in event_dispatch_epoll_worker (data=0xf405b0) at
> event-epoll.c:668
> #2 0x0000003b91807a51 in start_thread (arg=0x7fee9d13a700) at
> pthread_create.c:301
> #3 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 2 (Thread 0x7fee97fff700 (LWP 37125)):
> #0 0x0000003b914e8f33 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:82
> #1 0x0000003b9348aed1 in event_dispatch_epoll_worker (data=0xf6b4d0) at
> event-epoll.c:668
> #2 0x0000003b91807a51 in start_thread (arg=0x7fee97fff700) at
> pthread_create.c:301
> #3 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Thread 1 (Thread 0x7fee9db3b700 (LWP 37123)):
> #0 0x0000003b91432625 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x0000003b91433e05 in abort () at abort.c:92
> #2 0x0000003b91470537 in __libc_message (do_abort=2, fmt=0x3b915588c0
> "*** glibc detected *** %s: %s: 0x%s ***\n") at
> ../sysdeps/unix/sysv/linux/libc_fatal.c:198
> #3 0x0000003b91475f4e in malloc_printerr (action=3, str=0x3b9155687d
> "corrupted double-linked list", ptr=<value optimized out>, ar_ptr=<value
> optimized out>) at malloc.c:6350
> #4 0x0000003b914763d3 in malloc_consolidate (av=0x7fee90000020) at
> malloc.c:5216
> #5 0x0000003b91479c28 in _int_malloc (av=0x7fee90000020, bytes=<value
> optimized out>) at malloc.c:4415
> #6 0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>,
> elem_size=<value optimized out>) at malloc.c:4093
> #7 0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>,
> size=<value optimized out>, type=59, typestr=0x7fee9ed2d708
> "gf_common_mt_rpc_trans_t") at mem-pool.c:117
> #8 0x00007fee9ed2830b in socket_server_event_handler (fd=<value optimized
> out>, idx=<value optimized out>, data=0xf3eca0, poll_in=1, poll_out=<value
> optimized out>, poll_err=<value optimized out>) at socket.c:2622
> #9 0x0000003b9348b0a0 in event_dispatch_epoll_handler (data=0xf408b0) at
> event-epoll.c:575
> #10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
> #11 0x0000003b91807a51 in start_thread (arg=0x7fee9db3b700) at
> pthread_create.c:301
> #12 0x0000003b914e893d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
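As a reading aid for the crashing thread's backtrace above: frames #0 to #6 are inside glibc (raise, abort and the malloc internals), so the first Gluster frame is the caller whose allocation tripped the consistency check. The corruption itself may well have happened earlier and elsewhere. Below is a minimal Python sketch, assuming the backtrace text has been saved from gdb. The helper names (`parse_frames`, `first_non_libc_frame`) and the set of glibc function names are illustrative assumptions taken from this one trace, not part of gdb or Gluster.

```python
import re

# Matches gdb frame headers such as
#   "#7 0x0000003b9345c81f in __gf_calloc (nmemb=..."
#   "#10 event_dispatch_epoll_worker (data=0xf408b0) at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")

# Assumption: the glibc functions that appear in this particular trace.
GLIBC_FUNCS = {
    "raise", "abort", "__libc_message", "malloc_printerr",
    "malloc_consolidate", "_int_malloc", "__libc_calloc",
}


def parse_frames(backtrace):
    """Return (frame_number, function_name) pairs found in gdb 'bt' output."""
    frames = []
    for line in backtrace.splitlines():
        # Drop mail quote markers ("> ") before matching.
        match = FRAME_RE.match(line.lstrip("> "))
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def first_non_libc_frame(backtrace):
    """Return the innermost frame whose function is not in GLIBC_FUNCS."""
    for number, func in parse_frames(backtrace):
        if func not in GLIBC_FUNCS:
            return number, func
    return None


# A few frame header lines copied from the trace in this thread.
SAMPLE = """\
> #0 0x0000003b91432625 in raise (sig=<value optimized out>) at
> #1 0x0000003b91433e05 in abort () at abort.c:92
> #6 0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>,
> #7 0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>,
> #10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
"""

print(first_non_libc_frame(SAMPLE))
```

For this trace the helper points at `__gf_calloc`, called from `socket_server_event_handler`. That only shows where glibc noticed the damaged free list, not which code damaged it.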
>
>
>
>
>
>
>
> --
> *Gene Liverman*
> Systems Integration Architect
> Information Technology Services
> University of West Georgia
> gliverma at westga.edu
> 678.839.5492
>
> ITS: Making Technology Work for You!
>
>
>
>
> On Wed, Oct 7, 2015 at 12:06 AM, Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>>
>>
>> On 10/07/2015 09:34 AM, Atin Mukherjee wrote:
>> >
>> >
>> > On 10/06/2015 08:15 PM, Gene Liverman wrote:
>> >> Sorry for the delay... the joys of multiple proverbial fires at once.
>> >> In /var/log/messages I found this for our most recent crash:
>> >>
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: pending frames:
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: patchset: git://git.gluster.com/glusterfs.git
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: signal received: 6
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: time of crash:
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: 2015-10-03 04:26:21
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: configuration details:
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: argp 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: backtrace 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: dlfcn 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: libpthread 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: llistxattr 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: setfsid 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: spinlock 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: epoll.h 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: xattr.h 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: st_atim.tv_nsec 1
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: package-string: glusterfs 3.7.4
>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: ---------
>> >>
>> >>
>> >> I have posted etc-glusterfs-glusterd.vol.log to
>> >> http://pastebin.com/Pzq1j5J3. I also put the core file and an
>> >> sosreport on my web server for you, but I don't want to leave them
>> >> there for long, so I'd appreciate it if you'd let me know once you
>> >> have them. They are at the following URLs:
>> >> http://www.westga.edu/~gliverma/tmp-files/core.36992
>> > Could you get the backtrace with the following commands and share it
>> > with us?
>> >
>> > $ gdb /usr/sbin/glusterd <core file path>
>> > (gdb) bt
>> Also, the output of "t a a bt" (thread apply all bt) in gdb might help.
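The two gdb commands requested above can also be collected non-interactively, which makes it easier to paste complete output into a mail. Below is a minimal Python sketch that only builds the command line. It relies on gdb's `-batch` and `-ex` options; the helper name `gdb_backtrace_command` and the core file path are placeholders chosen for illustration, and the binary path is the one reported in the core file header earlier in this thread.

```python
import shlex


def gdb_backtrace_command(binary, core):
    """Build a non-interactive gdb command line that prints the crashing
    thread's backtrace followed by the backtraces of all threads."""
    return [
        "gdb", "-batch",               # run the commands, then exit
        "-ex", "bt",                   # backtrace of the crashing thread
        "-ex", "thread apply all bt",  # long form of "t a a bt"
        binary, core,
    ]


# "/path/to/core.36992" is a placeholder, not a real location.
cmd = gdb_backtrace_command("/usr/sbin/glusterd", "/path/to/core.36992")
print(" ".join(shlex.quote(part) for part in cmd))
```

Batch mode also turns off gdb's pager, so long thread listings are not interrupted by continue prompts. Redirect the output to a file if you want to attach it.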
>> >
>> >> http://www.westga.edu/~gliverma/tmp-files/sosreport-gliverman.gluster-crashing-20151006101239.tar.xz
>> >> http://www.westga.edu/~gliverma/tmp-files/sosreport-gliverman.gluster-crashing-20151006101239.tar.xz.md5
>> >>
>> >>
>> >>
>> >>
>> >> Thanks again for the help!
>> >> *Gene Liverman*
>> >> Systems Integration Architect
>> >> Information Technology Services
>> >> University of West Georgia
>> >> gliverma at westga.edu
>> >>
>> >> ITS: Making Technology Work for You!
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Oct 2, 2015 at 11:18 AM, Gaurav Garg <ggarg at redhat.com> wrote:
>> >>
>> >> >> Pulling those logs now but how do I generate the core file you are
>> >> >> asking for?
>> >>
>> >> When glusterd crashes, a core file is generated automatically, subject
>> >> to your *ulimit* settings. You can find the core file in the root
>> >> directory, in the current working directory, or wherever you have
>> >> configured core dumps to be written. The core file records exactly
>> >> where the crash happened.
>> >> You can find the matching core file by checking the crash time in the
>> >> glusterd logs: search for the keyword "crash". You can also paste the
>> >> few lines just above the latest "crash" keyword in the glusterd logs.
>> >>
>> >> If you are curious to see where it crashed, you can debug it with
>> >> # gdb -c <location of core file> glusterd
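To check the two settings mentioned above (the core size limit and where the kernel writes core files), here is a minimal Python sketch using the standard `resource` module. It assumes a Linux host, and it reports the limits of the process that runs the script, not those of an already running glusterd (for that, look at `/proc/<pid>/limits`). The helper names `core_dump_settings` and `describe` are illustrative.

```python
import resource
from pathlib import Path


def core_dump_settings():
    """Return (soft_limit, hard_limit, core_pattern) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Linux only: the kernel's template for core file names and locations.
    pattern_file = Path("/proc/sys/kernel/core_pattern")
    pattern = pattern_file.read_text().strip() if pattern_file.exists() else None
    return soft, hard, pattern


def describe(limit):
    """Render an rlimit value the way 'ulimit -c' reports it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard, pattern = core_dump_settings()
print("core size soft limit:", describe(soft))
print("core size hard limit:", describe(hard))
print("core_pattern:", pattern)
```

A soft limit of 0 means no core file is written. If core_pattern starts with a `|`, cores are piped to a helper program instead of being written to the crashing process's working directory.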
>> >>
>> >> Thank you...
>> >>
>> >> Regards,
>> >> Gaurav
>> >>
>> >> ----- Original Message -----
>> >> From: "Gene Liverman" <gliverma at westga.edu>
>> >> To: "Gaurav Garg" <ggarg at redhat.com>
>> >> Cc: "gluster-users" <gluster-users at gluster.org>
>> >> Sent: Friday, October 2, 2015 8:28:49 PM
>> >> Subject: Re: [Gluster-users] glusterd crashing
>> >>
>> >> Pulling those logs now but how do I generate the core file you are
>> >> asking for?
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> *Gene Liverman*
>> >> Systems Integration Architect
>> >> Information Technology Services
>> >> University of West Georgia
>> >> gliverma at westga.edu
>> >> 678.839.5492
>> >>
>> >> ITS: Making Technology Work for You!
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Oct 2, 2015 at 2:25 AM, Gaurav Garg <ggarg at redhat.com> wrote:
>> >>
>> >> > Hi Gene,
>> >> >
>> >> > You have pasted the glustershd log; we asked for the glusterd log.
>> >> > glusterd and glustershd are different processes, so with this
>> >> > information we can't find out why your glusterd crashed. Could you
>> >> > paste the *glusterd* logs
>> >> > (/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log*) in
>> >> > pastebin (not in this mail thread) and post the pastebin link in
>> >> > this mail thread? Could you also attach the core file, or paste the
>> >> > backtrace from that core dump?
>> >> > It would be great if you could also give us an sosreport from the
>> >> > node where the crash happened.
>> >> >
>> >> > Thanx,
>> >> >
>> >> > ~Gaurav
>> >> >
>> >> > ----- Original Message -----
>> >> > From: "Gene Liverman" <gliverma at westga.edu>
>> >> > To: "gluster-users" <gluster-users at gluster.org>
>> >> > Sent: Friday, October 2, 2015 4:47:00 AM
>> >> > Subject: Re: [Gluster-users] glusterd crashing
>> >> >
>> >> > Sorry for the delay. Here is what's installed:
>> >> > # rpm -qa | grep gluster
>> >> > glusterfs-geo-replication-3.7.4-2.el6.x86_64
>> >> > glusterfs-client-xlators-3.7.4-2.el6.x86_64
>> >> > glusterfs-3.7.4-2.el6.x86_64
>> >> > glusterfs-libs-3.7.4-2.el6.x86_64
>> >> > glusterfs-api-3.7.4-2.el6.x86_64
>> >> > glusterfs-fuse-3.7.4-2.el6.x86_64
>> >> > glusterfs-server-3.7.4-2.el6.x86_64
>> >> > glusterfs-cli-3.7.4-2.el6.x86_64
>> >> >
>> >> > The cmd_history.log file is attached.
>> >> > In gluster.log I have filtered out a bunch of lines like the one
>> >> > below to make the log more readable. I had a node down for multiple
>> >> > days due to maintenance, and another one went down due to a hardware
>> >> > failure during that time too.
>> >> > [2015-10-01 00:16:09.643631] W [MSGID: 114031]
>> >> > [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-gv0-client-0: remote
>> >> > operation failed. Path: <gfid:31f17f8c-6c96-4440-88c0-f813b3c8d364>
>> >> > (31f17f8c-6c96-4440-88c0-f813b3c8d364) [No such file or directory]
>> >> >
>> >> > I also filtered out a boatload of self-heal lines like these two:
>> >> > [2015-10-01 15:14:14.851015] I [MSGID: 108026]
>> >> > [afr-self-heal-metadata.c:56:__afr_selfheal_metadata_do]
>> >> > 0-gv0-replicate-0: performing metadata selfheal on
>> >> > f78a47db-a359-430d-a655-1d217eb848c3
>> >> > [2015-10-01 15:14:14.856392] I [MSGID: 108026]
>> >> > [afr-self-heal-common.c:651:afr_log_selfheal] 0-gv0-replicate-0:
>> >> > Completed metadata selfheal on f78a47db-a359-430d-a655-1d217eb848c3.
>> >> > source=0 sinks=1
>> >> >
>> >> >
>> >> > [root at eapps-gluster01 glusterfs]# cat glustershd.log | grep -v 'remote operation failed' | grep -v 'self-heal'
>> >> > [2015-09-27 08:46:56.893125] E [rpc-clnt.c:201:call_bail] 0-glusterfs:
>> >> > bailing out frame type(GlusterFS Handshake) op(GETSPEC(2)) xid = 0x6
>> >> > sent = 2015-09-27 08:16:51.742731. timeout = 1800 for 127.0.0.1:24007
>> >> > [2015-09-28 12:54:17.524924] W [socket.c:588:__socket_rwv] 0-glusterfs:
>> >> > readv on 127.0.0.1:24007 failed (Connection reset by peer)
>> >> > [2015-09-28 12:54:27.844374] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> >> > 0-glusterfs: No change in volfile, continuing
>> >> > [2015-09-28 12:57:03.485027] W [socket.c:588:__socket_rwv] 0-gv0-client-2:
>> >> > readv on 160.10.31.227:24007 failed (Connection reset by peer)
>> >> > [2015-09-28 12:57:05.872973] E [socket.c:2278:socket_connect_finish]
>> >> > 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection
>> >> > refused)
>> >> > [2015-09-28 12:57:38.490578] W [socket.c:588:__socket_rwv] 0-glusterfs:
>> >> > readv on 127.0.0.1:24007 failed (No data available)
>> >> > [2015-09-28 12:57:49.054475] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> >> > 0-glusterfs: No change in volfile, continuing
>> >> > [2015-09-28 13:01:12.062960] W
>> [glusterfsd.c:1219:cleanup_and_exit]
>> >> > (-->/lib64/libpthread.so.0() [0x3c65e07a51]
>> >> > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
[0x405e4d]
>> >> > -->/usr/sbin/glusterfs(cleanup_and_exit+0x65)
[0x4059b5] ) 0-:
>> >> received
>> >> > signum (15), shutting down
>> >> > [2015-09-28 13:01:12.981945] I [MSGID: 100030]
>> >> [glusterfsd.c:2301:main]
>> >> > 0-/usr/sbin/glusterfs: Started running
/usr/sbin/glusterfs
>> version
>> >> 3.7.4
>> >> > (args: /usr/sbin/glusterfs -s localhost --volfile-id
>> >> gluster/glustershd -p
>> >> > /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> >> > /var/log/glusterfs/glustershd.log -S
>> >> >
/var/run/gluster/9a9819e90404187e84e67b01614bbe10.socket
>> >> --xlator-option
>> >> >
*replicate*.node-uuid=416d712a-06fc-4b3c-a92f-8c82145626ff)
>> >> > [2015-09-28 13:01:13.009171] I [MSGID: 101190]
>> >> > [event-epoll.c:632:event_dispatch_epoll_worker]
0-epoll: Started
>> >> thread
>> >> > with index 1
>> >> > [2015-09-28 13:01:13.092483] I
>> [graph.c:269:gf_add_cmdline_options]
>> >> > 0-gv0-replicate-0: adding option 'node-uuid'
for volume
>> >> 'gv0-replicate-0'
>> >> > with value
'416d712a-06fc-4b3c-a92f-8c82145626ff'
>> >> > [2015-09-28 13:01:13.100856] I [MSGID: 101190]
>> >> > [event-epoll.c:632:event_dispatch_epoll_worker]
0-epoll: Started
>> >> thread
>> >> > with index 2
>> >> > [2015-09-28 13:01:13.103995] I [MSGID: 114020]
>> [client.c:2118:notify]
>> >> > 0-gv0-client-0: parent translators are ready,
attempting connect
>> on
>> >> > transport
>> >> > [2015-09-28 13:01:13.114745] I [MSGID: 114020]
>> [client.c:2118:notify]
>> >> > 0-gv0-client-1: parent translators are ready,
attempting connect
>> on
>> >> > transport
>> >> > [2015-09-28 13:01:13.115725] I
>> [rpc-clnt.c:1851:rpc_clnt_reconfig]
>> >> > 0-gv0-client-0: changing port to 49152 (from 0)
>> >> > [2015-09-28 13:01:13.125619] I [MSGID: 114020]
>> [client.c:2118:notify]
>> >> > 0-gv0-client-2: parent translators are ready,
attempting connect
>> on
>> >> > transport
>> >> > [2015-09-28 13:01:13.132316] E
>> [socket.c:2278:socket_connect_finish]
>> >> > 0-gv0-client-1: connection to 160.10.31.64:24007
>> >> <http://160.10.31.64:24007> failed (Connection
>> >> > refused)
>> >> > [2015-09-28 13:01:13.132650] I [MSGID: 114057]
>> >> >
[client-handshake.c:1437:select_server_supported_programs]
>> >> 0-gv0-client-0:
>> >> > Using Program GlusterFS 3.3, Num (1298437), Version
(330)
>> >> > [2015-09-28 13:01:13.133322] I [MSGID: 114046]
>> >> > [client-handshake.c:1213:client_setvolume_cbk]
0-gv0-client-0:
>> >> Connected to
>> >> > gv0-client-0, attached to remote volume
'/export/sdb1/gv0'.
>> >> > [2015-09-28 13:01:13.133365] I [MSGID: 114047]
>> >> > [client-handshake.c:1224:client_setvolume_cbk]
0-gv0-client-0:
>> >> Server and
>> >> > Client lk-version numbers are not same, reopening the
fds
>> >> > [2015-09-28 13:01:13.133782] I [MSGID: 108005]
>> >> > [afr-common.c:3998:afr_notify] 0-gv0-replicate-0:
Subvolume
>> >> 'gv0-client-0'
>> >> > came back up; going online.
>> >> > [2015-09-28 13:01:13.133863] I [MSGID: 114035]
>> >> > [client-handshake.c:193:client_set_lk_version_cbk]
>> 0-gv0-client-0:
>> >> Server
>> >> > lk version = 1
>> >> > Final graph:
>> >> >
>> >> >
>> >>
>>
+------------------------------------------------------------------------------+
>> >> > 1: volume gv0-client-0
>> >> > 2: type protocol/client
>> >> > 3: option clnt-lk-version 1
>> >> > 4: option volfile-checksum 0
>> >> > 5: option volfile-key gluster/glustershd
>> >> > 6: option client-version 3.7.4
>> >> > 7: option process-uuid
>> >> >
eapps-gluster01-65147-2015/09/28-13:01:12:970131-gv0-client-0-0-0
>> >> > 8: option fops-version 1298437
>> >> > 9: option ping-timeout 42
>> >> > 10: option remote-host eapps-gluster01.uwg.westga.edu
>> >> <http://eapps-gluster01.uwg.westga.edu>
>> >> > 11: option remote-subvolume /export/sdb1/gv0
>> >> > 12: option transport-type socket
>> >> > 13: option username
0005f8fa-107a-4cc8-ac38-bb821c014c14
>> >> > 14: option password
379bae9a-6529-4564-a6f5-f5a9f7424d01
>> >> > 15: end-volume
>> >> > 16:
>> >> > 17: volume gv0-client-1
>> >> > 18: type protocol/client
>> >> > 19: option ping-timeout 42
>> >> > 20: option remote-host eapps-gluster02.uwg.westga.edu
>> >> <http://eapps-gluster02.uwg.westga.edu>
>> >> > 21: option remote-subvolume /export/sdb1/gv0
>> >> > 22: option transport-type socket
>> >> > 23: option username
0005f8fa-107a-4cc8-ac38-bb821c014c14
>> >> > 24: option password
379bae9a-6529-4564-a6f5-f5a9f7424d01
>> >> > 25: end-volume
>> >> > 26:
>> >> > 27: volume gv0-client-2
>> >> > 28: type protocol/client
>> >> > 29: option ping-timeout 42
>> >> > 30: option remote-host eapps-gluster03.uwg.westga.edu
>> >> <http://eapps-gluster03.uwg.westga.edu>
>> >> > 31: option remote-subvolume /export/sdb1/gv0
>> >> > 32: option transport-type socket
>> >> > 33: option username
0005f8fa-107a-4cc8-ac38-bb821c014c14
>> >> > 34: option password
379bae9a-6529-4564-a6f5-f5a9f7424d01
>> >> > 35: end-volume
>> >> > 36:
>> >> > 37: volume gv0-replicate-0
>> >> > 38: type cluster/replicate
>> >> > 39: option node-uuid
416d712a-06fc-4b3c-a92f-8c82145626ff
>> >> > 46: subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>> >> > 47: end-volume
>> >> > 48:
>> >> > 49: volume glustershd
>> >> > 50: type debug/io-stats
>> >> > 51: subvolumes gv0-replicate-0
>> >> > 52: end-volume
>> >> > 53:
>> >> >
>> >> >
>> >>
>>
+------------------------------------------------------------------------------+
>> >> > [2015-09-28 13:01:13.154898] E [MSGID: 114058]
>> >> > [client-handshake.c:1524:client_query_portmap_cbk]
>> 0-gv0-client-2:
>> >> failed
>> >> > to get the port number for remote subvolume. Please
run 'gluster
>> >> volume
>> >> > status' on server to see if brick process is
running.
>> >> > [2015-09-28 13:01:13.155031] I [MSGID: 114018]
>> >> > [client.c:2042:client_rpc_notify] 0-gv0-client-2:
disconnected
>> from
>> >> > gv0-client-2. Client process will keep trying to
connect to
>> >> glusterd until
>> >> > brick's port is available
>> >> > [2015-09-28 13:01:13.155080] W [MSGID: 108001]
>> >> > [afr-common.c:4081:afr_notify] 0-gv0-replicate-0:
Client-quorum
>> is
>> >> not met
>> >> > [2015-09-29 08:11:24.728797] I [MSGID: 100011]
>> >> > [glusterfsd.c:1291:reincarnate] 0-glusterfsd:
Fetching the volume
>> >> file from
>> >> > server...
>> >> > [2015-09-29 08:11:24.763338] I
>> >> [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> >> > 0-glusterfs: No change in volfile, continuing
>> >> > [2015-09-29 12:50:41.915254] E [rpc-clnt.c:201:call_bail] 0-gv0-client-2:
>> >> > bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0xd91f sent =
>> >> > 2015-09-29 12:20:36.092734. timeout = 1800 for 160.10.31.227:24007
>> >> > [2015-09-29 12:50:41.923550] W [MSGID: 114032]
>> >> > [client-handshake.c:1623:client_dump_version_cbk] 0-gv0-client-2:
>> >> > received RPC status error [Transport endpoint is not connected]
>> >> > [2015-09-30 23:54:36.547979] W [socket.c:588:__socket_rwv] 0-glusterfs:
>> >> > readv on 127.0.0.1:24007 failed (No data available)
>> >> > [2015-09-30 23:54:46.812870] E [socket.c:2278:socket_connect_finish]
>> >> > 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
>> >> > [2015-10-01 00:14:20.997081] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
>> >> > 0-glusterfs: No change in volfile, continuing
>> >> > [2015-10-01 00:15:36.770579] W [socket.c:588:__socket_rwv] 0-gv0-client-2:
>> >> > readv on 160.10.31.227:24007 failed (Connection reset by peer)
>> >> > [2015-10-01 00:15:37.906708] E [socket.c:2278:socket_connect_finish]
>> >> > 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection
>> >> > refused)
>> >> > [2015-10-01 00:15:53.008130] W
>> [glusterfsd.c:1219:cleanup_and_exit]
>> >> > (-->/lib64/libpthread.so.0() [0x3b91807a51]
>> >> > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)
[0x405e4d]
>> >> > -->/usr/sbin/glusterfs(cleanup_and_exit+0x65)
[0x4059b5] ) 0-:
>> >> received
>> >> > signum (15), shutting down
>> >> > [2015-10-01 00:15:53.008697] I
[timer.c:48:gf_timer_call_after]
>> >> >
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3e2)
>> [0x3b9480f992]
>> >> > -->/usr/lib64/libgfrpc.so.0(__save_frame+0x76)
[0x3b9480f046]
>> >> >
-->/usr/lib64/libglusterfs.so.0(gf_timer_call_after+0x1b1)
>> >> [0x3b93447881] )
>> >> > 0-timer: ctx cleanup started
>> >> > [2015-10-01 00:15:53.994698] I [MSGID: 100030]
>> >> [glusterfsd.c:2301:main]
>> >> > 0-/usr/sbin/glusterfs: Started running
/usr/sbin/glusterfs
>> version
>> >> 3.7.4
>> >> > (args: /usr/sbin/glusterfs -s localhost --volfile-id
>> >> gluster/glustershd -p
>> >> > /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> >> > /var/log/glusterfs/glustershd.log -S
>> >> >
/var/run/gluster/9a9819e90404187e84e67b01614bbe10.socket
>> >> --xlator-option
>> >> >
*replicate*.node-uuid=416d712a-06fc-4b3c-a92f-8c82145626ff)
>> >> > [2015-10-01 00:15:54.020401] I [MSGID: 101190]
>> >> > [event-epoll.c:632:event_dispatch_epoll_worker]
0-epoll: Started
>> >> thread
>> >> > with index 1
>> >> > [2015-10-01 00:15:54.086777] I
>> [graph.c:269:gf_add_cmdline_options]
>> >> > 0-gv0-replicate-0: adding option 'node-uuid'
for volume
>> >> 'gv0-replicate-0'
>> >> > with value
'416d712a-06fc-4b3c-a92f-8c82145626ff'
>> >> > [2015-10-01 00:15:54.093004] I [MSGID: 101190]
>> >> > [event-epoll.c:632:event_dispatch_epoll_worker]
0-epoll: Started
>> >> thread
>> >> > with index 2
>> >> > [2015-10-01 00:15:54.098144] I [MSGID: 114020]
>> [client.c:2118:notify]
>> >> > 0-gv0-client-0: parent translators are ready,
attempting connect
>> on
>> >> > transport
>> >> > [2015-10-01 00:15:54.107432] I [MSGID: 114020]
>> [client.c:2118:notify]
>> >> > 0-gv0-client-1: parent translators are ready,
attempting connect
>> on
>> >> > transport
>> >> > [2015-10-01 00:15:54.115962] I [MSGID: 114020]
>> [client.c:2118:notify]
>> >> > 0-gv0-client-2: parent translators are ready,
attempting connect
>> on
>> >> > transport
>> >> > [2015-10-01 00:15:54.120474] E
>> [socket.c:2278:socket_connect_finish]
>> >> > 0-gv0-client-1: connection to 160.10.31.64:24007
>> >> <http://160.10.31.64:24007> failed (Connection
>> >> > refused)
>> >> > [2015-10-01 00:15:54.120639] I
>> [rpc-clnt.c:1851:rpc_clnt_reconfig]
>> >> > 0-gv0-client-0: changing port to 49152 (from 0)
>> >> > Final graph:
>> >> >
>> >> >
>> >>
>>
+------------------------------------------------------------------------------+
>> >> > 1: volume gv0-client-0
>> >> > 2: type protocol/client
>> >> > 3: option ping-timeout 42
>> >> > 4: option remote-host eapps-gluster01.uwg.westga.edu
>> >> <http://eapps-gluster01.uwg.westga.edu>
>> >> > 5: option remote-subvolume /export/sdb1/gv0
>> >> > 6: option transport-type socket
>> >> > 7: option username
0005f8fa-107a-4cc8-ac38-bb821c014c14
>> >> > 8: option password
379bae9a-6529-4564-a6f5-f5a9f7424d01
>> >> > 9: end-volume
>> >> > 10:
>> >> > 11: volume gv0-client-1
>> >> > 12: type protocol/client
>> >> > 13: option ping-timeout 42
>> >> > 14: option remote-host eapps-gluster02.uwg.westga.edu
>> >> <http://eapps-gluster02.uwg.westga.edu>
>> >> > 15: option remote-subvolume /export/sdb1/gv0
>> >> > 16: option transport-type socket
>> >> > 17: option username
0005f8fa-107a-4cc8-ac38-bb821c014c14
>> >> > 18: option password
379bae9a-6529-4564-a6f5-f5a9f7424d01
>> >> > 19: end-volume
>> >> > 20:
>> >> > 21: volume gv0-client-2
>> >> > 22: type protocol/client
>> >> > 23: option ping-timeout 42
>> >> > 24: option remote-host eapps-gluster03.uwg.westga.edu
>> >> <http://eapps-gluster03.uwg.westga.edu>
>> >> > 25: option remote-subvolume /export/sdb1/gv0
>> >> > 26: option transport-type socket
>> >> > 27: option username
0005f8fa-107a-4cc8-ac38-bb821c014c14
>> >> > 28: option password
379bae9a-6529-4564-a6f5-f5a9f7424d01
>> >> > 29: end-volume
>> >> > 30:
>> >> > 31: volume gv0-replicate-0
>> >> > 32: type cluster/replicate
>> >> > 33: option node-uuid
416d712a-06fc-4b3c-a92f-8c82145626ff
>> >> > 40: subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>> >> > 41: end-volume
>> >> > 42:
>> >> > 43: volume glustershd
>> >> > 44: type debug/io-stats
>> >> > 45: subvolumes gv0-replicate-0
>> >> > 46: end-volume
>> >> > 47:
>> >> >
>> >> >
>> >>
>>
+------------------------------------------------------------------------------+
>> >> > [2015-10-01 00:15:54.135650] I [MSGID: 114057]
>> >> >
[client-handshake.c:1437:select_server_supported_programs]
>> >> 0-gv0-client-0:
>> >> > Using Program GlusterFS 3.3, Num (1298437), Version
(330)
>> >> > [2015-10-01 00:15:54.136223] I [MSGID: 114046]
>> >> > [client-handshake.c:1213:client_setvolume_cbk]
0-gv0-client-0:
>> >> Connected to
>> >> > gv0-client-0, attached to remote volume
'/export/sdb1/gv0'.
>> >> > [2015-10-01 00:15:54.136262] I [MSGID: 114047]
>> >> > [client-handshake.c:1224:client_setvolume_cbk]
0-gv0-client-0:
>> >> Server and
>> >> > Client lk-version numbers are not same, reopening the
fds
>> >> > [2015-10-01 00:15:54.136410] I [MSGID: 108005]
>> >> > [afr-common.c:3998:afr_notify] 0-gv0-replicate-0:
Subvolume
>> >> 'gv0-client-0'
>> >> > came back up; going online.
>> >> > [2015-10-01 00:15:54.136500] I [MSGID: 114035]
>> >> > [client-handshake.c:193:client_set_lk_version_cbk]
>> 0-gv0-client-0:
>> >> Server
>> >> > lk version = 1
>> >> > [2015-10-01 00:15:54.401702] E [MSGID: 114058]
>> >> > [client-handshake.c:1524:client_query_portmap_cbk]
>> 0-gv0-client-2:
>> >> failed
>> >> > to get the port number for remote subvolume. Please
run 'gluster
>> >> volume
>> >> > status' on server to see if brick process is
running.
>> >> > [2015-10-01 00:15:54.401834] I [MSGID: 114018]
>> >> > [client.c:2042:client_rpc_notify] 0-gv0-client-2:
disconnected
>> from
>> >> > gv0-client-2. Client process will keep trying to
connect to
>> >> glusterd until
>> >> > brick's port is available
>> >> > [2015-10-01 00:15:54.401878] W [MSGID: 108001]
>> >> > [afr-common.c:4081:afr_notify] 0-gv0-replicate-0:
Client-quorum
>> is
>> >> not met
>> >> > [2015-10-01 03:57:52.755426] E
>> [socket.c:2278:socket_connect_finish]
>> >> > 0-gv0-client-2: connection to 160.10.31.227:24007
>> >> <http://160.10.31.227:24007> failed (Connection
>> >> > refused)
>> >> > [2015-10-01 13:50:49.000708] E
>> [socket.c:2278:socket_connect_finish]
>> >> > 0-gv0-client-2: connection to 160.10.31.227:24007
>> >> <http://160.10.31.227:24007> failed (Connection
>> >> > timed out)
>> >> > [2015-10-01 14:36:40.481673] E [MSGID: 114058]
>> >> > [client-handshake.c:1524:client_query_portmap_cbk]
>> 0-gv0-client-1:
>> >> failed
>> >> > to get the port number for remote subvolume. Please
run 'gluster
>> >> volume
>> >> > status' on server to see if brick process is
running.
>> >> > [2015-10-01 14:36:40.481833] I [MSGID: 114018]
>> >> > [client.c:2042:client_rpc_notify] 0-gv0-client-1:
disconnected
>> from
>> >> > gv0-client-1. Client process will keep trying to
connect to
>> >> glusterd until
>> >> > brick's port is available
>> >> > [2015-10-01 14:36:41.982037] I [rpc-clnt.c:1851:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
>> >> > [2015-10-01 14:36:41.993478] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> >> > [2015-10-01 14:36:41.994568] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/export/sdb1/gv0'.
>> >> > [2015-10-01 14:36:41.994647] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
>> >> > [2015-10-01 14:36:41.994899] I [MSGID: 108002] [afr-common.c:4077:afr_notify] 0-gv0-replicate-0: Client-quorum is met
>> >> > [2015-10-01 14:36:42.002275] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Thanks,
>> >> > Gene Liverman
>> >> > Systems Integration Architect
>> >> > Information Technology Services
>> >> > University of West Georgia
>> >> > gliverma at westga.edu
>> >> >
>> >> > ITS: Making Technology Work for You!
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Sep 30, 2015 at 10:54 PM, Gaurav Garg <ggarg at redhat.com> wrote:
>> >> >
>> >> >
>> >> > Hi Gene,
>> >> >
>> >> > Could you paste or attach the core file, glusterd log file, and cmd
>> >> > history so we can find the actual RCA of the crash? What steps did you
>> >> > perform before this crash?
>> >> >
>> >> > >> How can I troubleshoot this?
>> >> >
>> >> > If you want to troubleshoot this, you can look into the glusterd log
>> >> > file and the core file.
>> >> >
>> >> > Thank you..
>> >> >
>> >> > Regards,
>> >> > Gaurav
>> >> >
>> >> > ----- Original Message -----
>> >> > From: "Gene Liverman" <gliverma at westga.edu>
>> >> > To: gluster-users at gluster.org
>> >> > Sent: Thursday, October 1, 2015 7:59:47 AM
>> >> > Subject: [Gluster-users] glusterd crashing
>> >> >
>> >> > In the last few days I've started having issues with my glusterd service
>> >> > crashing. When it goes down it seems to do so on all nodes in my replicated
>> >> > volume. How can I troubleshoot this? I'm on a mix of CentOS 6 and RHEL 6.
>> >> > Thanks!
>> >> >
>> >> >
>> >> >
>> >> > Gene Liverman
>> >> > Systems Integration Architect
>> >> > Information Technology Services
>> >> > University of West Georgia
>> >> > gliverma at westga.edu
>> >> >
>> >> >
>> >> > Sent from Outlook on my iPhone
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Gluster-users mailing list
>> >> > Gluster-users at gluster.org
>> >> > http://www.gluster.org/mailman/listinfo/gluster-users
>
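For anyone following the thread: the client log quoted above is easier to read with only the error and warning entries left in. Below is a minimal sketch in Python, not a Gluster tool. It assumes each entry has been joined back onto a single line, and the pattern is inferred only from the entries quoted in this thread ([timestamp] LEVEL, an optional [MSGID: n], [file:line:function], then "domain: message"). The names LOG_RE, parse_line and errors_and_warnings are made up for illustration.

```python
import re

# Assumed layout of one unwrapped log entry, inferred from the quoted log:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] domain: message
# The MSGID part is optional (the socket.c entries above do not carry one).
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def errors_and_warnings(lines):
    """Keep only the entries whose level is E (error) or W (warning)."""
    kept = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["level"] in ("E", "W"):
            kept.append(rec)
    return kept


if __name__ == "__main__":
    # Entries copied from the log quoted in this thread.
    sample = [
        "[2015-10-01 00:15:54.401878] W [MSGID: 108001] "
        "[afr-common.c:4081:afr_notify] 0-gv0-replicate-0: Client-quorum is not met",
        "[2015-10-01 14:36:41.982037] I [rpc-clnt.c:1851:rpc_clnt_reconfig] "
        "0-gv0-client-1: changing port to 49152 (from 0)",
    ]
    for rec in errors_and_warnings(sample):
        print(rec["ts"], rec["level"], rec["domain"], rec["msg"])
```

Run against the joined log, this would leave the portmap failures, the refused/timed-out connections to port 24007 and the quorum warning, which are the entries that line up with glusterd going down.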