Ravishankar N
2018-Oct-16 04:23 UTC
[Gluster-users] glustershd coredump generated while reboot all 3 sn nodes
Hi,

- Is this stock glusterfs-3.12.3? Or do you have any patches applied on top of it?
- If it is stock, could you create a BZ and attach the core file and the /var/log/glusterfs/ logs from the 3 nodes at the time of the crash?

Thanks,
Ravi

On 10/16/2018 08:45 AM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
> Hi,
>
> This issue happened twice recently: when glustershd does a heal, it occasionally generates a coredump.
>
> I did some debugging and found that afr_selfheal_unlocked_discover_on sometimes does a lookup and the frame is saved in rpc_clnt_submit; when the reply comes, the saved frame is found, but its address is different from the address of the frame that was saved. I think this is wrong, but I cannot find a clue as to how this happened.
>
> [root at mn-0:/home/robot]
>
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/glusterfs -s sn-0.local --volfile-id gluster/glustershd -p /var/run/g'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007fb1a6fd9d24 in client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
> 2802    client-rpc-fops.c: No such file or directory.
> [Current thread is 1 (Thread 0x7fb1a7a0e700 (LWP 8151))]
> Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.2.0-RCP2.wf29.x86_64
>
> (gdb) bt
> #0  0x00007fb1a6fd9d24 in client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
> #1  0x00007fb1acf55d47 in rpc_clnt_handle_reply (clnt=0x7fb1a008fff0, pollin=0x7fb1a0843910) at rpc-clnt.c:778
> #2  0x00007fb1acf562e5 in rpc_clnt_notify (trans=0x7fb1a00901c0, mydata=0x7fb1a0090020, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fb1a0843910) at rpc-clnt.c:971
> #3  0x00007fb1acf52319 in rpc_transport_notify (this=0x7fb1a00901c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fb1a0843910) at rpc-transport.c:538
> #4  0x00007fb1a7e9934d in socket_event_poll_in (this=0x7fb1a00901c0, notify_handled=_gf_true) at socket.c:2315
> #5  0x00007fb1a7e99992 in socket_event_handler (fd=20, idx=14, gen=103, data=0x7fb1a00901c0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
> #6  0x00007fb1ad2005ac in event_dispatch_epoll_handler (event_pool=0x175fb00, event=0x7fb1a7a0de84) at event-epoll.c:583
> #7  0x00007fb1ad200883 in event_dispatch_epoll_worker (data=0x17a73d0) at event-epoll.c:659
> #8  0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
> #9  0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
>
> (gdb) info thread
>   Id   Target Id                          Frame
> * 1    Thread 0x7fb1a7a0e700 (LWP 8151)   0x00007fb1a6fd9d24 in client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
>   2    Thread 0x7fb1aa0af700 (LWP 8147)   0x00007fb1ab761cbc in sigtimedwait () from /lib64/libc.so.6
>   3    Thread 0x7fb1a98ae700 (LWP 8148)   0x00007fb1ab7f04b0 in nanosleep () from /lib64/libc.so.6
>   4    Thread 0x7fb1957fa700 (LWP 8266)   0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   5    Thread 0x7fb1a88ac700 (LWP 8150)   0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   6    Thread 0x7fb17f7fe700 (LWP 8269)   0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   7    Thread 0x7fb1aa8b0700 (LWP 8146)   0x00007fb1abf56300 in nanosleep () from /lib64/libpthread.so.0
>   8    Thread 0x7fb1ad685780 (LWP 8145)   0x00007fb1abf4da3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>   9    Thread 0x7fb1a542d700 (LWP 8251)   0x00007fb1ab7f04b0 in nanosleep () from /lib64/libc.so.6
>   10   Thread 0x7fb1a4c2c700 (LWP 8260)   0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   11   Thread 0x7fb196ffd700 (LWP 8263)   0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   12   Thread 0x7fb1a60d7700 (LWP 8247)   0x00007fb1ab822fe7 in epoll_wait () from /lib64/libc.so.6
>   13   Thread 0x7fb1a90ad700 (LWP 8149)   0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> (gdb) print (call_frame_t*)myframe
> $1 = (call_frame_t *) 0x7fb188215740
> (gdb) print *(call_frame_t*)myframe
> $2 = {root = 0x7fb1a0085090, parent = 0xcd4642c4a3efd678, frames = {next = 0x151e2a92a5ae1bb, prev = 0x0}, local = 0x0, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
>       __lock = 0, __count = 0, __owner = 0, __nusers = 4, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x7fb188215798, __next = 0x7fb188215798}},
>       __size = '\000' <repeats 12 times>, "\004", '\000' <repeats 11 times>, "\230W!\210\261\177\000\000\230W!\210\261\177\000", __align = 0}}, cookie = 0x7fb1882157a8, complete = (unknown: 2283886504),
>   op = 32689, begin = {tv_sec = 140400469825464, tv_usec = 140400469825464}, end = {tv_sec = 140400878737576, tv_usec = 140400132101048}, wind_from = 0x7fb18801cdc0 "", wind_to = 0x0, unwind_from = 0x0,
>   unwind_to = 0x0}
>
> (gdb) thread 6
> [Switching to thread 6 (Thread 0x7fb17f7fe700 (LWP 8269))]
> #0  0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00007fb1ad1dc993 in __syncbarrier_wait (barrier=0x7fb188014790, waitfor=3) at syncop.c:1138
> #2  0x00007fb1ad1dc9e4 in syncbarrier_wait (barrier=0x7fb188014790, waitfor=3) at syncop.c:1155
> #3  0x00007fb1a6d59cde in afr_selfheal_unlocked_discover_on (frame=0x7fb1882162d0, inode=0x7fb188215740, gfid=0x7fb17f7fdb00 "x\326\357\243\304BF?\341Z*\251\342Q\001\060\333\177\177\261\177", replies=0x7fb17f7fcf40, discover_on=0x7fb1a0084cb0 "\001\001\001", <incomplete sequence \360\255\272>) at afr-self-heal-common.c:1809
> #4  0x00007fb1a6d59d80 in afr_selfheal_unlocked_discover (frame=0x7fb1882162d0, inode=0x7fb188215740, gfid=0x7fb17f7fdb00 "x\326\357\243\304BF?\341Z*\251\342Q\001\060\333\177\177\261\177", replies=0x7fb17f7fcf40) at afr-self-heal-common.c:1828
> #5  0x00007fb1a6d5e51f in afr_selfheal_unlocked_inspect (frame=0x7fb1882162d0, this=0x7fb1a001db40, gfid=0x7fb17f7fdb00 "x\326\357\243\304BF?\341Z*\251\342Q\001\060\333\177\177\261\177", link_inode=0x7fb17f7fd9c8, data_selfheal=0x7fb17f7fd9c4, metadata_selfheal=0x7fb17f7fd9c0, entry_selfheal=0x7fb17f7fd9bc) at afr-self-heal-common.c:2241
> #6  0x00007fb1a6d5f19b in afr_selfheal_do (frame=0x7fb1882162d0, this=0x7fb1a001db40, gfid=0x7fb17f7fdb00 "x\326\357\243\304BF?\341Z*\251\342Q\001\060\333\177\177\261\177") at afr-self-heal-common.c:2483
> #7  0x00007fb1a6d5f346 in afr_selfheal (this=0x7fb1a001db40, gfid=0x7fb17f7fdb00 "x\326\357\243\304BF?\341Z*\251\342Q\001\060\333\177\177\261\177") at afr-self-heal-common.c:2543
> #8  0x00007fb1a6d6ac5c in afr_shd_selfheal (healer=0x7fb1a0085640, child=0, gfid=0x7fb17f7fdb00 "x\326\357\243\304BF?\341Z*\251\342Q\001\060\333\177\177\261\177") at afr-self-heald.c:343
> #9  0x00007fb1a6d6b00b in afr_shd_index_heal (subvol=0x7fb1a00171e0, entry=0x7fb1a0714180, parent=0x7fb17f7fddc0, data=0x7fb1a0085640) at afr-self-heald.c:440
> #10 0x00007fb1ad201ed3 in syncop_mt_dir_scan (frame=0x7fb1a07a0e90, subvol=0x7fb1a00171e0, loc=0x7fb17f7fddc0, pid=-6, data=0x7fb1a0085640, fn=0x7fb1a6d6aebc <afr_shd_index_heal>, xdata=0x7fb1a07b4ed0, max_jobs=1, max_qlen=1024) at syncop-utils.c:407
> #11 0x00007fb1a6d6b2b5 in afr_shd_index_sweep (healer=0x7fb1a0085640, vgfid=0x7fb1a6d93610 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:494
> #12 0x00007fb1a6d6b394 in afr_shd_index_sweep_all (healer=0x7fb1a0085640) at afr-self-heald.c:517
> #13 0x00007fb1a6d6b697 in afr_shd_index_healer (data=0x7fb1a0085640) at afr-self-heald.c:597
> #14 0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
> #15 0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
>
> From: Zhou, Cynthia (NSB - CN/Hangzhou)
> Sent: Thursday, October 11, 2018 3:36 PM
> To: Ravishankar N <ravishankar at redhat.com>
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> I find that when an sn node is restarted, glustershd sometimes exits and generates a coredump. It has happened twice in my env; I would like to know your opinion on this issue, thanks!
>
> The glusterfs version I use is glusterfs-3.12.3.
>
> [root at sn-1:/root]
> # gluster v info log
>
> Volume Name: log
> Type: Replicate
> Volume ID: 87bcbaf8-5fa4-4060-9149-23f832befe92
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: sn-0.local:/mnt/bricks/log/brick
> Brick2: sn-1.local:/mnt/bricks/log/brick
> Brick3: sn-2.local:/mnt/bricks/log/brick
> Options Reconfigured:
> server.allow-insecure: on
> cluster.quorum-type: auto
> network.ping-timeout: 42
> cluster.consistent-metadata: on
> cluster.favorite-child-policy: mtime
> cluster.quorum-reads: no
> cluster.server-quorum-type: none
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> cluster.server-quorum-ratio: 51%
>
> [root at sn-1:/root]
>
> /////////////////////////////////////////////// glustershd coredump ////////////////////////////////////////////////////////////////
>
> # lz4 -d core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000.lz4
> Decoding file core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
> core.glusterfs.0.c5f : decoded 263188480 bytes
>
> [root at sn-0:/mnt/export]
> # gdb /usr/sbin/glusterfs core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
> GNU gdb (GDB) Fedora 8.1-14.wf29
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.
> warning: core file may not match specified executable file.
> [New LWP 1818]
> [New LWP 1812]
> [New LWP 1813]
> [New LWP 1817]
> [New LWP 1966]
> [New LWP 1968]
> [New LWP 1970]
> [New LWP 1974]
> [New LWP 1976]
> [New LWP 1814]
> [New LWP 1815]
> [New LWP 1816]
> [New LWP 1828]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/glusterfs -s sn-0.local --volfile-id gluster/glustershd -p /var/run/g'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
> 2802    client-rpc-fops.c: No such file or directory.
> [Current thread is 1 (Thread 0x7f1b5f00c700 (LWP 1818))]
> Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.2.0_1_g54e6196-RCP2.wf29.x86_64
>
> (gdb) bt
> #0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
> #1  0x00007f1b64553d47 in rpc_clnt_handle_reply (clnt=0x7f1b5808bbb0, pollin=0x7f1b580c6620) at rpc-clnt.c:778
> #2  0x00007f1b645542e5 in rpc_clnt_notify (trans=0x7f1b5808bde0, mydata=0x7f1b5808bbe0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-clnt.c:971
> #3  0x00007f1b64550319 in rpc_transport_notify (this=0x7f1b5808bde0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-transport.c:538
> #4  0x00007f1b5f49734d in socket_event_poll_in (this=0x7f1b5808bde0, notify_handled=_gf_true) at socket.c:2315
> #5  0x00007f1b5f497992 in socket_event_handler (fd=25, idx=15, gen=7, data=0x7f1b5808bde0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
> #6  0x00007f1b647fe5ac in event_dispatch_epoll_handler (event_pool=0x230cb00, event=0x7f1b5f00be84) at event-epoll.c:583
> #7  0x00007f1b647fe883 in event_dispatch_epoll_worker (data=0x23543d0) at event-epoll.c:659
> #8  0x00007f1b6354a5da in start_thread () from /lib64/libpthread.so.0
> #9  0x00007f1b62e20cbf in clone () from /lib64/libc.so.6
>
> (gdb) print *(call_frame_t*)myframe
> $1 = {root = 0x100000000, parent = 0x100000005, frames = {next = 0x7f1b4401c8a8, prev = 0x7f1b44010190}, local = 0x0, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
>       __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x7f1b44010190, __next = 0x0}},
>       __size = '\000' <repeats 24 times>, "\220\001\001D\033\177\000\000\000\000\000\000\000\000\000", __align = 0}}, cookie = 0x7f1b4401ccf0, complete = _gf_false, op = GF_FOP_NULL, begin = {
>     tv_sec = 139755081730912, tv_usec = 139755081785872}, end = {tv_sec = 448811404, tv_usec = 21474836481}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
>
> (gdb) info thread
>   Id   Target Id                          Frame
> * 1    Thread 0x7f1b5f00c700 (LWP 1818)   0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>   2    Thread 0x7f1b64c83780 (LWP 1812)   0x00007f1b6354ba3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>   3    Thread 0x7f1b61eae700 (LWP 1813)   0x00007f1b63554300 in nanosleep () from /lib64/libpthread.so.0
>   4    Thread 0x7f1b5feaa700 (LWP 1817)   0x00007f1b635508ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   5    Thread 0x7f1b5ca2b700 (LWP 1966)   0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
>   6    Thread 0x7f1b4f7fe700 (LWP 1968)   0x00007f1b6355050c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   7    Thread 0x7f1b4e7fc700 (LWP 1970)   0x00007f1b6355050c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   8    Thread 0x7f1b4d7fa700 (LWP 1974)   0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
>   9    Thread 0x7f1b33fff700 (LWP 1976)   0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
>   10   Thread 0x7f1b616ad700 (LWP 1814)   0x00007f1b62d5fcbc in sigtimedwait () from /lib64/libc.so.6
>   11   Thread 0x7f1b60eac700 (LWP 1815)   0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
>   12   Thread 0x7f1b606ab700 (LWP 1816)   0x00007f1b635508ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   13   Thread 0x7f1b5d6d5700 (LWP 1828)   0x00007f1b62e20fe7 in epoll_wait () from /lib64/libc.so.6
> (gdb) quit
>
> The source code is like this (see the scrubbed screenshot image002.jpg: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181016/99c98a2b/attachment.jpg>), so from gdb it coredumps because frame->local is NULL!
>
> From the sn-0 journal log:
>
> Sep 26 16:04:40.034577 sn-0 systemd-coredump[2612]: Process 1812 (glusterfs) of user 0 dumped core.
>
> Stack trace of thread 1818:
> #0  0x00007f1b5e5d7d24 client3_3_lookup_cbk (client.so)
> #1  0x00007f1b64553d47 rpc_clnt_handle_reply (libgfrpc.so.0)
> #2  0x00007f1b645542e5 rpc_clnt_notify (libgfrpc.so.0)
> #3  0x00007f1b64550319 rpc_transport_notify (libgfrpc.so.0)
> #4  0x00007f1b5f49734d socket_event_poll_in (socket.so)
> #5  0x00007f1b5f497992 socket_event_handler (socket.so)
> #6  0x00007f1b647fe5ac event_dispatch_epoll_handler (libglusterfs.so.0)
> #7  0x00007f1b647fe883 event_dispatch_epoll_worker (libglusterfs.so.0)
> #8  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #9  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1812:
> #0  0x00007f1b6354ba3d __GI___pthread_timedjoin_ex (libpthread.so.0)
> #1  0x00007f1b647feae1 event_dispatch_epoll (libglusterfs.so.0)
> #2  0x00007f1b647c2703 event_dispatch (libglusterfs.so.0)
> #3  0x000000000040ab95 main (glusterfsd)
> #4  0x00007f1b62d4baf7 __libc_start_main (libc.so.6)
> #5  0x000000000040543a _start (glusterfsd)
>
> Stack trace of thread 1813:
> #0  0x00007f1b63554300 __nanosleep (libpthread.so.0)
> #1  0x00007f1b647a04e5 gf_timer_proc (libglusterfs.so.0)
> #2  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #3  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1817:
> #0  0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
> #1  0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
> #2  0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1966:
> #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
> #1  0x00007f1b62dee38a sleep (libc.so.6)
> #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1968:
> #0  0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
> #1  0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
> #2  0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
> #3  0x00007f1b5e357cde afr_selfheal_unlocked_discover_on (replicate.so)
> #4  0x00007f1b5e357d80 afr_selfheal_unlocked_discover (replicate.so)
> #5  0x00007f1b5e363bf8 __afr_selfheal_entry_prepare (replicate.so)
> #6  0x00007f1b5e3641c0 afr_selfheal_entry_dirent (replicate.so)
> #7  0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
> #8  0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
> #9  0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
> #10 0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
> #11 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
> #12 0x00007f1b5e35d346 afr_selfheal (replicate.so)
> #13 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
> #14 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
> #15 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
> #16 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
> #17 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
> #18 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
> #19 0x00007f1b6354a5da start_thread (libpthread.so.0)
> #20 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1970:
> #0  0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
> #1  0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
> #2  0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
> #3  0x00007f1b5e357742 afr_selfheal_unlocked_lookup_on (replicate.so)
> #4  0x00007f1b5e364204 afr_selfheal_entry_dirent (replicate.so)
> #5  0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
> #6  0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
> #7  0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
> #8  0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
> #9  0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
> #10 0x00007f1b5e35d346 afr_selfheal (replicate.so)
> #11 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
> #12 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
> #13 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
> #14 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
> #15 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
> #16 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
> #17 0x00007f1b6354a5da start_thread (libpthread.so.0)
> #18 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1974:
> #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
> #1  0x00007f1b62dee38a sleep (libc.so.6)
> #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1976:
> #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
> #1  0x00007f1b62dee38a sleep (libc.so.6)
> #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1814:
> #0  0x00007f1b62d5fcbc __sigtimedwait (libc.so.6)
> #1  0x00007f1b63554afc sigwait (libpthread.so.0)
> #2  0x0000000000409ed7 glusterfs_sigwaiter (glusterfsd)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1815:
> #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
> #1  0x00007f1b62dee38a sleep (libc.so.6)
> #2  0x00007f1b647c3f5c pool_sweeper (libglusterfs.so.0)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1816:
> #0  0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
> #1  0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
> #2  0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
> #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of thread 1828:
> #0  0x00007f1b62e20fe7 epoll_wait (libc.so.6)
> #1  0x00007f1b647fe855 event_dispatch_epoll_worker (libglusterfs.so.0)
> #2  0x00007f1b6354a5da start_thread (libpthread.so.0)
> #3  0x00007f1b62e20cbf __clone (libc.so.6)
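For reference, the failure mode described in this thread (a call frame pointer stashed when the lookup is submitted, handed back to the reply callback as myframe, which then uses frame->local) follows roughly the pattern sketched below. This is a minimal, self-contained illustration, not the actual GlusterFS source; every type and function name here is an invented stand-in.

/* Minimal sketch of the saved-frame pattern described above. NOT the
 * GlusterFS source. It models how a submit path remembers which call
 * frame belongs to which transaction id, and how the reply callback gets
 * that pointer back. If the stored pointer no longer refers to a live,
 * intact frame when the reply arrives, the callback's use of
 * frame->local reads NULL or garbage and faults, matching the SIGSEGV
 * reported in client3_3_lookup_cbk at client-rpc-fops.c:2802. */
#include <stdio.h>

typedef struct sketch_local {
        char path[64];            /* stands in for local->loc.path */
} sketch_local_t;

typedef struct sketch_frame {
        sketch_local_t *local;    /* stands in for frame->local */
} sketch_frame_t;

#define MAX_XID 16
static sketch_frame_t *pending[MAX_XID];  /* plays the role of the rpc layer's saved frames */

/* Submit side: remember the frame for this xid (the step the mail
 * attributes to rpc_clnt_submit). */
static void submit_lookup(int xid, sketch_frame_t *frame)
{
        pending[xid] = frame;
}

/* Reply side: look the frame up again and use frame->local (the step
 * the mail attributes to client3_3_lookup_cbk receiving myframe). */
static void lookup_cbk(int xid)
{
        sketch_frame_t *frame = pending[xid];
        sketch_local_t *local = frame ? frame->local : NULL;

        if (!local) {
                /* A stale or wrong frame address shows up exactly here;
                 * the real callback does not expect it and crashes. */
                fprintf(stderr, "xid %d: local is NULL/stale, crash point\n", xid);
                return;
        }
        printf("xid %d: lookup reply for %s\n", xid, local->path);
}

int main(void)
{
        sketch_local_t local = { "/mnt/bricks/log/brick" };
        sketch_frame_t frame = { &local };

        submit_lookup(1, &frame);
        lookup_cbk(1);            /* healthy case: frame still intact at reply time */

        frame.local = NULL;       /* simulate the frame being torn down or reused early */
        lookup_cbk(1);            /* mirrors the coredump scenario */
        return 0;
}

In both cores above, the printed frame contents ($2 and $1) contain implausible field values and local is 0x0, which is consistent with the callback being handed an address that no longer points at the frame saved at submit time.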
Zhou, Cynthia (NSB - CN/Hangzhou)
2018-Oct-16 06:15 UTC
[Gluster-users] glustershd coredump generated while reboot all 3 sn nodes
Hi,

Yes, it is glusterfs-3.12.3. I will create a BZ and attach the related coredump and the glusterfs logs.

cynthia

From: Ravishankar N <ravishankar at redhat.com>
Sent: Tuesday, October 16, 2018 12:23 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: Re: glustershd coredump generated while reboot all 3 sn nodes

> Hi,
>
> - Is this stock glusterfs-3.12.3? Or do you have any patches applied on top of it?
> - If it is stock, could you create a BZ and attach the core file and the /var/log/glusterfs/ logs from the 3 nodes at the time of the crash?
>
> Thanks,
> Ravi