On Mon, 4 Sep 2017 at 20:04, Serkan Çoban <cobanserkan at gmail.com> wrote:

> I have been using a 60 server 1560 brick 3.7.11 cluster without
> problems for 1 years. I did not see this problem with it.
> Note that this problem does not happen when I install packages & start
> glusterd & peer probe and create the volumes. But after glusterd
> restart.
>
> Also note that this still happens without any volumes. So it is not
> related with brick count I think...

The backtrace you shared earlier involves the code path where all brick details are synced up. So I'd be really interested to see the backtrace of this when there are no volumes associated.

> On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >
> > On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >>
> >> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
> >> Tried both, result is same, but the logs/stacks are from stopping and
> >> starting glusterd only on one server while others are running.
> >>
> >> >2. Are you sure that pstack output was always constantly pointing on
> >> > strcmp being stuck?
> >> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
> >> is from first 5-10 minutes. I will capture stack traces with 10
> >> minutes waits and send them to you tomorrow. Also with 40 servers It
> >> stays that way for 5 minutes and then returns to normal.
> >>
> >> >3. Are you absolutely sure even after few hours glusterd is stuck at the
> >> > same point?
> >> It goes to normal state after 70-80 minutes and I can run cluster
> >> commands after that. I will check this again to be sure..
> >
> > So this is scalability issue you're hitting with current glusterd's design.
> > As I mentioned earlier, peer handshaking can be a really costly operations
> > based on you scale the cluster and hence you might experience a huge delay
> > in the node bringing up all the services and be operational.
> >
> >>
> >> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >
> >> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire <mchangir at redhat.com> wrote:
> >> >>
> >> >> Serkan,
> >> >> I have gone through other mails in the mail thread as well but responding
> >> >> to this one specifically.
> >> >>
> >> >> Is this a source install or an RPM install ?
> >> >> If this is an RPM install, could you please install the
> >> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
> >> >>
> >> >> If this is a source install, then you'll need to configure the build with
> >> >> --enable-debug and reinstall and retry capturing the gdb backtrace.
> >> >>
> >> >> Having the debuginfo package or a debug build helps to resolve the
> >> >> function names and/or line numbers.
> >> >> --
> >> >> Milind
> >> >>
> >> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>>
> >> >>> Here you can find 10 stack trace samples from glusterd. I wait 10
> >> >>> seconds between each trace.
> >> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >> >>>
> >> >>> Content of the first stack trace is here:
> >> >>>
> >> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >> >>> #0 0x0000003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> >>> #1 0x000000303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >> >>> #0 0x0000003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >> >>> #1 0x000000000040643b in glusterfs_sigwaiter ()
> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >> >>> #0 0x0000003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >> >>> #1 0x0000003aa58acac0 in sleep () from /lib64/libc.so.6
> >> >>> #2 0x000000303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >> >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> >>> #1 0x000000303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> >> >>> #2 0x000000303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >> >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> >>> #1 0x000000303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> >> >>> #2 0x000000303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> >> >>> #0 0x0000003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> >>> #1 0x00007f7a898a099b in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> >> >>> #0 0x0000003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> >> >>> #1 0x000000303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> >> >>> #2 0x000000303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> >> >>> #3 0x000000303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> >> >>> #4 0x000000303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> >> >>> #5 0x00007f7a898da7fd in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #6 0x00007f7a8981b0df in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #7 0x00007f7a8981b47c in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #8 0x00007f7a89831edf in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #9 0x00007f7a897f28f7 in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #10 0x00007f7a897f0bb9 in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #11 0x00007f7a8984c89a in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #12 0x00007f7a898323ee in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> >>> #13 0x000000303f40fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
> >> >>> #14 0x000000303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> >> >>> #15 0x000000303f40bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
> >> >>> #16 0x00007f7a88a6fccd in ?? () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> >> >>> #17 0x00007f7a88a70ffe in ?? () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> >> >>> #18 0x000000303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> >> >>> #19 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> >>> #20 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> >> >>> #0 0x0000003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> >> >>> #1 0x000000303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
> >> >>> #2 0x0000000000409020 in main ()
> >> >
> >> >
> >> > Serkan,
> >> >
> >> > If you could answer the following questions, that would help us to debug
> >> > this issue further:
> >> >
> >> > 1. On 80 nodes cluster, did you reboot only one node or multiple ones?
> >> > 2. Are you sure that pstack output was always constantly pointing on strcmp
> >> > being stuck? The reason I ask this is because on 80 nodes setup, friend
> >> > handshake operation would be very costly due to the existing design of
> >> > glusterd following n square mesh communication approach and making sure all
> >> > the configuration data is consistent across and this is the exact reason why
> >> > we want to move to GlusterD2.
> >> > 3. Are you absolutely sure even after few hours glusterd is stuck at the
> >> > same point?
> >> >
> >> > Looking at the backtrace, I don't find any reason why a strcmp will be stuck
> >> > until and unless we're try to read through all the bricks (1600 X 3) X 79
> >> > times.
> >> >
> >> >>>
> >> >>> On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >>> > Could you be able to provide the pstack dump of the glusterd
> >> >>> > process?
> >> >>> >
> >> >>> > On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >>> >>
> >> >>> >> Not yet. Gaurav will be taking a look at it tomorrow.
> >> >>> >>
> >> >>> >> On Wed, 23 Aug 2017 at 20:14, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>> >>>
> >> >>> >>> Hi Atin,
> >> >>> >>>
> >> >>> >>> Do you have time to check the logs?
> >> >>> >>>
> >> >>> >>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>> >>> > Same thing happens with 3.12.rc0. This time perf top shows hanging in
> >> >>> >>> > libglusterfs.so and below is the glusterd logs, which are different
> >> >>> >>> > from 3.10.
> >> >>> >>> > With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
> >> >>> >>> > brick processes come online and system starts to answer commands like
> >> >>> >>> > "gluster peer status"..
> >> >>> >>> >
> >> >>> >>> > [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154494] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154575] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154649] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154705] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154774] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154852] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154903] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.154995] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.155052] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:02.155141] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:27.074052] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> > [2017-08-23 06:46:27.077034] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> >>> >>> >
> >> >>> >>> > On Tue, Aug 22, 2017 at 7:00 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>> >>> >> I reboot multiple times, also I destroyed the gluster configuration
> >> >>> >>> >> and recreate multiple times. The behavior is same.
> >> >>> >>> >>
> >> >>> >>> >> On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >>> >>> >>> My guess is there is a corruption in vol list or peer list which has lead
> >> >>> >>> >>> glusterd to get into a infinite loop of traversing a peer/volume list and
> >> >>> >>> >>> CPU to hog up. Again this is a guess and I've not got a chance to take a
> >> >>> >>> >>> detail look at the logs and the strace output.
> >> >>> >>> >>>
> >> >>> >>> >>> I believe if you get to reboot the node again the problem will
> >> >>> >>> >>> disappear.
> >> >>> >>> >>>
> >> >>> >>> >>> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>> >>> >>>>
> >> >>> >>> >>>> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
> >> >>> >>> >>>> glusterd %100 cpu usage
> >> >>> >>> >>>> Hope this helps...
> >> >>> >>> >>>>
> >> >>> >>> >>>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>> >>> >>>> > Hi there,
> >> >>> >>> >>>> >
> >> >>> >>> >>>> > I have a strange problem.
> >> >>> >>> >>>> > Gluster version in 3.10.5, I am testing new servers. Gluster
> >> >>> >>> >>>> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
> >> >>> >>> >>>> > I can successfully create the cluster and volumes without any
> >> >>> >>> >>>> > problems. I write data to cluster from 100 clients for 12 hours again
> >> >>> >>> >>>> > no problem. But when I try to reboot a node, glusterd process hangs on
> >> >>> >>> >>>> > %100 CPU usage and seems to do nothing, no brick processes come
> >> >>> >>> >>>> > online. You can find strace of glusterd process for 1 minutes here:
> >> >>> >>> >>>> >
> >> >>> >>> >>>> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
> >> >>> >>> >>>> >
> >> >>> >>> >>>> > Here is the glusterd logs:
> >> >>> >>> >>>> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
> >> >>> >>> >>>> >
> >> >>> >>> >>>> >
> >> >>> >>> >>>> > By the way, reboot of one server completes without problem if I reboot
> >> >>> >>> >>>> > the servers before creating any volumes.
--
- Atin (atinm)
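
As a rough illustration of why a strcmp underneath dict_set can dominate CPU time during the friend handshake discussed above, here is a minimal Python sketch. It assumes (this is an assumption, not something confirmed from the glusterd source in this thread) that every insert into the export dict first scans the already stored keys one by one, and that a fixed number of keys is added per brick. `LinearDict`, `comparisons_for_inserts`, `handshake_comparisons` and the key names are hypothetical; the brick, volume and peer counts are the ones quoted in the thread (1600 bricks, 3 volumes, 79 peers).

```python
class LinearDict:
    """Toy dictionary that compares against stored keys on every set.

    Assumed model only: it mimics a lookup that walks a single list,
    which is what a strcmp-heavy dict_set would look like.
    """

    def __init__(self):
        self.pairs = []        # list of [key, value] entries
        self.comparisons = 0   # number of key comparisons performed

    def set(self, key, value):
        for pair in self.pairs:
            self.comparisons += 1
            if pair[0] == key:
                pair[1] = value   # key already present: overwrite
                return
        self.pairs.append([key, value])


def comparisons_for_inserts(num_keys):
    """Comparisons needed to insert num_keys distinct keys with a linear scan."""
    return num_keys * (num_keys - 1) // 2


def handshake_comparisons(bricks_per_volume, volumes, peers, keys_per_brick=1):
    """Estimated comparisons when the full export dict is rebuilt once per peer.

    keys_per_brick is a hypothetical parameter; the real number of keys
    written per brick is not stated in the thread.
    """
    keys = bricks_per_volume * volumes * keys_per_brick
    return comparisons_for_inserts(keys) * peers


if __name__ == "__main__":
    d = LinearDict()
    for i in range(1000):
        d.set(f"volume1.brick{i}.path", "x")   # hypothetical key names
    # The simulated count matches the closed-form estimate.
    print(d.comparisons, comparisons_for_inserts(1000))
    # Numbers from the thread: 1600 bricks x 3 volumes, 79 peers.
    print(handshake_comparisons(1600, 3, 79))
```

Under these assumptions the cost grows quadratically with the number of keys, so writing k keys per brick instead of one multiplies the estimate by roughly k squared, and the whole amount is paid again for every peer that is handshaken.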
Some corrections to my previous mails: the problem does not happen when no volumes have been created. It happens when volumes are created but in the stopped state, and it also happens when the volumes are started. Below are 5 stack traces taken at 10-minute intervals with the volumes in the stopped state.

--1--
Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
#0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f4146312d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
#0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0
#1 0x000000000040643b in glusterfs_sigwaiter ()
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
#0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6
#1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6
#2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
#0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
#0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
#0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f413befb99b in hooks_worker () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
#0 0x00007f41462fd43d in dict_lookup_common () from /usr/lib64/libglusterfs.so.0
#1 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#2 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#3 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#4 0x00007f413be75f29 in glusterd_add_volume_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#5 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6 0x00007f413be8cedf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7 0x00007f413be4d8f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8 0x00007f413be4bbb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x00007f413be8d3ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x00007f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#12 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#13 0x00007f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#14 0x00007f413ae8dccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#15 0x00007f413ae8effe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#16 0x00007f4146362806 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0
#17 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#18 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
#0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f41463622d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0
#2 0x0000000000409020 in main ()

--2--
Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
#0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f41463405cb in __synclock_lock () from /usr/lib64/libglusterfs.so.0
#2 0x00007f41463407ae in synclock_lock () from /usr/lib64/libglusterfs.so.0
#3 0x00007f413be8d3df in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#4 0x00007f41460d04c4 in call_bail () from /usr/lib64/libgfrpc.so.0
#5 0x00007f4146312cca in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#6 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
#0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0
#1 0x000000000040643b in glusterfs_sigwaiter ()
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
#0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6
#1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6
#2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
#0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
#0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
#0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f413befb99b in hooks_worker () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
#0 0x0000003d99928692 in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00007f41462fd44a in dict_lookup_common () from /usr/lib64/libglusterfs.so.0
#2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4 0x00007f414630007c in dict_set_dynstr () from /usr/lib64/libglusterfs.so.0
#5 0x00007f4146300124 in dict_set_dynstr_with_alloc () from /usr/lib64/libglusterfs.so.0
#6 0x00007f413be7608d in glusterd_add_volume_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8 0x00007f413be8cedf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x00007f413be4bbb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x00007f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x00007f413ae8dccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#17 0x00007f413ae8effe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#18 0x00007f4146362806 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0
#19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
#0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f41463622d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0
#2 0x0000000000409020 in main ()

--3--
Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
#0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f4146312d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
#0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0
#1 0x000000000040643b in glusterfs_sigwaiter ()
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
#0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6
#1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6
#2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
#0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
#0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
#0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f413befb99b in hooks_worker () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
#0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00007f41462fd44a in dict_lookup_common () from /usr/lib64/libglusterfs.so.0
#2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#5 0x00007f413bf3583c in gd_add_brick_snap_details_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6 0x00007f413be760df in glusterd_add_volume_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #10 0x00007f413be4bbb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0 #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 #15 0x00007f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0 #16 0x00007f413ae8dccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so #17 0x00007f413ae8effe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so #18 0x00007f4146362806 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0 #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 #1 0x00007f41463622d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0 #2 0x0000000000409020 in main () --4-- Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)): #0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0 #1 0x00007f4146312d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0 #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)): #0 
0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0 #1 0x000000000040643b in glusterfs_sigwaiter () #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)): #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6 #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6 #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0 #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)): #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 #2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0 #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f413cba3700 (LWP 104253)): #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 #2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0 #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f413befb99b in hooks_worker () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f413a047700 (LWP 104256)): #0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6 #1 0x00007f41462fd44a in dict_lookup_common () from /usr/lib64/libglusterfs.so.0 #2 
0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 #3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 #4 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0 #5 0x00007f413bf358c4 in gd_add_brick_snap_details_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #6 0x00007f413be760df in glusterd_add_volume_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #10 0x00007f413be4bbb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0 #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 #15 0x00007f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0 #16 0x00007f413ae8dccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so #17 0x00007f413ae8effe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so #18 0x00007f4146362806 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0 #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 #1 0x00007f41463622d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0 #2 
0x0000000000409020 in main () --5-- Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)): #0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0 #1 0x00007f4146312d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0 #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)): #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0 #1 0x000000000040643b in glusterfs_sigwaiter () #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)): #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6 #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6 #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0 #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)): #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 #2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0 #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f413cba3700 (LWP 104253)): #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 #2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0 #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f413befb99b in 
hooks_worker () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f413a047700 (LWP 104256)): #0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6 #1 0x00007f41462fd44a in dict_lookup_common () from /usr/lib64/libglusterfs.so.0 #2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 #3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 #4 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0 #5 0x00007f413bf357fd in gd_add_brick_snap_details_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #6 0x00007f413be760df in glusterd_add_volume_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #10 0x00007f413be4bbb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0 #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 #15 0x00007f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0 #16 0x00007f413ae8dccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so #17 0x00007f413ae8effe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so #18 0x00007f4146362806 in 
event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0 #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 #1 0x00007f41463622d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0 #2 0x0000000000409020 in main () On Mon, Sep 4, 2017 at 5:50 PM, Atin Mukherjee <amukherj at redhat.com> wrote:> > On Mon, 4 Sep 2017 at 20:04, Serkan ?oban <cobanserkan at gmail.com> wrote: >> >> I have been using a 60 server 1560 brick 3.7.11 cluster without >> problems for 1 years. I did not see this problem with it. >> Note that this problem does not happen when I install packages & start >> glusterd & peer probe and create the volumes. But after glusterd >> restart. >> >> Also note that this still happens without any volumes. So it is not >> related with brick count I think... > > > The backtrace you shared earlier involves code path where all brick details > are synced up. So I'd be really interested to see the backtrace of this when > there are no volumes associated. > >> >> >> On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee <amukherj at redhat.com> >> wrote: >> > >> > >> > On Mon, Sep 4, 2017 at 5:28 PM, Serkan ?oban <cobanserkan at gmail.com> >> > wrote: >> >> >> >> >1. On 80 nodes cluster, did you reboot only one node or multiple ones? >> >> Tried both, result is same, but the logs/stacks are from stopping and >> >> starting glusterd only on one server while others are running. >> >> >> >> >2. Are you sure that pstack output was always constantly pointing on >> >> > strcmp being stuck? >> >> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send >> >> is from first 5-10 minutes. I will capture stack traces with 10 >> >> minutes waits and send them to you tomorrow. Also with 40 servers It >> >> stays that way for 5 minutes and then returns to normal. 
>> >> >> >> >3. Are you absolutely sure even after few hours glusterd is stuck at >> >> > the >> >> > same point? >> >> It goes to normal state after 70-80 minutes and I can run cluster >> >> commands after that. I will check this again to be sure.. >> > >> > >> > So this is scalability issue you're hitting with current glusterd's >> > design. >> > As I mentioned earlier, peer handshaking can be a really costly >> > operations >> > based on you scale the cluster and hence you might experience a huge >> > delay >> > in the node bringing up all the services and be operational. >> > >> >> >> >> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee <amukherj at redhat.com> >> >> wrote: >> >> > >> >> > >> >> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire <mchangir at redhat.com> >> >> > wrote: >> >> >> >> >> >> Serkan, >> >> >> I have gone through other mails in the mail thread as well but >> >> >> responding >> >> >> to this one specifically. >> >> >> >> >> >> Is this a source install or an RPM install ? >> >> >> If this is an RPM install, could you please install the >> >> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace. >> >> >> >> >> >> If this is a source install, then you'll need to configure the build >> >> >> with >> >> >> --enable-debug and reinstall and retry capturing the gdb backtrace. >> >> >> >> >> >> Having the debuginfo package or a debug build helps to resolve the >> >> >> function names and/or line numbers. >> >> >> -- >> >> >> Milind >> >> >> >> >> >> >> >> >> >> >> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan ?oban >> >> >> <cobanserkan at gmail.com> >> >> >> wrote: >> >> >>> >> >> >>> Here you can find 10 stack trace samples from glusterd. I wait 10 >> >> >>> seconds between each trace. 
>> >> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0 >> >> >>> >> >> >>> Content of the first stack trace is here: >> >> >>> >> >> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)): >> >> >>> #0 0x0000003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0 >> >> >>> #1 0x000000303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0 >> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)): >> >> >>> #0 0x0000003aa5c0f585 in sigwait () from /lib64/libpthread.so.0 >> >> >>> #1 0x000000000040643b in glusterfs_sigwaiter () >> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)): >> >> >>> #0 0x0000003aa58acc4d in nanosleep () from /lib64/libc.so.6 >> >> >>> #1 0x0000003aa58acac0 in sleep () from /lib64/libc.so.6 >> >> >>> #2 0x000000303f8528fb in pool_sweeper () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)): >> >> >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () >> >> >>> from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #1 0x000000303f864afc in syncenv_task () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #2 0x000000303f8729f0 in syncenv_processor () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)): >> >> >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () >> >> >>> from >> >> >>> 
/lib64/libpthread.so.0 >> >> >>> #1 0x000000303f864afc in syncenv_task () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #2 0x000000303f8729f0 in syncenv_processor () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)): >> >> >>> #0 0x0000003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #1 0x00007f7a898a099b in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)): >> >> >>> #0 0x0000003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6 >> >> >>> #1 0x000000303f82244a in ?? () from /usr/lib64/libglusterfs.so.0 >> >> >>> #2 0x000000303f82433d in ?? () from /usr/lib64/libglusterfs.so.0 >> >> >>> #3 0x000000303f8245f5 in dict_set () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #4 0x000000303f82524c in dict_set_str () from >> >> >>> /usr/lib64/libglusterfs.so.0 >> >> >>> #5 0x00007f7a898da7fd in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #6 0x00007f7a8981b0df in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #7 0x00007f7a8981b47c in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #8 0x00007f7a89831edf in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #9 0x00007f7a897f28f7 in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #10 0x00007f7a897f0bb9 in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #11 0x00007f7a8984c89a in ?? 
() from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #12 0x00007f7a898323ee in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so >> >> >>> #13 0x000000303f40fad5 in rpc_clnt_handle_reply () from >> >> >>> /usr/lib64/libgfrpc.so.0 >> >> >>> #14 0x000000303f410c85 in rpc_clnt_notify () from >> >> >>> /usr/lib64/libgfrpc.so.0 >> >> >>> #15 0x000000303f40bd68 in rpc_transport_notify () from >> >> >>> /usr/lib64/libgfrpc.so.0 >> >> >>> #16 0x00007f7a88a6fccd in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so >> >> >>> #17 0x00007f7a88a70ffe in ?? () from >> >> >>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so >> >> >>> #18 0x000000303f887806 in ?? () from /usr/lib64/libglusterfs.so.0 >> >> >>> #19 0x0000003aa5c07aa1 in start_thread () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #20 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 >> >> >>> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)): >> >> >>> #0 0x0000003aa5c082fd in pthread_join () from >> >> >>> /lib64/libpthread.so.0 >> >> >>> #1 0x000000303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0 >> >> >>> #2 0x0000000000409020 in main () >> >> > >> >> > >> >> > Serkan, >> >> > >> >> > If you could answer the following questions, that would help us to >> >> > debug >> >> > this issue further: >> >> > >> >> > 1. On 80 nodes cluster, did you reboot only one node or multiple >> >> > ones? >> >> > 2. Are you sure that pstack output was always constantly pointing on >> >> > strcmp >> >> > being stuck? The reason I ask this is because on 80 nodes setup, >> >> > friend >> >> > handshake operation would be very costly due to the existing design >> >> > of >> >> > glusterd following n square mesh communication approach and making >> >> > sure >> >> > all >> >> > the configuration data is consistent across and this is the exact >> >> > reason >> >> > why >> >> > we want to move to GlusterD2. >> >> > 3. 
Are you absolutely sure even after few hours glusterd is stuck at >> >> > the >> >> > same point? >> >> > >> >> > Looking at the backtrace, I don't find any reason why a strcmp will >> >> > be >> >> > stuck >> >> > until and unless we're try to read through all the bricks (1600 X 3) >> >> > X >> >> > 79 >> >> > times. >> >> > >> >> >>> >> >> >>> On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee >> >> >>> <amukherj at redhat.com> >> >> >>> wrote: >> >> >>> > Could you be able to provide the pstack dump of the glusterd >> >> >>> > process? >> >> >>> > >> >> >>> > On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee >> >> >>> > <amukherj at redhat.com> >> >> >>> > wrote: >> >> >>> >> >> >> >>> >> Not yet. Gaurav will be taking a look at it tomorrow. >> >> >>> >> >> >> >>> >> On Wed, 23 Aug 2017 at 20:14, Serkan ?oban >> >> >>> >> <cobanserkan at gmail.com> >> >> >>> >> wrote: >> >> >>> >>> >> >> >>> >>> Hi Atin, >> >> >>> >>> >> >> >>> >>> Do you have time to check the logs? >> >> >>> >>> >> >> >>> >>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan ?oban >> >> >>> >>> <cobanserkan at gmail.com> >> >> >>> >>> wrote: >> >> >>> >>> > Same thing happens with 3.12.rc0. This time perf top shows >> >> >>> >>> > hanging >> >> >>> >>> > in >> >> >>> >>> > libglusterfs.so and below is the glusterd logs, which are >> >> >>> >>> > different >> >> >>> >>> > from 3.10. >> >> >>> >>> > With 3.10.5, after 60-70 minutes CPU usage becomes normal and >> >> >>> >>> > we >> >> >>> >>> > see >> >> >>> >>> > brick processes come online and system starts to answer >> >> >>> >>> > commands >> >> >>> >>> > like >> >> >>> >>> > "gluster peer status".. 
>> >> >>> >>> > >> >> >>> >>> > [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > 
-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 
06:46:02.154162] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154494] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> 
>> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154575] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154649] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154705] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154774] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154852] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > 
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154903] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.154995] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.155052] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument] >> >> >>> >>> > [2017-08-23 06:46:02.155141] E [client_t.c:324:gf_client_ref] >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) >> >> >>> >>> > [0x7f5ae2c091b1] >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) >> >> >>> >>> > [0x7f5ae2c0851c] >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) 
>> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> >> >>> >>> > [2017-08-23 06:46:27.074052] E [client_t.c:324:gf_client_ref]
>> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> >> >>> >>> > [0x7f5ae2c091b1]
>> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> >> >>> >>> > [0x7f5ae2c0851c]
>> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> >> >>> >>> > [2017-08-23 06:46:27.077034] E [client_t.c:324:gf_client_ref]
>> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> >> >>> >>> > [0x7f5ae2c091b1]
>> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> >> >>> >>> > [0x7f5ae2c0851c]
>> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> >> >>> >>> >
>> >> >>> >>> > On Tue, Aug 22, 2017 at 7:00 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
>> >> >>> >>> >> I reboot multiple times, also I destroyed the gluster configuration
>> >> >>> >>> >> and recreate multiple times. The behavior is same.
>> >> >>> >>> >>
>> >> >>> >>> >> On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>> >> >>> >>> >>> My guess is there is a corruption in vol list or peer list which has lead
>> >> >>> >>> >>> glusterd to get into a infinite loop of traversing a peer/volume list and
>> >> >>> >>> >>> CPU to hog up. Again this is a guess and I've not got a chance to take a
>> >> >>> >>> >>> detail look at the logs and the strace output.
>> >> >>> >>> >>>
>> >> >>> >>> >>> I believe if you get to reboot the node again the problem will disappear.
>> >> >>> >>> >>>
>> >> >>> >>> >>> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanserkan at gmail.com> wrote:
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
>> >> >>> >>> >>>> glusterd %100 cpu usage
>> >> >>> >>> >>>> Hope this helps...
>> >> >>> >>> >>>>
>> >> >>> >>> >>>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
>> >> >>> >>> >>>> > Hi there,
>> >> >>> >>> >>>> >
>> >> >>> >>> >>>> > I have a strange problem.
>> >> >>> >>> >>>> > Gluster version in 3.10.5, I am testing new servers. Gluster
>> >> >>> >>> >>>> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
>> >> >>> >>> >>>> > I can successfully create the cluster and volumes without any
>> >> >>> >>> >>>> > problems. I write data to cluster from 100 clients for 12 hours again
>> >> >>> >>> >>>> > no problem. But when I try to reboot a node, glusterd process hangs on
>> >> >>> >>> >>>> > %100 CPU usage and seems to do nothing, no brick processes come
>> >> >>> >>> >>>> > online. You can find strace of glusterd process for 1 minutes here:
>> >> >>> >>> >>>> >
>> >> >>> >>> >>>> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
>> >> >>> >>> >>>> >
>> >> >>> >>> >>>> > Here is the glusterd logs:
>> >> >>> >>> >>>> >
>> >> >>> >>> >>>> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
>> >> >>> >>> >>>> >
>> >> >>> >>> >>>> > By the way, reboot of one server completes without problem if I reboot
>> >> >>> >>> >>>> > the servers before creating any volumes.
>> >> >>> >>> >>>> _______________________________________________
>> >> >>> >>> >>>> Gluster-users mailing list
>> >> >>> >>> >>>> Gluster-users at gluster.org
>> >> >>> >>> >>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >> >>> >>> >>>
>> >> >>> >>> >>> --
>> >> >>> >>> >>> - Atin (atinm)
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >> - Atin (atinm)
>> >> >>> >
>> >> >>> > --
>> >> >>> > - Atin (atinm)
>> >> >>> _______________________________________________
>> >> >>> Gluster-users mailing list
>> >> >>> Gluster-users at gluster.org
>> >> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >> >>
>> >> >> --
>> >> >> Milind
>> >> >>
>> >> >
>> >
>
> --
> - Atin (atinm)
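The perf top observation quoted above (about 80% of samples in __strcmp_sse42) is what one would expect from a dictionary whose set operation first compares the new key against the keys already stored. The following is a minimal Python sketch of that effect, not the libglusterfs code: it assumes a single-bucket chained dictionary, and the names ChainDict and export_bricks, the key format and the four keys per brick are all hypothetical. Under that assumption the number of string comparisons grows roughly with the square of the number of keys.

```python
class ChainDict:
    """Toy dictionary with a single bucket: every set() scans the chain first.

    This is an assumption made for illustration, not the real dict_t.
    """

    def __init__(self):
        self.pairs = []        # list of [key, value] entries, in insertion order
        self.comparisons = 0   # how many key comparisons have been performed

    def _lookup(self, key):
        # Linear scan, one string comparison per stored key.
        for pair in self.pairs:
            self.comparisons += 1
            if pair[0] == key:
                return pair
        return None

    def set(self, key, value):
        pair = self._lookup(key)
        if pair is not None:
            pair[1] = value                   # overwrite an existing key
        else:
            self.pairs.append([key, value])   # append a new key


def export_bricks(brick_count, keys_per_brick=4):
    """Insert keys_per_brick unique keys per brick; return total comparisons."""
    d = ChainDict()
    for brick in range(1, brick_count + 1):
        for field in range(keys_per_brick):
            # Hypothetical key layout, chosen only so that every key is unique.
            d.set(f"volume1.brick{brick}.field{field}", "x")
    return d.comparisons


if __name__ == "__main__":
    for bricks in (100, 200, 400):
        print(bricks, export_bricks(bricks))
```

Doubling the brick count roughly quadruples the comparison count in this model, which would match a node that takes minutes with a small cluster and much longer with a large one. Whether glusterd 3.10.5 behaves this way is an assumption of the sketch, not something the thread confirms.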
On Tue, Sep 5, 2017 at 6:13 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:

> Some corrections to the previous mails: the problem does not happen
> when no volumes are created.
> The problem happens when volumes are created but in stopped state. It also
> happens when volumes are in started state.
> Below are 5 stack traces taken at 10-minute intervals with the volumes in
> stopped state.
>

As I mentioned earlier, this is technically not a *hang*. Because of the costly handshaking operations for so many bricks across so many nodes, glusterd takes quite a long time to finish the handshake.

>
> --1--
> Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
> #0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1 0x00007f4146312d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
> #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
> #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> #1 0x000000000040643b in glusterfs_sigwaiter ()
> #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
> #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6
> #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6
> #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
> #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2 0x00007f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4 0x0000003d998e8bbd in clone ()
from /lib64/libc.so.6 > Thread 4 (Thread 0x7f413cba3700 (LWP 104253)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f413befb99b in hooks_worker () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 2 (Thread 0x7f413a047700 (LWP 104256)): > #0 0x00007f41462fd43d in dict_lookup_common () from > /usr/lib64/libglusterfs.so.0 > #1 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 > #3 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0 > #4 0x00007f413be75f29 in glusterd_add_volume_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #5 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #6 0x00007f413be8cedf in glusterd_rpc_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #7 0x00007f413be4d8f7 in glusterd_ac_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #8 0x00007f413be4bbb9 in glusterd_friend_sm () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #9 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () > from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #10 0x00007f413be8d3ee in glusterd_big_locked_cbk () from > 
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #11 0x00007f41460cfad5 in rpc_clnt_handle_reply () from > /usr/lib64/libgfrpc.so.0 > #12 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 > #13 0x00007f41460cbd68 in rpc_transport_notify () from > /usr/lib64/libgfrpc.so.0 > #14 0x00007f413ae8dccd in socket_event_poll_in () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #15 0x00007f413ae8effe in socket_event_handler () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #16 0x00007f4146362806 in event_dispatch_epoll_worker () from > /usr/lib64/libglusterfs.so.0 > #17 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #18 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): > #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 > #1 0x00007f41463622d5 in event_dispatch_epoll () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000000000409020 in main () > > --2-- > Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)): > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f41463405cb in __synclock_lock () from > /usr/lib64/libglusterfs.so.0 > #2 0x00007f41463407ae in synclock_lock () from > /usr/lib64/libglusterfs.so.0 > #3 0x00007f413be8d3df in glusterd_big_locked_cbk () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #4 0x00007f41460d04c4 in call_bail () from /usr/lib64/libgfrpc.so.0 > #5 0x00007f4146312cca in gf_timer_proc () from > /usr/lib64/libglusterfs.so.0 > #6 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #7 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)): > #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0 > #1 0x000000000040643b in glusterfs_sigwaiter () > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 6 
(Thread 0x7f413dfa5700 (LWP 104251)): > #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6 > #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6 > #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 4 (Thread 0x7f413cba3700 (LWP 104253)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f413befb99b in hooks_worker () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 2 (Thread 0x7f413a047700 (LWP 104256)): > #0 0x0000003d99928692 in __strcmp_sse42 () from /lib64/libc.so.6 > #1 0x00007f41462fd44a in dict_lookup_common () from > /usr/lib64/libglusterfs.so.0 > #2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 > #3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 > #4 0x00007f414630007c in dict_set_dynstr 
() from > /usr/lib64/libglusterfs.so.0 > #5 0x00007f4146300124 in dict_set_dynstr_with_alloc () from > /usr/lib64/libglusterfs.so.0 > #6 0x00007f413be7608d in glusterd_add_volume_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #10 0x00007f413be4bbb9 in glusterd_friend_sm () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () > from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from > /usr/lib64/libgfrpc.so.0 > #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 > #15 0x00007f41460cbd68 in rpc_transport_notify () from > /usr/lib64/libgfrpc.so.0 > #16 0x00007f413ae8dccd in socket_event_poll_in () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #17 0x00007f413ae8effe in socket_event_handler () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #18 0x00007f4146362806 in event_dispatch_epoll_worker () from > /usr/lib64/libglusterfs.so.0 > #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): > #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 > #1 0x00007f41463622d5 in event_dispatch_epoll () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000000000409020 in main () > > --3-- > Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)): > #0 0x0000003d99c0f00d in nanosleep () from 
/lib64/libpthread.so.0 > #1 0x00007f4146312d57 in gf_timer_proc () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)): > #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0 > #1 0x000000000040643b in glusterfs_sigwaiter () > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)): > #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6 > #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6 > #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 4 (Thread 0x7f413cba3700 (LWP 104253)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f413befb99b in hooks_worker () from > 
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 2 (Thread 0x7f413a047700 (LWP 104256)): > #0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6 > #1 0x00007f41462fd44a in dict_lookup_common () from > /usr/lib64/libglusterfs.so.0 > #2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 > #3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 > #4 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0 > #5 0x00007f413bf3583c in gd_add_brick_snap_details_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #6 0x00007f413be760df in glusterd_add_volume_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #10 0x00007f413be4bbb9 in glusterd_friend_sm () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () > from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from > /usr/lib64/libgfrpc.so.0 > #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 > #15 0x00007f41460cbd68 in rpc_transport_notify () from > /usr/lib64/libgfrpc.so.0 > #16 0x00007f413ae8dccd in socket_event_poll_in () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #17 0x00007f413ae8effe in socket_event_handler () from > 
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #18 0x00007f4146362806 in event_dispatch_epoll_worker () from > /usr/lib64/libglusterfs.so.0 > #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): > #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 > #1 0x00007f41463622d5 in event_dispatch_epoll () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000000000409020 in main () > > --4-- > Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)): > #0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0 > #1 0x00007f4146312d57 in gf_timer_proc () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)): > #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0 > #1 0x000000000040643b in glusterfs_sigwaiter () > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)): > #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6 > #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6 > #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 4 (Thread 0x7f413cba3700 (LWP 
104253)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f413befb99b in hooks_worker () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 2 (Thread 0x7f413a047700 (LWP 104256)): > #0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6 > #1 0x00007f41462fd44a in dict_lookup_common () from > /usr/lib64/libglusterfs.so.0 > #2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 > #3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 > #4 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0 > #5 0x00007f413bf358c4 in gd_add_brick_snap_details_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #6 0x00007f413be760df in glusterd_add_volume_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #10 0x00007f413be4bbb9 in glusterd_friend_sm () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () > from 
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from > /usr/lib64/libgfrpc.so.0 > #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0 > #15 0x00007f41460cbd68 in rpc_transport_notify () from > /usr/lib64/libgfrpc.so.0 > #16 0x00007f413ae8dccd in socket_event_poll_in () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #17 0x00007f413ae8effe in socket_event_handler () from > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > #18 0x00007f4146362806 in event_dispatch_epoll_worker () from > /usr/lib64/libglusterfs.so.0 > #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)): > #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0 > #1 0x00007f41463622d5 in event_dispatch_epoll () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000000000409020 in main () > > --5-- > Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)): > #0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0 > #1 0x00007f4146312d57 in gf_timer_proc () from > /usr/lib64/libglusterfs.so.0 > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)): > #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0 > #1 0x000000000040643b in glusterfs_sigwaiter () > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)): > #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6 > #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6 > #2 0x00007f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0 > #3 
0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 4 (Thread 0x7f413cba3700 (LWP 104253)): > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0 > #2 0x00007f414634d9f0 in syncenv_processor () from > /usr/lib64/libglusterfs.so.0 > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 3 (Thread 0x7f413aa48700 (LWP 104255)): > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f413befb99b in hooks_worker () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0 > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6 > Thread 2 (Thread 0x7f413a047700 (LWP 104256)): > #0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6 > #1 0x00007f41462fd44a in dict_lookup_common () from > /usr/lib64/libglusterfs.so.0 > #2 0x00007f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0 > #3 0x00007f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0 > #4 0x00007f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0 > #5 0x00007f413bf357fd in gd_add_brick_snap_details_to_dict () from > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > #6 0x00007f413be760df in glusterd_add_volume_to_dict () from > 
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7 0x00007f413be7647c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8 0x00007f413be8cedf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9 0x00007f413be4d8f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x00007f413be4bbb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x00007f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x00007f413be8d3ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00007f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
> #14 0x00007f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00007f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
> #16 0x00007f413ae8dccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x00007f413ae8effe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00007f4146362806 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0
> #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
> #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0
> #1 0x00007f41463622d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0
> #2 0x0000000000409020 in main ()
>
> On Mon, Sep 4, 2017 at 5:50 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >
> > On Mon, 4 Sep 2017 at 20:04, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >>
> >> I have been using a 60 server 1560 brick 3.7.11 cluster without
> >> problems for 1 years. I did not see this problem with it.
> >> Note that this problem does not happen when I install packages & start
> >> glusterd & peer probe and create the volumes. But after glusterd
> >> restart.
> >>
> >> Also note that this still happens without any volumes. So it is not
> >> related with brick count I think...
> >
> > The backtrace you shared earlier involves code path where all brick details
> > are synced up. So I'd be really interested to see the backtrace of this when
> > there are no volumes associated.
> >
> >>
> >> On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >
> >> > On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >>
> >> >> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
> >> >> Tried both, result is same, but the logs/stacks are from stopping and
> >> >> starting glusterd only on one server while others are running.
> >> >>
> >> >> >2. Are you sure that pstack output was always constantly pointing on
> >> >> > strcmp being stuck?
> >> >> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
> >> >> is from first 5-10 minutes. I will capture stack traces with 10
> >> >> minutes waits and send them to you tomorrow. Also with 40 servers It
> >> >> stays that way for 5 minutes and then returns to normal.
> >> >>
> >> >> >3. Are you absolutely sure even after few hours glusterd is stuck at
> >> >> > the same point?
> >> >> It goes to normal state after 70-80 minutes and I can run cluster
> >> >> commands after that. I will check this again to be sure..
> >> >
> >> > So this is scalability issue you're hitting with current glusterd's design.
> >> > As I mentioned earlier, peer handshaking can be a really costly operations
> >> > based on you scale the cluster and hence you might experience a huge delay
> >> > in the node bringing up all the services and be operational.
> >> >
> >> >>
> >> >> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >> >
> >> >> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire <mchangir at redhat.com> wrote:
> >> >> >>
> >> >> >> Serkan,
> >> >> >> I have gone through other mails in the mail thread as well but responding
> >> >> >> to this one specifically.
> >> >> >>
> >> >> >> Is this a source install or an RPM install ?
> >> >> >> If this is an RPM install, could you please install the
> >> >> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
> >> >> >>
> >> >> >> If this is a source install, then you'll need to configure the build with
> >> >> >> --enable-debug and reinstall and retry capturing the gdb backtrace.
> >> >> >>
> >> >> >> Having the debuginfo package or a debug build helps to resolve the
> >> >> >> function names and/or line numbers.
> >> >> >> --
> >> >> >> Milind
> >> >> >>
> >> >> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >> >>>
> >> >> >>> Here you can find 10 stack trace samples from glusterd. I wait 10
> >> >> >>> seconds between each trace.
> >> >> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >> >> >>>
> >> >> >>> Content of the first stack trace is here:
> >> >> >>>
> >> >> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >> >> >>> #0 0x0000003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> >> >>> #1 0x000000303f837d57 in ??
() from /usr/lib64/libglusterfs.so.0 > >> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)): > >> >> >>> #0 0x0000003aa5c0f585 in sigwait () from /lib64/libpthread.so.0 > >> >> >>> #1 0x000000000040643b in glusterfs_sigwaiter () > >> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)): > >> >> >>> #0 0x0000003aa58acc4d in nanosleep () from /lib64/libc.so.6 > >> >> >>> #1 0x0000003aa58acac0 in sleep () from /lib64/libc.so.6 > >> >> >>> #2 0x000000303f8528fb in pool_sweeper () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)): > >> >> >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () > >> >> >>> from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #1 0x000000303f864afc in syncenv_task () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #2 0x000000303f8729f0 in syncenv_processor () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #3 0x0000003aa5c07aa1 in start_thread () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)): > >> >> >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () > >> >> >>> from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #1 0x000000303f864afc in syncenv_task () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #2 0x000000303f8729f0 in syncenv_processor () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #3 0x0000003aa5c07aa1 in start_thread 
() from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)): > >> >> >>> #0 0x0000003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #1 0x00007f7a898a099b in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #2 0x0000003aa5c07aa1 in start_thread () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)): > >> >> >>> #0 0x0000003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6 > >> >> >>> #1 0x000000303f82244a in ?? () from /usr/lib64/libglusterfs.so.0 > >> >> >>> #2 0x000000303f82433d in ?? () from /usr/lib64/libglusterfs.so.0 > >> >> >>> #3 0x000000303f8245f5 in dict_set () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #4 0x000000303f82524c in dict_set_str () from > >> >> >>> /usr/lib64/libglusterfs.so.0 > >> >> >>> #5 0x00007f7a898da7fd in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #6 0x00007f7a8981b0df in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #7 0x00007f7a8981b47c in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #8 0x00007f7a89831edf in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #9 0x00007f7a897f28f7 in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #10 0x00007f7a897f0bb9 in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #11 0x00007f7a8984c89a in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #12 0x00007f7a898323ee in ?? 
() from > >> >> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so > >> >> >>> #13 0x000000303f40fad5 in rpc_clnt_handle_reply () from > >> >> >>> /usr/lib64/libgfrpc.so.0 > >> >> >>> #14 0x000000303f410c85 in rpc_clnt_notify () from > >> >> >>> /usr/lib64/libgfrpc.so.0 > >> >> >>> #15 0x000000303f40bd68 in rpc_transport_notify () from > >> >> >>> /usr/lib64/libgfrpc.so.0 > >> >> >>> #16 0x00007f7a88a6fccd in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > >> >> >>> #17 0x00007f7a88a70ffe in ?? () from > >> >> >>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so > >> >> >>> #18 0x000000303f887806 in ?? () from /usr/lib64/libglusterfs.so.0 > >> >> >>> #19 0x0000003aa5c07aa1 in start_thread () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #20 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6 > >> >> >>> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)): > >> >> >>> #0 0x0000003aa5c082fd in pthread_join () from > >> >> >>> /lib64/libpthread.so.0 > >> >> >>> #1 0x000000303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0 > >> >> >>> #2 0x0000000000409020 in main () > >> >> > > >> >> > > >> >> > Serkan, > >> >> > > >> >> > If you could answer the following questions, that would help us to > >> >> > debug > >> >> > this issue further: > >> >> > > >> >> > 1. On 80 nodes cluster, did you reboot only one node or multiple > >> >> > ones? > >> >> > 2. Are you sure that pstack output was always constantly pointing > on > >> >> > strcmp > >> >> > being stuck? The reason I ask this is because on 80 nodes setup, > >> >> > friend > >> >> > handshake operation would be very costly due to the existing design > >> >> > of > >> >> > glusterd following n square mesh communication approach and making > >> >> > sure > >> >> > all > >> >> > the configuration data is consistent across and this is the exact > >> >> > reason > >> >> > why > >> >> > we want to move to GlusterD2. > >> >> > 3. 
Are you absolutely sure even after few hours glusterd is stuck at
> >> >> > the same point?
> >> >> >
> >> >> > Looking at the backtrace, I don't find any reason why a strcmp will be stuck
> >> >> > until and unless we're trying to read through all the bricks (1600 X 3) X 79
> >> >> > times.
> >> >> >
> >> >> >>>
> >> >> >>> On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >> >>> > Could you be able to provide the pstack dump of the glusterd process?
> >> >> >>> >
> >> >> >>> > On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >> >>> >>
> >> >> >>> >> Not yet. Gaurav will be taking a look at it tomorrow.
> >> >> >>> >>
> >> >> >>> >> On Wed, 23 Aug 2017 at 20:14, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >> >>> >>>
> >> >> >>> >>> Hi Atin,
> >> >> >>> >>>
> >> >> >>> >>> Do you have time to check the logs?
> >> >> >>> >>>
> >> >> >>> >>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >> >>> >>> > Same thing happens with 3.12.rc0. This time perf top shows hanging in
> >> >> >>> >>> > libglusterfs.so and below is the glusterd logs, which are different
> >> >> >>> >>> > from 3.10.
> >> >> >>> >>> > With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
> >> >> >>> >>> > brick processes come online and system starts to answer commands like
> >> >> >>> >>> > "gluster peer status"..
> >> >> >>> >>> > > >> >> >>> >>> > [2017-08-23 06:46:02.150472] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.152181] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.152287] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.153503] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.153647] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> 
> [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.153866] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.153948] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154018] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154108] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > 
-->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154162] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154250] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154322] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154425] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154494] E > 
[client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154575] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154649] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154705] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154774] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > 
-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154852] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154903] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.154995] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.155052] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > 
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:02.155141] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:27.074052] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > [2017-08-23 06:46:27.077034] E > [client_t.c:324:gf_client_ref] > >> >> >>> >>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) > >> >> >>> >>> > [0x7f5ae2c091b1] > >> >> >>> >>> > -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) > >> >> >>> >>> > [0x7f5ae2c0851c] > >> >> >>> >>> > -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) > >> >> >>> >>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid > argument] > >> >> >>> >>> > > >> >> >>> >>> > On Tue, Aug 22, 2017 at 7:00 PM, Serkan ?oban > >> >> >>> >>> > <cobanserkan at gmail.com> > >> >> >>> >>> > wrote: > >> >> >>> >>> >> I reboot multiple times, also I destroyed the gluster > >> >> >>> >>> >> configuration > >> >> >>> >>> >> and recreate multiple times. The behavior is same. 
> >> >> >>> >>> >>
> >> >> >>> >>> >> On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >> >> >>> >>> >>> My guess is there is a corruption in vol list or peer list which has lead
> >> >> >>> >>> >>> glusterd to get into a infinite loop of traversing a peer/volume list and
> >> >> >>> >>> >>> CPU to hog up. Again this is a guess and I've not got a chance to take a
> >> >> >>> >>> >>> detail look at the logs and the strace output.
> >> >> >>> >>> >>>
> >> >> >>> >>> >>> I believe if you get to reboot the node again the problem will disappear.
> >> >> >>> >>> >>>
> >> >> >>> >>> >>> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >> >>> >>> >>>>
> >> >> >>> >>> >>>> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
> >> >> >>> >>> >>>> glusterd %100 cpu usage
> >> >> >>> >>> >>>> Hope this helps...
> >> >> >>> >>> >>>>
> >> >> >>> >>> >>>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >> >> >>> >>> >>>> > Hi there,
> >> >> >>> >>> >>>> >
> >> >> >>> >>> >>>> > I have a strange problem.
> >> >> >>> >>> >>>> > Gluster version in 3.10.5, I am testing new servers. Gluster
> >> >> >>> >>> >>>> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
> >> >> >>> >>> >>>> > I can successfully create the cluster and volumes without any problems.
I write data to cluster from 100 clients for 12 hours again
> >> >> >>> >>> >>>> > no problem. But when I try to reboot a node, glusterd process hangs on
> >> >> >>> >>> >>>> > %100 CPU usage and seems to do nothing, no brick processes come
> >> >> >>> >>> >>>> > online. You can find strace of glusterd process for 1 minutes here:
> >> >> >>> >>> >>>> >
> >> >> >>> >>> >>>> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
> >> >> >>> >>> >>>> >
> >> >> >>> >>> >>>> > Here is the glusterd logs:
> >> >> >>> >>> >>>> >
> >> >> >>> >>> >>>> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
> >> >> >>> >>> >>>> >
> >> >> >>> >>> >>>> >
> >> >> >>> >>> >>>> > By the way, reboot of one server completes without problem if I reboot
> >> >> >>> >>> >>>> > the servers before creating any volumes.
> >> >> >>> >>> >>>> _______________________________________________
> >> >> >>> >>> >>>> Gluster-users mailing list
> >> >> >>> >>> >>>> Gluster-users at gluster.org
> >> >> >>> >>> >>>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >> >> >>> >>> >>>
> >> >> >>> >>> >>> --
> >> >> >>> >>> >>> - Atin (atinm)
> >> >> >>> >>
> >> >> >>> >> --
> >> >> >>> >> - Atin (atinm)
> >> >> >>> >
> >> >> >>> > --
> >> >> >>> > - Atin (atinm)
> >> >> >>> _______________________________________________
> >> >> >>> Gluster-users mailing list
> >> >> >>> Gluster-users at gluster.org
> >> >> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Milind
> >> >> >>
> >> >> >
> >> >
> >
> > --
> > - Atin (atinm)
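
For readers following the thread, the "(1600 X 3) X 79" figure quoted above can be worked through as a rough estimate. The sketch below is only back-of-the-envelope arithmetic, not glusterd code: the function names are hypothetical, and the second function assumes a model in which every insert into the export dict first compares the new key against all keys already stored. The backtraces show dict_set() reaching strcmp, but they do not confirm that cost model. Note also that the reporter sees the hang with no volumes at all, so this illustrates the scaling concern rather than a confirmed root cause.

```python
# Rough estimate only. The cluster numbers come from the thread;
# the cost model is an assumption, not measured glusterd behaviour.

VOLUMES = 3               # "I have three volumes"
BRICKS_PER_VOLUME = 1600  # "each have 1600 bricks"
PEERS = 79                # 80-node cluster: the restarted node handshakes with 79 others


def total_brick_entries(volumes, bricks_per_volume, peers):
    """Brick entries walked if every peer handshake syncs every brick once."""
    return volumes * bricks_per_volume * peers


def linear_insert_comparisons(n_keys):
    """Key comparisons if each insert scans all previously stored keys
    (hypothetical worst case for a dict backed by a single list)."""
    return n_keys * (n_keys - 1) // 2


if __name__ == "__main__":
    per_handshake = VOLUMES * BRICKS_PER_VOLUME
    print("brick entries per handshake:", per_handshake)
    print("brick entries over all peers:",
          total_brick_entries(VOLUMES, BRICKS_PER_VOLUME, PEERS))
    print("comparisons per handshake under the linear-scan assumption:",
          linear_insert_comparisons(per_handshake))
```

Under these assumptions the linear term (brick entries times peers) stays in the hundreds of thousands, while the per-handshake comparison count grows quadratically with the number of keys, which would be one way a strcmp-heavy profile like the one reported could arise.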