Artem Russakovskii
2019-Feb-02 20:14 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
The fuse crash happened again yesterday, to another volume. Are there any
mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch
and restart the mount, which works and recovers the mount point within a
minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected"
condition for testing purposes is to kill -9 the right "glusterfs
--process-name fuse" process.

monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program = "/bin/mount /mnt/glusterfs_data1"
  stop program = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert

stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii>
| @ArtemR <http://twitter.com/ArtemR>

On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810 at gmail.com>
wrote:

> Hi,
>
> The first (and so far only) crash happened at 2am the next day after we
> upgraded, on only one of four servers and only to one of two mounts.
>
> I have no idea what caused it, but yeah, we do have a pretty busy site
> (apkmirror.com), and it caused a disruption for any uploads or downloads
> from that server until I woke up and fixed the mount.
>
> I wish I could be more helpful but all I have is that stack trace.
>
> I'm glad it's a blocker and will hopefully be resolved soon.
>
> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan
> <atumball at redhat.com> wrote:
>
>> Hi Artem,
>>
>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a
>> clone of other bugs where recent discussions happened), and marked it as
>> a blocker for the glusterfs-5.4 release.
>>
>> We already have fixes for the log flooding -
>> https://review.gluster.org/22128 - and are in the process of identifying
>> and fixing the issue seen with the crash.
>>
>> Can you please tell us if the crashes happened as soon as you upgraded,
>> or was there any particular pattern you observed before the crash?
>>
>> -Amar
>>
>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii
>> <archon810 at gmail.com> wrote:
>>
>>> Within 24 hours after updating from rock-solid 4.1 to 5.3, I already
>>> got a crash which others have mentioned in
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
>>> kill gluster, and remount:
>>>
>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
>>> pending frames:
>>> frame : type(1) op(READ)
>>> frame : type(1) op(OPEN)
>>> frame : type(0) op(0)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 6
>>> time of crash:
>>> 2019-01-31 09:38:04
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.3
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>>> ---------
>>>
>>> Do the pending patches fix the crash or only the repeated warnings? I'm
>>> running glusterfs on openSUSE 15.0, installed via
>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
>>> and not too sure how to make it core dump.
>>>
>>> If it's not fixed by the patches above, has anyone already opened a
>>> ticket for the crashes that I can join and monitor? This is going to
>>> create a massive problem for us, since production systems are crashing.
>>>
>>> Thanks.
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii>
>>> | @ArtemR <http://twitter.com/ArtemR>
>>>
>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa
>>> <rgowdapp at redhat.com> wrote:
>>>
>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii
>>>> <archon810 at gmail.com> wrote:
>>>>
>>>>> Also, not sure if related or not, but I got a ton of these "Failed to
>>>>> dispatch handler" messages in my logs as well. Many people have been
>>>>> commenting about this issue here:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>>>>
>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>>>>
>>>>>> ==> mnt-SITE_data1.log <==
>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>> ==> mnt-SITE_data3.log <==
>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]
>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>>>>>> ==> mnt-SITE_data1.log <==
>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]
>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0
>>>>>> ==> mnt-SITE_data3.log <==
>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0
>>>>>> ==> mnt-SITE_data1.log <==
>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
>>>>>
>>>>> I'm hoping raising the issue here on the mailing list may bring some
>>>>> additional eyeballs and get them both fixed.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Sincerely,
>>>>> Artem
>>>>>
>>>>> --
>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii>
>>>>> | @ArtemR <http://twitter.com/ArtemR>
>>>>>
>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii
>>>>> <archon810 at gmail.com> wrote:
>>>>>
>>>>>> I found a similar issue here:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a
>>>>>> comment from 3 days ago from someone else on 5.3 who started seeing
>>>>>> the spam.
>>>>>>
>>>>>> Here's the message that repeats over and over:
>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>
>>>> +Milind Changire <mchangir at redhat.com> Can you check why this message
>>>> is logged and send a fix?
>>>>
>>>>>> Is there any fix for this issue?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Sincerely,
>>>>>> Artem
>>>>>>
>>>>>> --
>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii>
>>>>>> | @ArtemR <http://twitter.com/ArtemR>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Amar Tumballi (amarts)
Artem Russakovskii
2019-Feb-04 23:59 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
The fuse crash happened two more times, but this time monit helped recover
within 1 minute, so it's a great workaround for now.

What's odd is that the crashes are only happening on one of 4 servers, and
I don't know why.

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii>
| @ArtemR <http://twitter.com/ArtemR>

On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810 at gmail.com>
wrote:

> The fuse crash happened again yesterday, to another volume. Are there any
> mount options that could help mitigate this?
>
> In the meantime, I set up a monit (https://mmonit.com/monit/) task to
> watch and restart the mount, which works and recovers the mount point
> within a minute. Not ideal, but a temporary workaround.
>
> By the way, the way to reproduce this "Transport endpoint is not
> connected" condition for testing purposes is to kill -9 the right
> "glusterfs --process-name fuse" process.
> [rest of quoted message snipped; it repeats the 2019-Feb-02 post above in full]
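The monit workaround discussed in this thread boils down to: probe the mount point, and if the FUSE client has died (every access then fails with "Transport endpoint is not connected", i.e. ENOTCONN), lazily unmount and remount. A minimal shell sketch of that logic; the mount path, the fstab-based remount, and the pgrep hint are illustrative assumptions, not commands taken from the thread:

```shell
# Watchdog sketch: a dead GlusterFS FUSE mount returns ENOTCONN for any
# access, so a plain stat of the mount point is a sufficient health probe.
# Remounting assumes an /etc/fstab entry and root privileges (assumptions).
check_mount() {
    mp="$1"
    if stat "$mp" >/dev/null 2>&1; then
        echo "healthy: $mp"
    else
        echo "stale: $mp - remounting"
        umount -l "$mp" 2>/dev/null   # lazy unmount clears the dead mount
        mount "$mp"                   # remount from fstab
    fi
}

# To reproduce the failure for testing (per the thread): kill -9 the
# matching client process, e.g. one listed by:
#   pgrep -af 'glusterfs.*--process-name fuse'

# Demo against a directory that certainly exists:
check_mount /tmp
```

monit's `check filesystem` test, as quoted in the thread, achieves the same effect natively; the sketch above only spells out the equivalent logic.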