Artem Russakovskii
2019-Feb-07 21:28 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
I've added the lru-limit=0 parameter to the mounts, and I see it's taken effect correctly: "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" Let's see if it stops crashing or not. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com> wrote:> Hi Nithya, > > Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing > crashes, and no further releases have been made yet. > > volume info: > Type: Replicate > Volume ID: ****SNIP**** > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 4 = 4 > Transport-type: tcp > Bricks: > Brick1: ****SNIP**** > Brick2: ****SNIP**** > Brick3: ****SNIP**** > Brick4: ****SNIP**** > Options Reconfigured: > cluster.quorum-count: 1 > cluster.quorum-type: fixed > network.ping-timeout: 5 > network.remote-dio: enable > performance.rda-cache-limit: 256MB > performance.readdir-ahead: on > performance.parallel-readdir: on > network.inode-lru-limit: 500000 > performance.md-cache-timeout: 600 > performance.cache-invalidation: on > performance.stat-prefetch: on > features.cache-invalidation-timeout: 600 > features.cache-invalidation: on > cluster.readdir-optimize: on > performance.io-thread-count: 32 > server.event-threads: 4 > client.event-threads: 4 > performance.read-ahead: off > cluster.lookup-optimize: on > performance.cache-size: 1GB > cluster.self-heal-daemon: enable > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: on > cluster.granular-entry-heal: enable > cluster.data-self-heal-algorithm: full > > Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbalacha at redhat.com> > wrote: > >> Hi Artem, >> >> Do you still see the crashes with 5.3? If yes, please try mount the >> volume using the mount option lru-limit=0 and see if that helps. We are >> looking into the crashes and will update when have a fix. >> >> Also, please provide the gluster volume info for the volume in question. >> >> >> regards, >> Nithya >> >> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >> wrote: >> >>> The fuse crash happened two more times, but this time monit helped >>> recover within 1 minute, so it's a great workaround for now. >>> >>> What's odd is that the crashes are only happening on one of 4 servers, >>> and I don't know why. >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> The fuse crash happened again yesterday, to another volume. Are there >>>> any mount options that could help mitigate this? 
>>>> >>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to >>>> watch and restart the mount, which works and recovers the mount point >>>> within a minute. Not ideal, but a temporary workaround. >>>> >>>> By the way, the way to reproduce this "Transport endpoint is not >>>> connected" condition for testing purposes is to kill -9 the right >>>> "glusterfs --process-name fuse" process. >>>> >>>> >>>> monit check: >>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>> if space usage > 90% for 5 times within 15 cycles >>>> then alert else if succeeded for 10 cycles then alert >>>> >>>> >>>> stack trace: >>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>> [0x7fa0249e4329] >>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>> [0x7fa0249e4329] >>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>> The message "E [MSGID: 101191] >>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>> [2019-02-01 23:21:56.164427] >>>> The message "I [MSGID: 108031] >>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>> pending frames: >>>> frame : type(1) op(LOOKUP) >>>> frame : type(0) op(0) >>>> patchset: git://git.gluster.org/glusterfs.git >>>> signal received: 6 >>>> time of crash: >>>> 2019-02-01 23:22:03 >>>> configuration details: >>>> argp 1 >>>> backtrace 1 >>>> dlfcn 1 >>>> libpthread 1 >>>> llistxattr 1 >>>> setfsid 1 >>>> spinlock 1 >>>> epoll.h 1 >>>> xattr.h 1 >>>> st_atim.tv_nsec 1 >>>> package-string: glusterfs 5.3 >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>> >>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>> >>>> Sincerely, 
>>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> The first (and so far only) crash happened at 2am the next day after >>>>> we upgraded, on only one of four servers and only to one of two mounts. >>>>> >>>>> I have no idea what caused it, but yeah, we do have a pretty busy site >>>>> (apkmirror.com), and it caused a disruption for any uploads or >>>>> downloads from that server until I woke up and fixed the mount. >>>>> >>>>> I wish I could be more helpful but all I have is that stack trace. >>>>> >>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>> >>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>> atumball at redhat.com> wrote: >>>>> >>>>>> Hi Artem, >>>>>> >>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as a >>>>>> clone of other bugs where recent discussions happened), and marked it as a >>>>>> blocker for glusterfs-5.4 release. >>>>>> >>>>>> We already have fixes for log flooding - >>>>>> https://review.gluster.org/22128, and are the process of identifying >>>>>> and fixing the issue seen with crash. >>>>>> >>>>>> Can you please tell if the crashes happened as soon as upgrade ? or >>>>>> was there any particular pattern you observed before the crash. >>>>>> >>>>>> -Amar >>>>>> >>>>>> >>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I already >>>>>>> got a crash which others have mentioned in >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>> unmount, kill gluster, and remount: >>>>>>> >>>>>>> >>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> The message "I [MSGID: 
108031] >>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>> The message "E [MSGID: 101191] >>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>> [2019-01-31 09:38:04.696993] >>>>>>> pending frames: >>>>>>> frame : type(1) op(READ) >>>>>>> frame : type(1) op(OPEN) >>>>>>> frame : type(0) op(0) >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> signal received: 6 >>>>>>> time of crash: >>>>>>> 2019-01-31 09:38:04 >>>>>>> configuration details: >>>>>>> argp 1 >>>>>>> backtrace 1 >>>>>>> dlfcn 1 >>>>>>> libpthread 1 >>>>>>> llistxattr 1 >>>>>>> setfsid 1 >>>>>>> spinlock 1 >>>>>>> epoll.h 1 >>>>>>> xattr.h 1 >>>>>>> st_atim.tv_nsec 1 >>>>>>> package-string: glusterfs 5.3 >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>> --------- >>>>>>> >>>>>>> Do the pending patches fix the crash or only the repeated warnings? >>>>>>> I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>> not too sure how to make it core dump. >>>>>>> >>>>>>> If it's not fixed by the patches above, has anyone already opened a >>>>>>> ticket for the crashes that I can join and monitor? This is going to create >>>>>>> a massive problem for us since production systems are crashing. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>> rgowdapp at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Also, not sure if related or not, but I got a ton of these "Failed >>>>>>>>> to dispatch handler" in my logs as well. Many people have been commenting >>>>>>>>> about this issue here >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. 
>>>>>>>>> >>>>>>>> >>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this. >>>>>>>> >>>>>>>> >>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>> handler >>>>>>>>> >>>>>>>>> >>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> I found a similar issue here: >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a >>>>>>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the >>>>>>>>>> spam. 
>>>>>>>>>> >>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> >>>>>>>>> >>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>> message is logged and send a fix? >>>>>>>> >>>>>>>> >>>>>>>>>> Is there any fix for this issue? >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Amar Tumballi (amarts) >>>>>> >>>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190207/fc7fee16/attachment.html>
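For reference, the lru-limit=0 workaround above is applied at mount time. A minimal sketch of a persistent /etc/fstab entry, assuming the mount.glusterfs helper shipped with 5.3 accepts lru-limit as a regular -o mount option (volume and mount point names are placeholders):

    localhost:/<SNIP> /mnt/<SNIP> glusterfs defaults,_netdev,lru-limit=0 0 0

After remounting, the flag should appear on the client process command line, e.g. "ps ax | grep 'process-name fuse'" should show /usr/sbin/glusterfs --lru-limit=0 --process-name fuse ... as quoted in the message above.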
Artem Russakovskii
2019-Feb-08 01:20 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help. Here's the snippet of the crash and the subsequent remount by monit.

[2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f4402b99329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" repeated 39 times between [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 515 times between [2019-02-08 01:11:17.932515] and [2019-02-08 01:13:09.311554]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-08 01:13:09
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
/lib64/libc.so.6(+0x36160)[0x7f440a887160]
/lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
/lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
/lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
/lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
/lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
---------
[2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)
[2019-02-08 01:13:35.637830] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-08 01:13:35.651405] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-02-08 01:13:35.651628] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2019-02-08 01:13:35.651747] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are ready,
attempting connect on transport [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect on transport [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) Final graph: Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> wrote:> I've added the lru-limit=0 parameter to the mounts, and I see it's taken > effect correctly: > "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse > --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" > > Let's see if it stops crashing or not. > > Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Hi Nithya, >> >> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >> crashes, and no further releases have been made yet. >> >> volume info: >> Type: Replicate >> Volume ID: ****SNIP**** >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 4 = 4 >> Transport-type: tcp >> Bricks: >> Brick1: ****SNIP**** >> Brick2: ****SNIP**** >> Brick3: ****SNIP**** >> Brick4: ****SNIP**** >> Options Reconfigured: >> cluster.quorum-count: 1 >> cluster.quorum-type: fixed >> network.ping-timeout: 5 >> network.remote-dio: enable >> performance.rda-cache-limit: 256MB >> performance.readdir-ahead: on >> performance.parallel-readdir: on >> network.inode-lru-limit: 500000 >> performance.md-cache-timeout: 600 >> performance.cache-invalidation: on >> performance.stat-prefetch: on >> features.cache-invalidation-timeout: 600 >> features.cache-invalidation: on >> cluster.readdir-optimize: on >> performance.io-thread-count: 32 >> server.event-threads: 4 >> client.event-threads: 4 >> performance.read-ahead: off >> cluster.lookup-optimize: on >> performance.cache-size: 1GB >> cluster.self-heal-daemon: enable >> transport.address-family: inet >> nfs.disable: on >> performance.client-io-threads: on >> cluster.granular-entry-heal: enable >> cluster.data-self-heal-algorithm: full >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbalacha at redhat.com> >> wrote: >> >>> Hi Artem, >>> >>> Do you still see the crashes with 5.3? If yes, please try mount the >>> volume using the mount option lru-limit=0 and see if that helps. We are >>> looking into the crashes and will update when have a fix. >>> >>> Also, please provide the gluster volume info for the volume in question. >>> >>> >>> regards, >>> Nithya >>> >>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> The fuse crash happened two more times, but this time monit helped >>>> recover within 1 minute, so it's a great workaround for now. 
>>>> >>>> What's odd is that the crashes are only happening on one of 4 servers, >>>> and I don't know why. >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> The fuse crash happened again yesterday, to another volume. Are there >>>>> any mount options that could help mitigate this? >>>>> >>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to >>>>> watch and restart the mount, which works and recovers the mount point >>>>> within a minute. Not ideal, but a temporary workaround. >>>>> >>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>> connected" condition for testing purposes is to kill -9 the right >>>>> "glusterfs --process-name fuse" process. >>>>> >>>>> >>>>> monit check: >>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>> if space usage > 90% for 5 times within 15 cycles >>>>> then alert else if succeeded for 10 cycles then alert >>>>> >>>>> >>>>> stack trace: >>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>> [0x7fa0249e4329] >>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>> [0x7fa0249e4329] >>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>> The message "E [MSGID: 101191] >>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>> [2019-02-01 23:21:56.164427] >>>>> The message "I [MSGID: 108031] >>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>> pending frames: >>>>> frame : type(1) op(LOOKUP) >>>>> frame : type(0) op(0) >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> signal received: 6 >>>>> time of crash: >>>>> 2019-02-01 23:22:03 >>>>> configuration details: >>>>> argp 1 >>>>> backtrace 1 >>>>> dlfcn 1 >>>>> libpthread 1 >>>>> llistxattr 1 >>>>> setfsid 1 >>>>> spinlock 1 >>>>> epoll.h 1 >>>>> xattr.h 1 >>>>> st_atim.tv_nsec 1 >>>>> package-string: glusterfs 5.3 >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] 
>>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>> >>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810 at gmail.com> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> The first (and so far only) crash happened at 2am the next day after >>>>>> we upgraded, on only one of four servers and only to one of two mounts. >>>>>> >>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>> site (apkmirror.com), and it caused a disruption for any uploads or >>>>>> downloads from that server until I woke up and fixed the mount. >>>>>> >>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>> >>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>> >>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>> atumball at redhat.com> wrote: >>>>>> >>>>>>> Hi Artem, >>>>>>> >>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as >>>>>>> a clone of other bugs where recent discussions happened), and marked it as >>>>>>> a blocker for glusterfs-5.4 release. >>>>>>> >>>>>>> We already have fixes for log flooding - >>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>> identifying and fixing the issue seen with crash. >>>>>>> >>>>>>> Can you please tell if the crashes happened as soon as upgrade ? or >>>>>>> was there any particular pattern you observed before the crash. 
>>>>>>> >>>>>>> -Amar >>>>>>> >>>>>>> >>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>> already got a crash which others have mentioned in >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>> unmount, kill gluster, and remount: >>>>>>>> >>>>>>>> >>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> The message "I [MSGID: 108031] >>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>> The message "E [MSGID: 101191] >>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>> pending frames: >>>>>>>> frame : type(1) op(READ) >>>>>>>> frame : type(1) op(OPEN) >>>>>>>> frame : type(0) op(0) >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> signal received: 6 >>>>>>>> time of crash: >>>>>>>> 2019-01-31 09:38:04 >>>>>>>> configuration details: >>>>>>>> argp 1 >>>>>>>> backtrace 1 >>>>>>>> dlfcn 1 >>>>>>>> libpthread 1 >>>>>>>> llistxattr 1 >>>>>>>> setfsid 1 >>>>>>>> spinlock 1 >>>>>>>> epoll.h 1 >>>>>>>> xattr.h 1 >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> package-string: glusterfs 5.3 >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>> >>>>>>>> 
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>> --------- >>>>>>>> >>>>>>>> Do the pending patches fix the crash or only the repeated warnings? >>>>>>>> I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>> not too sure how to make it core dump. >>>>>>>> >>>>>>>> If it's not fixed by the patches above, has anyone already opened a >>>>>>>> ticket for the crashes that I can join and monitor? This is going to create >>>>>>>> a massive problem for us since production systems are crashing. >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>> commenting about this issue here >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>> >>>>>>>>> >>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this. 
>>>>>>>>> >>>>>>>>> >>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> I found a similar issue here: >>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a >>>>>>>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the >>>>>>>>>>> spam. 
>>>>>>>>>>> >>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>>> message is logged and send a fix? >>>>>>>>> >>>>>>>>> >>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Amar Tumballi (amarts) >>>>>>> >>>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190207/e3a2b047/attachment.html>
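For completeness, a rough shell sketch of the reproduce-and-recover cycle described in the thread (mount point and volume names are placeholders, and the pgrep pattern is an assumption about how the FUSE client shows up in the process list):

    # Find the FUSE client process serving the mount
    pgrep -af 'process-name fuse.*<SNIP>_data1'
    # Simulate the crash; the mount point then returns "Transport endpoint is not connected"
    kill -9 <PID>
    # Recover: clean up the stale mount if needed, then remount
    # (this is what the monit check automates with its start program)
    umount -l /mnt/<SNIP>_data1 2>/dev/null
    mount /mnt/<SNIP>_data1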