thr3ads.net - Gluster users - [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] [Feb 2019]


Artem Russakovskii

2019-Feb-04 23:59 UTC

[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

The fuse crash happened two more times, but this time monit helped recover
within 1 minute, so it's a great workaround for now.
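
For anyone who would rather script the recovery than rely on monit, here is a minimal Python sketch of one watchdog cycle. It assumes the dead FUSE mount shows up as ENOTCONN ("Transport endpoint is not connected") when the mount point is stat'ed, and that the mount is defined in /etc/fstab so that "mount <path>" is enough, as in the monit check quoted below. The function names and the lazy-unmount step are illustrative choices, not anything Gluster ships.

```python
import errno
import os
import subprocess


def mount_is_stale(path, stat=os.stat):
    """Return True if stat() on the mount point fails with ENOTCONN."""
    try:
        stat(path)
    except OSError as exc:
        # ENOTCONN is the errno behind "Transport endpoint is not connected".
        return exc.errno == errno.ENOTCONN
    return False


def remount(path, run=subprocess.run):
    """Lazily unmount the dead mount point, then mount it again from fstab."""
    # The unmount may fail if nothing is mounted any more; ignore that.
    run(["/bin/umount", "-l", path], check=False)
    # Assumption: the mount point has an /etc/fstab entry.
    run(["/bin/mount", path], check=True)


def check_and_recover(path, stat=os.stat, run=subprocess.run):
    """One watchdog cycle: remount only when the mount point is stale."""
    if mount_is_stale(path, stat):
        remount(path, run)
        return True
    return False
```

Calling check_and_recover("/mnt/glusterfs_data1") from cron or a systemd timer would approximate the monit behaviour; the stat and run parameters exist only so the logic can be exercised without a real mount.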

What's odd is that the crashes are only happening on one of 4 servers, and
I don't know why.
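
One way to compare the four servers would be to tally the same log signatures on each of them. The Python sketch below counts the three patterns that appear in the excerpts in this thread. It assumes each log entry sits on a single line in the real client log (the excerpts here are wrapped by the mail client) and that a 'repeated N times' summary line stands for N occurrences. The dictionary keys and function name are made up for the example.

```python
import re
from collections import Counter

# Substrings taken from the log excerpts quoted in this thread.
SIGNATURES = {
    "dict_null": "dict is NULL [Invalid argument]",
    "dispatch_failed": "Failed to dispatch handler",
    "crash": "signal received:",
}

# Matches the summary lines the logger emits for suppressed repeats.
REPEATED = re.compile(r"repeated (\d+) times between")


def tally(log_text):
    """Count occurrences of each known signature in a block of log text."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, needle in SIGNATURES.items():
            if needle in line:
                match = REPEATED.search(line)
                # Assumption: a summary line counts as N occurrences.
                counts[name] += int(match.group(1)) if match else 1
    return counts
```

Running tally() over the text of the client mount log on each server (for example mnt-SITE_data1.log, typically under /var/log/glusterfs) and comparing the counters may show whether the crashing server logs noticeably more of these messages than the others.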

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>


On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810 at gmail.com>
wrote:
> The fuse crash happened again yesterday, to another volume. Are there any
> mount options that could help mitigate this?
>
> In the meantime, I set up a monit (https://mmonit.com/monit/) task to
> watch and restart the mount, which works and recovers the mount point
> within a minute. Not ideal, but a temporary workaround.
>
> By the way, the way to reproduce this "Transport endpoint is not
> connected" condition for testing purposes is to kill -9 the right
> "glusterfs --process-name fuse" process.
>
>
> monit check:
> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
>   start program  = "/bin/mount  /mnt/glusterfs_data1"
>   stop program  = "/bin/umount /mnt/glusterfs_data1"
>   if space usage > 90% for 5 times within 15 cycles
>     then alert else if succeeded for 10 cycles then alert
>
>
> stack trace:
> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fa0249e4329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fa0249e4329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
> [2019-02-01 23:21:56.164427]
> The message "I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk]
> 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3"
> repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01
> 23:22:03.474036]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-02-01 23:22:03
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> <http://twitter.com/ArtemR>
>
>
> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> Hi,
>>
>> The first (and so far only) crash happened at 2am the next day after we
>> upgraded, on only one of four servers and only to one of two mounts.
>>
>> I have no idea what caused it, but yeah, we do have a pretty busy site
>> (apkmirror.com), and it caused a disruption for any uploads or downloads
>> from that server until I woke up and fixed the mount.
>>
>> I wish I could be more helpful but all I have is that stack trace.
>>
>> I'm glad it's a blocker and will hopefully be resolved soon.
>>
>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
>> atumball at redhat.com> wrote:
>>
>>> Hi Artem,
>>>
>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a
>>> clone of other bugs where recent discussions happened), and marked it as a
>>> blocker for the glusterfs-5.4 release.
>>>
>>> We already have fixes for the log flooding
>>> (https://review.gluster.org/22128), and are in the process of identifying
>>> and fixing the issue seen with the crash.
>>>
>>> Can you please tell us whether the crashes happened as soon as you
>>> upgraded, or was there any particular pattern you observed before the crash?
>>>
>>> -Amar
>>>
>>>
>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <archon810 at gmail.com>
>>> wrote:
>>>
>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I already
>>>> got a crash which others have mentioned in
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to
>>>> unmount, kill gluster, and remount:
>>>>
>>>>
>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>> [0x7fcccafcd329]
>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>> [0x7fcccafcd329]
>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>> [0x7fcccafcd329]
>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>> [0x7fcccafcd329]
>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>> The message "I [MSGID: 108031]
>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between
>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>>>> The message "E [MSGID: 101191]
>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
>>>> [2019-01-31 09:38:04.696993]
>>>> pending frames:
>>>> frame : type(1) op(READ)
>>>> frame : type(1) op(OPEN)
>>>> frame : type(0) op(0)
>>>> patchset: git://git.gluster.org/glusterfs.git
>>>> signal received: 6
>>>> time of crash:
>>>> 2019-01-31 09:38:04
>>>> configuration details:
>>>> argp 1
>>>> backtrace 1
>>>> dlfcn 1
>>>> libpthread 1
>>>> llistxattr 1
>>>> setfsid 1
>>>> spinlock 1
>>>> epoll.h 1
>>>> xattr.h 1
>>>> st_atim.tv_nsec 1
>>>> package-string: glusterfs 5.3
>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>>>> ---------
>>>>
>>>> Do the pending patches fix the crash or only the repeated warnings? I'm
>>>> running glusterfs on OpenSUSE 15.0 installed via
>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
>>>> not too sure how to make it core dump.
>>>>
>>>> If it's not fixed by the patches above, has anyone already opened a
>>>> ticket for the crashes that I can join and monitor? This is going to create
>>>> a massive problem for us since production systems are crashing.
>>>>
>>>> Thanks.
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>> --
>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>> beerpla.net | +ArtemRussakovskii
>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>> <http://twitter.com/ArtemR>
>>>>
>>>>
>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <
>>>> rgowdapp at redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <
>>>>> archon810 at gmail.com> wrote:
>>>>>
>>>>>> Also, not sure if related or not, but I got a ton of these "Failed to
>>>>>> dispatch handler" in my logs as well. Many people have been commenting
>>>>>> about this issue here
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>>>>>>
>>>>>
>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>>>>>
>>>>>
>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>> [0x7fd966fcd329]
>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>> The message "E [MSGID: 101191]
>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and
>>>>>>> [2019-01-30 20:38:20.015593]
>>>>>>> The message "I [MSGID: 108031]
>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between
>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>> The message "I [MSGID: 108031]
>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between
>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>>>>>>> The message "E [MSGID: 101191]
>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and
>>>>>>> [2019-01-30 20:38:20.546355]
>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031]
>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>>>> selecting local read_child SITE_data1-client-0
>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031]
>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>>>> selecting local read_child SITE_data3-client-0
>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191]
>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>>>>> handler
>>>>>>
>>>>>>
>>>>>> I'm hoping raising the issue here on the mailing list may bring some
>>>>>> additional eyeballs and get them both fixed.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Sincerely,
>>>>>> Artem
>>>>>>
>>>>>> --
>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>> <http://twitter.com/ArtemR>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <archon810 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I found a similar issue here:
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a
>>>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the
>>>>>>> spam.
>>>>>>>
>>>>>>> Here's the message that repeats over and over:
>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>>>> [0x7fd966fcd329]
>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>
>>>>>>
>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this message
>>>>> is logged and send a fix?
>>>>>
>>>>>
>>>>>>> Is there any fix for this issue?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Artem
>>>>>>>
>>>>>>> --
>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>> --
>>> Amar Tumballi (amarts)
>>>

Nithya Balachandran

2019-Feb-06 08:19 UTC



Hi Artem,

Do you still see the crashes with 5.3? If yes, please try mounting the volume
using the mount option lru-limit=0 and see if that helps. We are looking
into the crashes and will update when we have a fix.
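
As a rough illustration of that suggestion, the Python sketch below builds a "mount -t glusterfs" command line with lru-limit=0 in the options and optionally runs it. The server name, volume name and mount point in the usage note are placeholders rather than values confirmed for this setup, and the helper names are made up for the example.

```python
import subprocess


def build_mount_cmd(server, volume, mount_point, lru_limit=0):
    """Return the argv for mounting a Gluster volume with an lru-limit option."""
    source = f"{server}:/{volume}"
    options = f"lru-limit={lru_limit}"
    return ["mount", "-t", "glusterfs", "-o", options, source, mount_point]


def mount_volume(server, volume, mount_point, lru_limit=0, run=subprocess.run):
    """Run the mount command; raises CalledProcessError if mount fails."""
    run(build_mount_cmd(server, volume, mount_point, lru_limit), check=True)
```

For example, build_mount_cmd("gluster1.example.com", "SITE_data1", "/mnt/SITE_data1") yields the argv for a one-off test mount. If the volume is mounted from /etc/fstab, the same option would presumably go into the options column of that entry instead.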

Also, please provide the gluster volume info for the volume in question.


regards,
Nithya

