Artem Russakovskii
2019-Feb-07 21:28 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
I've added the lru-limit=0 parameter to the mounts, and I see it's taken effect correctly: "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" Let's see if it stops crashing or not. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com> wrote:> Hi Nithya, > > Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing > crashes, and no further releases have been made yet. > > volume info: > Type: Replicate > Volume ID: ****SNIP**** > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 4 = 4 > Transport-type: tcp > Bricks: > Brick1: ****SNIP**** > Brick2: ****SNIP**** > Brick3: ****SNIP**** > Brick4: ****SNIP**** > Options Reconfigured: > cluster.quorum-count: 1 > cluster.quorum-type: fixed > network.ping-timeout: 5 > network.remote-dio: enable > performance.rda-cache-limit: 256MB > performance.readdir-ahead: on > performance.parallel-readdir: on > network.inode-lru-limit: 500000 > performance.md-cache-timeout: 600 > performance.cache-invalidation: on > performance.stat-prefetch: on > features.cache-invalidation-timeout: 600 > features.cache-invalidation: on > cluster.readdir-optimize: on > performance.io-thread-count: 32 > server.event-threads: 4 > client.event-threads: 4 > performance.read-ahead: off > cluster.lookup-optimize: on > performance.cache-size: 1GB > cluster.self-heal-daemon: enable > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: on > cluster.granular-entry-heal: enable > cluster.data-self-heal-algorithm: full > > Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbalacha at redhat.com> > wrote: > >> Hi Artem, >> >> Do you still see the crashes with 5.3? If yes, please try mount the >> volume using the mount option lru-limit=0 and see if that helps. We are >> looking into the crashes and will update when have a fix. >> >> Also, please provide the gluster volume info for the volume in question. >> >> >> regards, >> Nithya >> >> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >> wrote: >> >>> The fuse crash happened two more times, but this time monit helped >>> recover within 1 minute, so it's a great workaround for now. >>> >>> What's odd is that the crashes are only happening on one of 4 servers, >>> and I don't know why. >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> The fuse crash happened again yesterday, to another volume. Are there >>>> any mount options that could help mitigate this? 
>>>> >>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to >>>> watch and restart the mount, which works and recovers the mount point >>>> within a minute. Not ideal, but a temporary workaround. >>>> >>>> By the way, the way to reproduce this "Transport endpoint is not >>>> connected" condition for testing purposes is to kill -9 the right >>>> "glusterfs --process-name fuse" process. >>>> >>>> >>>> monit check: >>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>> if space usage > 90% for 5 times within 15 cycles >>>> then alert else if succeeded for 10 cycles then alert >>>> >>>> >>>> stack trace: >>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>> [0x7fa0249e4329] >>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>> [0x7fa0249e4329] >>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>> The message "E [MSGID: 101191] >>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>> [2019-02-01 23:21:56.164427] >>>> The message "I [MSGID: 108031] >>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>> pending frames: >>>> frame : type(1) op(LOOKUP) >>>> frame : type(0) op(0) >>>> patchset: git://git.gluster.org/glusterfs.git >>>> signal received: 6 >>>> time of crash: >>>> 2019-02-01 23:22:03 >>>> configuration details: >>>> argp 1 >>>> backtrace 1 >>>> dlfcn 1 >>>> libpthread 1 >>>> llistxattr 1 >>>> setfsid 1 >>>> spinlock 1 >>>> epoll.h 1 >>>> xattr.h 1 >>>> st_atim.tv_nsec 1 >>>> package-string: glusterfs 5.3 >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>> >>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>> >>>> Sincerely, 
>>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> The first (and so far only) crash happened at 2am the next day after >>>>> we upgraded, on only one of four servers and only to one of two mounts. >>>>> >>>>> I have no idea what caused it, but yeah, we do have a pretty busy site >>>>> (apkmirror.com), and it caused a disruption for any uploads or >>>>> downloads from that server until I woke up and fixed the mount. >>>>> >>>>> I wish I could be more helpful but all I have is that stack trace. >>>>> >>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>> >>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>> atumball at redhat.com> wrote: >>>>> >>>>>> Hi Artem, >>>>>> >>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as a >>>>>> clone of other bugs where recent discussions happened), and marked it as a >>>>>> blocker for glusterfs-5.4 release. >>>>>> >>>>>> We already have fixes for log flooding - >>>>>> https://review.gluster.org/22128, and are the process of identifying >>>>>> and fixing the issue seen with crash. >>>>>> >>>>>> Can you please tell if the crashes happened as soon as upgrade ? or >>>>>> was there any particular pattern you observed before the crash. >>>>>> >>>>>> -Amar >>>>>> >>>>>> >>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I already >>>>>>> got a crash which others have mentioned in >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>> unmount, kill gluster, and remount: >>>>>>> >>>>>>> >>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fcccafcd329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>> The message "I [MSGID: 
108031] >>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>> The message "E [MSGID: 101191] >>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>> [2019-01-31 09:38:04.696993] >>>>>>> pending frames: >>>>>>> frame : type(1) op(READ) >>>>>>> frame : type(1) op(OPEN) >>>>>>> frame : type(0) op(0) >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> signal received: 6 >>>>>>> time of crash: >>>>>>> 2019-01-31 09:38:04 >>>>>>> configuration details: >>>>>>> argp 1 >>>>>>> backtrace 1 >>>>>>> dlfcn 1 >>>>>>> libpthread 1 >>>>>>> llistxattr 1 >>>>>>> setfsid 1 >>>>>>> spinlock 1 >>>>>>> epoll.h 1 >>>>>>> xattr.h 1 >>>>>>> st_atim.tv_nsec 1 >>>>>>> package-string: glusterfs 5.3 >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>> --------- >>>>>>> >>>>>>> Do the pending patches fix the crash or only the repeated warnings? >>>>>>> I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>> not too sure how to make it core dump. >>>>>>> >>>>>>> If it's not fixed by the patches above, has anyone already opened a >>>>>>> ticket for the crashes that I can join and monitor? This is going to create >>>>>>> a massive problem for us since production systems are crashing. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>> rgowdapp at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Also, not sure if related or not, but I got a ton of these "Failed >>>>>>>>> to dispatch handler" in my logs as well. Many people have been commenting >>>>>>>>> about this issue here >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. 
>>>>>>>>> >>>>>>>> >>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this. >>>>>>>> >>>>>>>> >>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>> handler >>>>>>>>> >>>>>>>>> >>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> I found a similar issue here: >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a >>>>>>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the >>>>>>>>>> spam. 
>>>>>>>>>> >>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> >>>>>>>>> >>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>> message is logged and send a fix? >>>>>>>> >>>>>>>> >>>>>>>>>> Is there any fix for this issue? >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Amar Tumballi (amarts) >>>>>> >>>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190207/fc7fee16/attachment.html>
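For reference, the lru-limit=0 workaround above is applied at mount time. A minimal sketch of a persistent /etc/fstab entry, assuming the mount.glusterfs helper shipped with 5.3 accepts lru-limit as a regular -o mount option (volume and mount point names are placeholders):

    localhost:/<SNIP> /mnt/<SNIP> glusterfs defaults,_netdev,lru-limit=0 0 0

After remounting, the flag should appear on the client process command line, e.g. "ps ax | grep 'process-name fuse'" should show /usr/sbin/glusterfs --lru-limit=0 --process-name fuse ... as quoted in the message above.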
Artem Russakovskii
2019-Feb-08 01:20 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help. Here's the snippet of the crash and the subsequent remount by monit.

[2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f4402b99329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" repeated 39 times between [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 515 times between [2019-02-08 01:11:17.932515] and [2019-02-08 01:13:09.311554]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-08 01:13:09
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
/lib64/libc.so.6(+0x36160)[0x7f440a887160]
/lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
/lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
/lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
/lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
/lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
---------
[2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)
[2019-02-08 01:13:35.637830] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-08 01:13:35.651405] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-02-08 01:13:35.651628] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2019-02-08 01:13:35.651747] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are ready,
attempting connect on transport [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect on transport [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) Final graph: Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> wrote:> I've added the lru-limit=0 parameter to the mounts, and I see it's taken > effect correctly: > "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse > --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" > > Let's see if it stops crashing or not. > > Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Hi Nithya, >> >> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >> crashes, and no further releases have been made yet. >> >> volume info: >> Type: Replicate >> Volume ID: ****SNIP**** >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 4 = 4 >> Transport-type: tcp >> Bricks: >> Brick1: ****SNIP**** >> Brick2: ****SNIP**** >> Brick3: ****SNIP**** >> Brick4: ****SNIP**** >> Options Reconfigured: >> cluster.quorum-count: 1 >> cluster.quorum-type: fixed >> network.ping-timeout: 5 >> network.remote-dio: enable >> performance.rda-cache-limit: 256MB >> performance.readdir-ahead: on >> performance.parallel-readdir: on >> network.inode-lru-limit: 500000 >> performance.md-cache-timeout: 600 >> performance.cache-invalidation: on >> performance.stat-prefetch: on >> features.cache-invalidation-timeout: 600 >> features.cache-invalidation: on >> cluster.readdir-optimize: on >> performance.io-thread-count: 32 >> server.event-threads: 4 >> client.event-threads: 4 >> performance.read-ahead: off >> cluster.lookup-optimize: on >> performance.cache-size: 1GB >> cluster.self-heal-daemon: enable >> transport.address-family: inet >> nfs.disable: on >> performance.client-io-threads: on >> cluster.granular-entry-heal: enable >> cluster.data-self-heal-algorithm: full >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbalacha at redhat.com> >> wrote: >> >>> Hi Artem, >>> >>> Do you still see the crashes with 5.3? If yes, please try mount the >>> volume using the mount option lru-limit=0 and see if that helps. We are >>> looking into the crashes and will update when have a fix. >>> >>> Also, please provide the gluster volume info for the volume in question. >>> >>> >>> regards, >>> Nithya >>> >>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> The fuse crash happened two more times, but this time monit helped >>>> recover within 1 minute, so it's a great workaround for now. 
>>>> >>>> What's odd is that the crashes are only happening on one of 4 servers, >>>> and I don't know why. >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> The fuse crash happened again yesterday, to another volume. Are there >>>>> any mount options that could help mitigate this? >>>>> >>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to >>>>> watch and restart the mount, which works and recovers the mount point >>>>> within a minute. Not ideal, but a temporary workaround. >>>>> >>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>> connected" condition for testing purposes is to kill -9 the right >>>>> "glusterfs --process-name fuse" process. >>>>> >>>>> >>>>> monit check: >>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>> if space usage > 90% for 5 times within 15 cycles >>>>> then alert else if succeeded for 10 cycles then alert >>>>> >>>>> >>>>> stack trace: >>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>> [0x7fa0249e4329] >>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>> [0x7fa0249e4329] >>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>> The message "E [MSGID: 101191] >>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>> [2019-02-01 23:21:56.164427] >>>>> The message "I [MSGID: 108031] >>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>> pending frames: >>>>> frame : type(1) op(LOOKUP) >>>>> frame : type(0) op(0) >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> signal received: 6 >>>>> time of crash: >>>>> 2019-02-01 23:22:03 >>>>> configuration details: >>>>> argp 1 >>>>> backtrace 1 >>>>> dlfcn 1 >>>>> libpthread 1 >>>>> llistxattr 1 >>>>> setfsid 1 >>>>> spinlock 1 >>>>> epoll.h 1 >>>>> xattr.h 1 >>>>> st_atim.tv_nsec 1 >>>>> package-string: glusterfs 5.3 >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] 
>>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>> >>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810 at gmail.com> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> The first (and so far only) crash happened at 2am the next day after >>>>>> we upgraded, on only one of four servers and only to one of two mounts. >>>>>> >>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>> site (apkmirror.com), and it caused a disruption for any uploads or >>>>>> downloads from that server until I woke up and fixed the mount. >>>>>> >>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>> >>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>> >>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>> atumball at redhat.com> wrote: >>>>>> >>>>>>> Hi Artem, >>>>>>> >>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as >>>>>>> a clone of other bugs where recent discussions happened), and marked it as >>>>>>> a blocker for glusterfs-5.4 release. >>>>>>> >>>>>>> We already have fixes for log flooding - >>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>> identifying and fixing the issue seen with crash. >>>>>>> >>>>>>> Can you please tell if the crashes happened as soon as upgrade ? or >>>>>>> was there any particular pattern you observed before the crash. 
>>>>>>> >>>>>>> -Amar >>>>>>> >>>>>>> >>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>> already got a crash which others have mentioned in >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>> unmount, kill gluster, and remount: >>>>>>>> >>>>>>>> >>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fcccafcd329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>> The message "I [MSGID: 108031] >>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>> The message "E [MSGID: 101191] >>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>> pending frames: >>>>>>>> frame : type(1) op(READ) >>>>>>>> frame : type(1) op(OPEN) >>>>>>>> frame : type(0) op(0) >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> signal received: 6 >>>>>>>> time of crash: >>>>>>>> 2019-01-31 09:38:04 >>>>>>>> configuration details: >>>>>>>> argp 1 >>>>>>>> backtrace 1 >>>>>>>> dlfcn 1 >>>>>>>> libpthread 1 >>>>>>>> llistxattr 1 >>>>>>>> setfsid 1 >>>>>>>> spinlock 1 >>>>>>>> epoll.h 1 >>>>>>>> xattr.h 1 >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> package-string: glusterfs 5.3 >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>> >>>>>>>> 
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>> --------- >>>>>>>> >>>>>>>> Do the pending patches fix the crash or only the repeated warnings? >>>>>>>> I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>> not too sure how to make it core dump. >>>>>>>> >>>>>>>> If it's not fixed by the patches above, has anyone already opened a >>>>>>>> ticket for the crashes that I can join and monitor? This is going to create >>>>>>>> a massive problem for us since production systems are crashing. >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>> commenting about this issue here >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>> >>>>>>>>> >>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this. 
>>>>>>>>> >>>>>>>>> >>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> I found a similar issue here: >>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a >>>>>>>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the >>>>>>>>>>> spam. 
>>>>>>>>>>> >>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>>> message is logged and send a fix? >>>>>>>>> >>>>>>>>> >>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Amar Tumballi (amarts) >>>>>>> >>>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190207/e3a2b047/attachment.html>
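For completeness, a rough shell sketch of the reproduce-and-recover cycle described in the thread (mount point and volume names are placeholders, and the pgrep pattern is an assumption about how the FUSE client shows up in the process list):

    # Find the FUSE client process serving the mount
    pgrep -af 'process-name fuse.*<SNIP>_data1'
    # Simulate the crash; the mount point then returns "Transport endpoint is not connected"
    kill -9 <PID>
    # Recover: clean up the stale mount if needed, then remount
    # (this is what the monit check automates with its start program)
    umount -l /mnt/<SNIP>_data1 2>/dev/null
    mount /mnt/<SNIP>_data1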