João Baúto
2019-Feb-11 10:18 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Although I don't have these error messages, I'm having fuse crashes as frequent as you. I have disabled write-behind and the mount has been running over the weekend with heavy usage and no issues.

I can provide coredumps before disabling write-behind if needed. I opened a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes that I was having.

*João Baúto*
---------------
*Scientific Computing and Software Platform*
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org <https://www.fchampalimaud.org/>

Artem Russakovskii <archon810 at gmail.com> wrote on Saturday, 9/02/2019 at 22:18:

> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the next crash to see if it dumps a core for you guys to remotely debug.
>
> Then I can consider setting performance.write-behind to off and monitoring for further crashes.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>
> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> wrote:
>>
>>> Hi Nithya,
>>>
>>> I can try to disable write-behind as long as it doesn't heavily impact performance for us. Which option is it exactly? I don't see it set in my list of changed volume variables that I sent you guys earlier.
>>
>> The option is performance.write-behind
>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>
>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>
>>>> Hi Artem,
>>>>
>>>> We have found the cause of one crash. Unfortunately we have not managed to reproduce the one you reported so we don't know if it is the same cause.
>>>>
>>>> Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.
>>>>
>>>> regards,
>>>> Nithya
>>>>
>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> wrote:
>>>>
>>>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help.
>>>>>
>>>>> Here's the snippet of the crash and the subsequent remount by monit.
>>>>> >>>>> >>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>> [0x7f4402b99329] >>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>> valid argument] >>>>> The message "I [MSGID: 108031] >>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>> The message "E [MSGID: 101191] >>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>> [2019-02-08 01:13:09.311554] >>>>> pending frames: >>>>> frame : type(1) op(LOOKUP) >>>>> frame : type(0) op(0) >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> signal received: 6 >>>>> time of crash: >>>>> 2019-02-08 01:13:09 >>>>> configuration details: >>>>> argp 1 >>>>> backtrace 1 >>>>> dlfcn 1 >>>>> libpthread 1 >>>>> llistxattr 1 >>>>> setfsid 1 >>>>> spinlock 1 >>>>> epoll.h 1 >>>>> xattr.h 1 >>>>> st_atim.tv_nsec 1 >>>>> package-string: glusterfs 5.3 >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>> >>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>> --------- >>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>> /mnt/<SNIP>_data1) >>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 1 >>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 2 >>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 3 >>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 4 >>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-0: parent translators are ready, 
attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>> Final graph: >>>>> >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> >>>>> wrote: >>>>> >>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>>> taken effect correctly: >>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>> >>>>>> Let's see if it stops crashing or not. >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> Hi Nithya, >>>>>>> >>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>>> crashes, and no further releases have been made yet. 
>>>>>>> >>>>>>> volume info: >>>>>>> Type: Replicate >>>>>>> Volume ID: ****SNIP**** >>>>>>> Status: Started >>>>>>> Snapshot Count: 0 >>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>> Transport-type: tcp >>>>>>> Bricks: >>>>>>> Brick1: ****SNIP**** >>>>>>> Brick2: ****SNIP**** >>>>>>> Brick3: ****SNIP**** >>>>>>> Brick4: ****SNIP**** >>>>>>> Options Reconfigured: >>>>>>> cluster.quorum-count: 1 >>>>>>> cluster.quorum-type: fixed >>>>>>> network.ping-timeout: 5 >>>>>>> network.remote-dio: enable >>>>>>> performance.rda-cache-limit: 256MB >>>>>>> performance.readdir-ahead: on >>>>>>> performance.parallel-readdir: on >>>>>>> network.inode-lru-limit: 500000 >>>>>>> performance.md-cache-timeout: 600 >>>>>>> performance.cache-invalidation: on >>>>>>> performance.stat-prefetch: on >>>>>>> features.cache-invalidation-timeout: 600 >>>>>>> features.cache-invalidation: on >>>>>>> cluster.readdir-optimize: on >>>>>>> performance.io-thread-count: 32 >>>>>>> server.event-threads: 4 >>>>>>> client.event-threads: 4 >>>>>>> performance.read-ahead: off >>>>>>> cluster.lookup-optimize: on >>>>>>> performance.cache-size: 1GB >>>>>>> cluster.self-heal-daemon: enable >>>>>>> transport.address-family: inet >>>>>>> nfs.disable: on >>>>>>> performance.client-io-threads: on >>>>>>> cluster.granular-entry-heal: enable >>>>>>> cluster.data-self-heal-algorithm: full >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>> nbalacha at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Artem, >>>>>>>> >>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>> >>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>> question. >>>>>>>> >>>>>>>> >>>>>>>> regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> The fuse crash happened two more times, but this time monit helped >>>>>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>>>>> >>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>> servers, and I don't know why. >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>>> >>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>> point within a minute. Not ideal, but a temporary workaround. 
>>>>>>>>>> >>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> monit check: >>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> stack trace: >>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>> pending frames: >>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>> frame : type(0) op(0) >>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>> signal received: 6 >>>>>>>>>> time of crash: >>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>> configuration details: >>>>>>>>>> argp 1 >>>>>>>>>> backtrace 1 >>>>>>>>>> dlfcn 1 >>>>>>>>>> libpthread 1 >>>>>>>>>> llistxattr 1 >>>>>>>>>> setfsid 1 >>>>>>>>>> spinlock 1 >>>>>>>>>> epoll.h 1 >>>>>>>>>> xattr.h 1 >>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>> >>>>>>>>>> 
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>> mounts. >>>>>>>>>>> >>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for any >>>>>>>>>>> uploads or downloads from that server until I woke up and fixed the mount. >>>>>>>>>>> >>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>> trace. >>>>>>>>>>> >>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>> >>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Artem, >>>>>>>>>>>> >>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>> >>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>> >>>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade >>>>>>>>>>>> ? or was there any particular pattern you observed before the crash. 
>>>>>>>>>>>> >>>>>>>>>>>> -Amar >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had >>>>>>>>>>>>> to unmount, kill gluster, and remount: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>> pending frames: >>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>> time of crash: >>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>> configuration details: >>>>>>>>>>>>> argp 1 >>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] 
>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>> >>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>> --------- >>>>>>>>>>>>> >>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>> >>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>>> this. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190211/aa109cf5/attachment.html>
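The write-behind toggle discussed in this thread is an ordinary volume option that can be changed online with the gluster CLI. A minimal sketch of checking and disabling it follows; the volume name SITE_data1 is taken from the log snippets above and should be replaced with your own, and turning the option off may cost some write performance:

    # check the current value of the option
    gluster volume get SITE_data1 performance.write-behind

    # disable write-behind as a stopgap until a fixed release is available
    gluster volume set SITE_data1 performance.write-behind off

    # later, restore the default (enabled) behaviour
    gluster volume reset SITE_data1 performance.write-behind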
Raghavendra Gowdappa
2019-Feb-12 03:19 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
On Mon, Feb 11, 2019 at 3:49 PM João Baúto <joao.bauto at neuro.fchampalimaud.org> wrote:

> Although I don't have these error messages, I'm having fuse crashes as frequent as you. I have disabled write-behind and the mount has been running over the weekend with heavy usage and no issues.

The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya, and I were able to identify the corruption in write-behind.

[1] https://review.gluster.org/22189

> I can provide coredumps before disabling write-behind if needed. I opened a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes that I was having.
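Since coredumps come up repeatedly in this thread, here is a hedged sketch of one way to capture a core from the fuse client and turn it into a backtrace suitable for a bug report. The crash directory and the core file name are illustrative only; the client inherits the ulimit only if it is mounted from a shell where the limit was raised, and glusterfs debuginfo packages are needed for readable symbols:

    # allow core dumps in the shell that will run the mount (run as root)
    ulimit -c unlimited
    echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern

    # after a crash, extract a full backtrace from the core file
    gdb -batch -ex 'thread apply all bt full' \
        /usr/sbin/glusterfs /var/crash/core.glusterfs.12345.1549600000 > backtrace.txt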
>>>>>> >>>>>> >>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>> [0x7f4402b99329] >>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>> valid argument] >>>>>> The message "I [MSGID: 108031] >>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>> The message "E [MSGID: 101191] >>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>> [2019-02-08 01:13:09.311554] >>>>>> pending frames: >>>>>> frame : type(1) op(LOOKUP) >>>>>> frame : type(0) op(0) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 6 >>>>>> time of crash: >>>>>> 2019-02-08 01:13:09 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.3 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>> --------- >>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>>> /mnt/<SNIP>_data1) >>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 1 >>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 2 >>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 3 >>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 4 >>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] 
[client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>>> Final graph: >>>>>> >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>>>> taken effect correctly: >>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>> >>>>>>> Let's see if it stops crashing or not. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Nithya, >>>>>>>> >>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>>>> crashes, and no further releases have been made yet. 
>>>>>>>> >>>>>>>> volume info: >>>>>>>> Type: Replicate >>>>>>>> Volume ID: ****SNIP**** >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: ****SNIP**** >>>>>>>> Brick2: ****SNIP**** >>>>>>>> Brick3: ****SNIP**** >>>>>>>> Brick4: ****SNIP**** >>>>>>>> Options Reconfigured: >>>>>>>> cluster.quorum-count: 1 >>>>>>>> cluster.quorum-type: fixed >>>>>>>> network.ping-timeout: 5 >>>>>>>> network.remote-dio: enable >>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>> performance.readdir-ahead: on >>>>>>>> performance.parallel-readdir: on >>>>>>>> network.inode-lru-limit: 500000 >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> performance.cache-invalidation: on >>>>>>>> performance.stat-prefetch: on >>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>> features.cache-invalidation: on >>>>>>>> cluster.readdir-optimize: on >>>>>>>> performance.io-thread-count: 32 >>>>>>>> server.event-threads: 4 >>>>>>>> client.event-threads: 4 >>>>>>>> performance.read-ahead: off >>>>>>>> cluster.lookup-optimize: on >>>>>>>> performance.cache-size: 1GB >>>>>>>> cluster.self-heal-daemon: enable >>>>>>>> transport.address-family: inet >>>>>>>> nfs.disable: on >>>>>>>> performance.client-io-threads: on >>>>>>>> cluster.granular-entry-heal: enable >>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount >>>>>>>>> the volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>>> >>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>> question. >>>>>>>>> >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. >>>>>>>>>> >>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>> servers, and I don't know why. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>>>> >>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>> point within a minute. 
Not ideal, but a temporary workaround. >>>>>>>>>>> >>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> monit check: >>>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> stack trace: >>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>> pending frames: >>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>> signal received: 6 >>>>>>>>>>> time of crash: >>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>> configuration details: >>>>>>>>>>> argp 1 >>>>>>>>>>> backtrace 1 >>>>>>>>>>> dlfcn 1 >>>>>>>>>>> libpthread 1 >>>>>>>>>>> llistxattr 1 >>>>>>>>>>> setfsid 1 >>>>>>>>>>> spinlock 1 >>>>>>>>>>> epoll.h 1 >>>>>>>>>>> xattr.h 1 >>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>> 
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>> mounts. >>>>>>>>>>>> >>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for any >>>>>>>>>>>> uploads or downloads from that server until I woke up and fixed the mount. >>>>>>>>>>>> >>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>> trace. >>>>>>>>>>>> >>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>> >>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>>> >>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade >>>>>>>>>>>>> ? or was there any particular pattern you observed before the crash. 
>>>>>>>>>>>>> >>>>>>>>>>>>> -Amar >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had >>>>>>>>>>>>>> to unmount, kill gluster, and remount: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>> --------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>>>> this. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190212/a2b08596/attachment-0001.html>
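A note on the write-behind workaround that recurs in this thread (and in the reply below): it is toggled through the normal volume-set interface. The snippet that follows is only a sketch; "SITE_data1" is the placeholder volume name used elsewhere in the thread, and disabling write-behind may cost some write performance, so it is best treated as a diagnostic step rather than a permanent setting.

# run on any node in the trusted storage pool; SITE_data1 is a placeholder
gluster volume set SITE_data1 performance.write-behind off
gluster volume get SITE_data1 performance.write-behind    # should now report "off"
# once testing is done, return the option to its default:
gluster volume reset SITE_data1 performance.write-behind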
Raghavendra Gowdappa
2019-Feb-12 03:32 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
On Mon, Feb 11, 2019 at 3:49 PM João Baúto < joao.bauto at neuro.fchampalimaud.org> wrote:> Although I don't have these error messages, I'm having fuse crashes as > frequent as you. I have disabled write-behind and the mount has been > running over the weekend with heavy usage and no issues. > > I can provide coredumps before disabling write-behind if needed. I opened > a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with > the crashes that I was having. >I've created a bug <https://bugzilla.redhat.com/show_bug.cgi?id=1676356> and marked it as a blocker for release-6. I've marked bz 1671014 as a duplicate of this bug report on master. If you disagree about the bug you filed being a duplicate, please reopen.> *João Baúto* > --------------- > > *Scientific Computing and Software Platform* > Champalimaud Research > Champalimaud Center for the Unknown > Av. Brasília, Doca de Pedrouços > 1400-038 Lisbon, Portugal > fchampalimaud.org <https://www.fchampalimaud.org/> > > > Artem Russakovskii <archon810 at gmail.com> wrote on Saturday, > 9/02/2019 at 22:18: > >> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for >> the next crash to see if it dumps a core for you guys to remotely debug. >> >> Then I can consider setting performance.write-behind to off and >> monitoring for further crashes. >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com> >> wrote: >> >>> >>> >>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> Hi Nithya, >>>> >>>> I can try to disable write-behind as long as it doesn't heavily impact >>>> performance for us. Which option is it exactly? I don't see it set in my >>>> list of changed volume variables that I sent you guys earlier. >>>> >>> >>> The option is performance.write-behind >>> >>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> >>>> wrote: >>>> >>>>> Hi Artem, >>>>> >>>>> We have found the cause of one crash. Unfortunately we have not >>>>> managed to reproduce the one you reported so we don't know if it is the >>>>> same cause. >>>>> >>>>> Can you disable write-behind on the volume and let us know if it >>>>> solves the problem? If yes, it is likely to be the same issue. >>>>> >>>>> >>>>> regards, >>>>> Nithya >>>>> >>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> >>>>> wrote: >>>>> >>>>>> Sorry to disappoint, but the crash just happened again, so >>>>>> lru-limit=0 didn't help. >>>>>> >>>>>> Here's the snippet of the crash and the subsequent remount by monit. 
>>>>>> >>>>>> >>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>> [0x7f4402b99329] >>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>> valid argument] >>>>>> The message "I [MSGID: 108031] >>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>> The message "E [MSGID: 101191] >>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>> [2019-02-08 01:13:09.311554] >>>>>> pending frames: >>>>>> frame : type(1) op(LOOKUP) >>>>>> frame : type(0) op(0) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 6 >>>>>> time of crash: >>>>>> 2019-02-08 01:13:09 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.3 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>> --------- >>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>>> /mnt/<SNIP>_data1) >>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 1 >>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 2 >>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 3 >>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 4 >>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] 
[client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>>> Final graph: >>>>>> >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>>>> taken effect correctly: >>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>> >>>>>>> Let's see if it stops crashing or not. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Nithya, >>>>>>>> >>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>>>> crashes, and no further releases have been made yet. 
>>>>>>>> >>>>>>>> volume info: >>>>>>>> Type: Replicate >>>>>>>> Volume ID: ****SNIP**** >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: ****SNIP**** >>>>>>>> Brick2: ****SNIP**** >>>>>>>> Brick3: ****SNIP**** >>>>>>>> Brick4: ****SNIP**** >>>>>>>> Options Reconfigured: >>>>>>>> cluster.quorum-count: 1 >>>>>>>> cluster.quorum-type: fixed >>>>>>>> network.ping-timeout: 5 >>>>>>>> network.remote-dio: enable >>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>> performance.readdir-ahead: on >>>>>>>> performance.parallel-readdir: on >>>>>>>> network.inode-lru-limit: 500000 >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> performance.cache-invalidation: on >>>>>>>> performance.stat-prefetch: on >>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>> features.cache-invalidation: on >>>>>>>> cluster.readdir-optimize: on >>>>>>>> performance.io-thread-count: 32 >>>>>>>> server.event-threads: 4 >>>>>>>> client.event-threads: 4 >>>>>>>> performance.read-ahead: off >>>>>>>> cluster.lookup-optimize: on >>>>>>>> performance.cache-size: 1GB >>>>>>>> cluster.self-heal-daemon: enable >>>>>>>> transport.address-family: inet >>>>>>>> nfs.disable: on >>>>>>>> performance.client-io-threads: on >>>>>>>> cluster.granular-entry-heal: enable >>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount >>>>>>>>> the volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>>> >>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>> question. >>>>>>>>> >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. >>>>>>>>>> >>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>> servers, and I don't know why. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>>>> >>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>> point within a minute. 
Not ideal, but a temporary workaround. >>>>>>>>>>> >>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> monit check: >>>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> stack trace: >>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>> pending frames: >>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>> signal received: 6 >>>>>>>>>>> time of crash: >>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>> configuration details: >>>>>>>>>>> argp 1 >>>>>>>>>>> backtrace 1 >>>>>>>>>>> dlfcn 1 >>>>>>>>>>> libpthread 1 >>>>>>>>>>> llistxattr 1 >>>>>>>>>>> setfsid 1 >>>>>>>>>>> spinlock 1 >>>>>>>>>>> epoll.h 1 >>>>>>>>>>> xattr.h 1 >>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>> 
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>> mounts. >>>>>>>>>>>> >>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for any >>>>>>>>>>>> uploads or downloads from that server until I woke up and fixed the mount. >>>>>>>>>>>> >>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>> trace. >>>>>>>>>>>> >>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>> >>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>>> >>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade >>>>>>>>>>>>> ? or was there any particular pattern you observed before the crash. 
>>>>>>>>>>>>> >>>>>>>>>>>>> -Amar >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had >>>>>>>>>>>>>> to unmount, kill gluster, and remount: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>> --------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>>>> this. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190212/fe0e3322/attachment.html>
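For completeness, the client-side workarounds mentioned in this thread can be exercised from a shell. The following is only a sketch: the mount point /mnt/glusterfs_data1 and volume SITE_data1 are the examples used above, the pgrep pattern assumes a single fuse client per host, and lru-limit is assumed to be accepted as a mount option by the mount.glusterfs helper shipped with 5.x (the logs above show it being passed through to the client as --lru-limit=0).

# Example /etc/fstab entry that mounts the volume with the inode lru limit disabled:
# localhost:/SITE_data1  /mnt/glusterfs_data1  glusterfs  defaults,_netdev,lru-limit=0  0 0

# Reproduce the "Transport endpoint is not connected" state that monit recovers from:
pgrep -af 'glusterfs.*--process-name fuse'                 # list fuse client processes and PIDs
kill -9 "$(pgrep -of 'glusterfs.*--process-name fuse')"    # assumes a single client; verify the PID first
ls /mnt/glusterfs_data1                                    # now fails with "Transport endpoint is not connected"
mount /mnt/glusterfs_data1                                 # what the monit start program runs to recover the mount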