Artem Russakovskii
2019-Feb-12 06:14 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Awesome. But is there a release schedule and an ETA for when these will be out in the repos? On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:> > > On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Great job identifying the issue! >> >> Any ETA on the next release with the logging and crash fixes in it? >> > > I've marked write-behind corruption as a blocker for release-6. Logging > fixes are already in the codebase. > > >> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa <rgowdapp at redhat.com> >> wrote: >> >>> >>> >>> On Mon, Feb 11, 2019 at 3:49 PM João Baúto < >>> joao.bauto at neuro.fchampalimaud.org> wrote: >>> >>>> Although I don't have these error messages, I'm having fuse crashes as >>>> frequently as you. I have disabled write-behind and the mount has been >>>> running over the weekend with heavy usage and no issues. >>>> >>> >>> The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya, >>> and I were able to identify the corruption in write-behind. >>> >>> [1] https://review.gluster.org/22189 >>> >>> >>>> I can provide coredumps before disabling write-behind if needed. I >>>> opened a BZ report >>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes >>>> that I was having. >>>> >>>> *João Baúto* >>>> --------------- >>>> >>>> *Scientific Computing and Software Platform* >>>> Champalimaud Research >>>> Champalimaud Center for the Unknown >>>> Av. Brasília, Doca de Pedrouços >>>> 1400-038 Lisbon, Portugal >>>> fchampalimaud.org <https://www.fchampalimaud.org/> >>>> >>>> >>>> Artem Russakovskii <archon810 at gmail.com> wrote on Saturday, >>>> 9/02/2019 at 22:18: >>>> >>>>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for >>>>> the next crash to see if it dumps a core for you guys to remotely debug. >>>>> >>>>> Then I can consider setting performance.write-behind to off and >>>>> monitoring for further crashes. >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa < >>>>> rgowdapp at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> Hi Nithya, >>>>>>> >>>>>>> I can try to disable write-behind as long as it doesn't heavily >>>>>>> impact performance for us. Which option is it exactly? I don't see it set >>>>>>> in my list of changed volume variables that I sent you guys earlier. >>>>>>> >>>>>> >>>>>> The option is performance.write-behind >>>>>> >>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran < >>>>>>> nbalacha at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Artem, >>>>>>>> >>>>>>>> We have found the cause of one crash. Unfortunately we have not >>>>>>>> managed to reproduce the one you reported so we don't know if it is the >>>>>>>> same cause.
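For anyone following along who wants to try the same mitigation João describes, a minimal sketch of toggling the option with the stock gluster CLI (<VOLNAME> is a placeholder for your volume name):

  gluster volume set <VOLNAME> performance.write-behind off
  gluster volume get <VOLNAME> performance.write-behind    # confirm it now reports "off"

Clients should pick up the resulting graph change without a remount, though verifying on a single mount first costs nothing.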
>>>>>>>> >>>>>>>> Can you disable write-behind on the volume and let us know if it >>>>>>>> solves the problem? If yes, it is likely to be the same issue. >>>>>>>> >>>>>>>> >>>>>>>> regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Sorry to disappoint, but the crash just happened again, so >>>>>>>>> lru-limit=0 didn't help. >>>>>>>>> >>>>>>>>> Here's the snippet of the crash and the subsequent remount by >>>>>>>>> monit. >>>>>>>>> >>>>>>>>> >>>>>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7f4402b99329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>>>>> valid argument] >>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>>>>> [2019-02-08 01:13:09.311554] >>>>>>>>> pending frames: >>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>> frame : type(0) op(0) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 6 >>>>>>>>> time of crash: >>>>>>>>> 2019-02-08 01:13:09 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>>>>> --------- >>>>>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>>>>> --process-name fuse --volfile-server=localhost 
--volfile-id=/<SNIP>_data1 >>>>>>>>> /mnt/<SNIP>_data1) >>>>>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 1 >>>>>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 2 >>>>>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 3 >>>>>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 4 >>>>>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>>>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>>>>>> Final graph: >>>>>>>>> >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see >>>>>>>>>> it's taken effect correctly: >>>>>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>>>>> >>>>>>>>>> Let's see if it stops crashing or not. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Nithya, >>>>>>>>>>> >>>>>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started >>>>>>>>>>> seeing crashes, and no further releases have been made yet. 
>>>>>>>>>>> >>>>>>>>>>> volume info: >>>>>>>>>>> Type: Replicate >>>>>>>>>>> Volume ID: ****SNIP**** >>>>>>>>>>> Status: Started >>>>>>>>>>> Snapshot Count: 0 >>>>>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>>>>> Transport-type: tcp >>>>>>>>>>> Bricks: >>>>>>>>>>> Brick1: ****SNIP**** >>>>>>>>>>> Brick2: ****SNIP**** >>>>>>>>>>> Brick3: ****SNIP**** >>>>>>>>>>> Brick4: ****SNIP**** >>>>>>>>>>> Options Reconfigured: >>>>>>>>>>> cluster.quorum-count: 1 >>>>>>>>>>> cluster.quorum-type: fixed >>>>>>>>>>> network.ping-timeout: 5 >>>>>>>>>>> network.remote-dio: enable >>>>>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>> performance.parallel-readdir: on >>>>>>>>>>> network.inode-lru-limit: 500000 >>>>>>>>>>> performance.md-cache-timeout: 600 >>>>>>>>>>> performance.cache-invalidation: on >>>>>>>>>>> performance.stat-prefetch: on >>>>>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>>>>> features.cache-invalidation: on >>>>>>>>>>> cluster.readdir-optimize: on >>>>>>>>>>> performance.io-thread-count: 32 >>>>>>>>>>> server.event-threads: 4 >>>>>>>>>>> client.event-threads: 4 >>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>> cluster.lookup-optimize: on >>>>>>>>>>> performance.cache-size: 1GB >>>>>>>>>>> cluster.self-heal-daemon: enable >>>>>>>>>>> transport.address-family: inet >>>>>>>>>>> nfs.disable: on >>>>>>>>>>> performance.client-io-threads: on >>>>>>>>>>> cluster.granular-entry-heal: enable >>>>>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Artem, >>>>>>>>>>>> >>>>>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount >>>>>>>>>>>> the volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>>>>>> >>>>>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>>>>> question. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> regards, >>>>>>>>>>>> Nithya >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. >>>>>>>>>>>>> >>>>>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>>>>> servers, and I don't know why. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The fuse crash happened again yesterday, to another volume. 
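For reference, a sketch of passing the lru-limit option Nithya mentions at mount time; mount.glusterfs hands it to the client process as --lru-limit, as the process arguments quoted later in the thread show (<VOLNAME> is a placeholder):

  mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>

  # or the equivalent /etc/fstab entry
  localhost:/<VOLNAME>  /mnt/<VOLNAME>  glusterfs  defaults,_netdev,lru-limit=0  0 0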
>>>>>>>>>>>>>> Are there any mount options that could help mitigate this? >>>>>>>>>>>>>> >>>>>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>>>>> point within a minute. Not ideal, but a temporary workaround. >>>>>>>>>>>>>> >>>>>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is >>>>>>>>>>>>>> not connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> monit check: >>>>>>>>>>>>>> check filesystem glusterfs_data1 with path >>>>>>>>>>>>>> /mnt/glusterfs_data1 >>>>>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> stack trace: >>>>>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The first (and so far only) crash happened at 2am the next >>>>>>>>>>>>>>> day after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>>>>> mounts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for >>>>>>>>>>>>>>> any uploads or downloads from that server until I woke up and fixed the >>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>>>>> trace. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can you please tell if the crashes happened as soon as >>>>>>>>>>>>>>>> upgrade ? or was there any particular pattern you observed before the crash. 
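A sketch of the kill -9 reproduction Artem describes above, handy for testing a monit-style watchdog before a real crash hits (the pgrep pattern is an assumption; check it against your own process list first):

  pgrep -af 'glusterfs.*--process-name fuse'   # list fuse client PIDs and their mount points
  kill -9 <PID>                                # the matching mount then fails with "Transport
                                               # endpoint is not connected" until remounted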
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Amar >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, >>>>>>>>>>>>>>>>> I already got a crash which others have mentioned in >>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and >>>>>>>>>>>>>>>>> had to unmount, kill gluster, and remount: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>>>>> st_atim.tv_nsec 1 
>>>>>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of >>>>>>>>>>>>>>>>>>> these "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>>>>>>>>>> been commenting about this issue here >>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ >>>>>>>>>>>>>>>>>> addresses this. 
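On Artem's note about not being sure how to make it core dump: a generic Linux sketch, not specific to the openSUSE packages (the core_pattern path is an arbitrary choice, and systemd-based hosts may collect cores via coredumpctl instead):

  ulimit -c unlimited                                    # in the shell that performs the mount
  sysctl -w kernel.core_pattern='/var/tmp/core.%e.%p'    # write cores somewhere with free space
  # remount the volume so the fuse client inherits the new limit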
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list >>>>>>>>>>>>>>>>>>> may bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks. 
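To gauge how badly the two repeated messages are flooding a client, a per-file count is enough (the path is an assumption based on the default client log directory and the mnt-SITE_data1.log names above):

  grep -cE 'dict is NULL|Failed to dispatch handler' /var/log/glusterfs/mnt-*.log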
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>>>>> seeing the spam. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com> >>>>>>>>>>>>>>>>>>>> , APK Mirror <http://www.apkmirror.com/>, Illogical >>>>>>>>>>>>>>>>>>>> Robot LLC >>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Amar Tumballi (amarts)
Nithya Balachandran
2019-Feb-12 08:37 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Not yet, but we are discussing an interim release. It is going to take a couple of days to review the fixes, so not before then. We will post dates on the list once we decide.

On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii <archon810 at gmail.com> wrote:

> Awesome. But is there a release schedule and an ETA for when these will be
> out in the repos?
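Once the interim release does land, a quick way to check whether the updated package has reached the repo on the openSUSE side (assuming the same home:glusterfs repository mentioned earlier in the thread):

  zypper refresh && zypper info glusterfs    # compare Version against the announced release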
>>>>>>>> >>>>>>> >>>>>>> The option is performance.write-behind >>>>>>> >>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> We have found the cause of one crash. Unfortunately we have not >>>>>>>>> managed to reproduce the one you reported so we don't know if it is the >>>>>>>>> same cause. >>>>>>>>> >>>>>>>>> Can you disable write-behind on the volume and let us know if it >>>>>>>>> solves the problem? If yes, it is likely to be the same issue. >>>>>>>>> >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Sorry to disappoint, but the crash just happened again, so >>>>>>>>>> lru-limit=0 didn't help. >>>>>>>>>> >>>>>>>>>> Here's the snippet of the crash and the subsequent remount by >>>>>>>>>> monit. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7f4402b99329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>>>>>> valid argument] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>>>>>> [2019-02-08 01:13:09.311554] >>>>>>>>>> pending frames: >>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>> frame : type(0) op(0) >>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>> signal received: 6 >>>>>>>>>> time of crash: >>>>>>>>>> 2019-02-08 01:13:09 >>>>>>>>>> configuration details: >>>>>>>>>> argp 1 >>>>>>>>>> backtrace 1 >>>>>>>>>> dlfcn 1 >>>>>>>>>> libpthread 1 >>>>>>>>>> llistxattr 1 >>>>>>>>>> setfsid 1 >>>>>>>>>> spinlock 1 >>>>>>>>>> epoll.h 1 >>>>>>>>>> xattr.h 1 >>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>>>>>> >>>>>>>>>> 
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>>>>>> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>>>>>> --------- >>>>>>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>>>>>>> /mnt/<SNIP>_data1) >>>>>>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 1 >>>>>>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 2 >>>>>>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 3 >>>>>>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 4 >>>>>>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.655527] I >>>>>>>>>> [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port >>>>>>>>>> to 49153 (from 0) >>>>>>>>>> Final graph: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see >>>>>>>>>>> it's taken effect correctly: >>>>>>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>>>>>> >>>>>>>>>>> Let's see if it stops crashing or not. 
>>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Nithya, >>>>>>>>>>>> >>>>>>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started >>>>>>>>>>>> seeing crashes, and no further releases have been made yet. >>>>>>>>>>>> >>>>>>>>>>>> volume info: >>>>>>>>>>>> Type: Replicate >>>>>>>>>>>> Volume ID: ****SNIP**** >>>>>>>>>>>> Status: Started >>>>>>>>>>>> Snapshot Count: 0 >>>>>>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>> Bricks: >>>>>>>>>>>> Brick1: ****SNIP**** >>>>>>>>>>>> Brick2: ****SNIP**** >>>>>>>>>>>> Brick3: ****SNIP**** >>>>>>>>>>>> Brick4: ****SNIP**** >>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>> cluster.quorum-count: 1 >>>>>>>>>>>> cluster.quorum-type: fixed >>>>>>>>>>>> network.ping-timeout: 5 >>>>>>>>>>>> network.remote-dio: enable >>>>>>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>> performance.parallel-readdir: on >>>>>>>>>>>> network.inode-lru-limit: 500000 >>>>>>>>>>>> performance.md-cache-timeout: 600 >>>>>>>>>>>> performance.cache-invalidation: on >>>>>>>>>>>> performance.stat-prefetch: on >>>>>>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>>>>>> features.cache-invalidation: on >>>>>>>>>>>> cluster.readdir-optimize: on >>>>>>>>>>>> performance.io-thread-count: 32 >>>>>>>>>>>> server.event-threads: 4 >>>>>>>>>>>> client.event-threads: 4 >>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>> cluster.lookup-optimize: on >>>>>>>>>>>> performance.cache-size: 1GB >>>>>>>>>>>> cluster.self-heal-daemon: enable >>>>>>>>>>>> transport.address-family: inet >>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>> performance.client-io-threads: on >>>>>>>>>>>> cluster.granular-entry-heal: enable >>>>>>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>> >>>>>>>>>>>>> Do you still see the crashes with 5.3? If yes, please try >>>>>>>>>>>>> mount the volume using the mount option lru-limit=0 and see if that helps. >>>>>>>>>>>>> We are looking into the crashes and will update when have a fix. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>>>>>> question. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> regards, >>>>>>>>>>>>> Nithya >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>>>>>> servers, and I don't know why. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The fuse crash happened again yesterday, to another volume. >>>>>>>>>>>>>>> Are there any mount options that could help mitigate this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>>>>>> point within a minute. Not ideal, but a temporary workaround. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is >>>>>>>>>>>>>>> not connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> monit check: >>>>>>>>>>>>>>> check filesystem glusterfs_data1 with path >>>>>>>>>>>>>>> /mnt/glusterfs_data1 >>>>>>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> stack trace: >>>>>>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>>>>>> 
configuration details: >>>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The first (and so far only) crash happened at 2am the next >>>>>>>>>>>>>>>> day after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>>>>>> mounts. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a >>>>>>>>>>>>>>>> pretty busy site (apkmirror.com), and it caused a >>>>>>>>>>>>>>>> disruption for any uploads or downloads from that server until I woke up >>>>>>>>>>>>>>>> and fixed the mount. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>>>>>> trace. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved >>>>>>>>>>>>>>>> soon. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can you please tell if the crashes happened as soon as >>>>>>>>>>>>>>>>> upgrade ? or was there any particular pattern you observed before the crash. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Amar >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to >>>>>>>>>>>>>>>>>> 5.3, I already got a crash which others have mentioned in >>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and >>>>>>>>>>>>>>>>>> had to unmount, kill gluster, and remount: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git 
>>>>>>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone >>>>>>>>>>>>>>>>>> already opened a ticket for the crashes that I can join and monitor? This >>>>>>>>>>>>>>>>>> is going to create a massive problem for us since production systems are >>>>>>>>>>>>>>>>>> crashing. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks. 
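On the core-dump question above: a minimal sketch of one way to let the fuse client dump core on a systemd-era distro such as openSUSE, assuming nothing like systemd-coredump is already intercepting cores, and using a placeholder mount point and volume name:

# Raise the core-size limit in the shell that restarts the mount;
# the glusterfs client process inherits this rlimit.
ulimit -c unlimited

# Write cores to a known directory, tagged with executable name, PID and timestamp.
mkdir -p /var/crash
sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t

# Remount so a fresh glusterfs process starts under the new limit
# (mount point and volume name here are placeholders, not the real ones).
umount /mnt/SITE_data1
mount -t glusterfs localhost:/SITE_data1 /mnt/SITE_data1

If a handler such as systemd-coredump already owns kernel.core_pattern, coredumpctl would be the place to look for the dump instead, and the core can then be inspected with gdb against the matching glusterfs debuginfo packages.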
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of >>>>>>>>>>>>>>>>>>>> these "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>>>>>>>>>>> been commenting about this issue here >>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ >>>>>>>>>>>>>>>>>>> addresses this. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 
2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list >>>>>>>>>>>>>>>>>>>> may bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com> >>>>>>>>>>>>>>>>>>>> , APK Mirror <http://www.apkmirror.com/>, Illogical >>>>>>>>>>>>>>>>>>>> Robot LLC >>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>>>>>> seeing the spam. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check >>>>>>>>>>>>>>>>>>> why this message is logged and send a fix? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks. 
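As a side note on the log flooding itself, a quick way to see how often the two messages repeat in a given client log, assuming the default client log location under /var/log/glusterfs/ and the mnt-SITE_data1.log file name shown above:

# Count occurrences of the two repeating messages in the fuse client log.
# Adjust the file name to the actual mount log on your system.
grep -c 'dict is NULL' /var/log/glusterfs/mnt-SITE_data1.log
grep -c 'Failed to dispatch handler' /var/log/glusterfs/mnt-SITE_data1.log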
>>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Amar Tumballi (amarts)
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users