Raghavendra Gowdappa
2019-Feb-09 03:22 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> wrote:> Hi Nithya, > > I can try to disable write-behind as long as it doesn't heavily impact > performance for us. Which option is it exactly? I don't see it set in my > list of changed volume variables that I sent you guys earlier. >The option is performance.write-behind> Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> > wrote: > >> Hi Artem, >> >> We have found the cause of one crash. Unfortunately we have not managed >> to reproduce the one you reported so we don't know if it is the same cause. >> >> Can you disable write-behind on the volume and let us know if it solves >> the problem? If yes, it is likely to be the same issue. >> >> >> regards, >> Nithya >> >> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> >> wrote: >> >>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 >>> didn't help. >>> >>> Here's the snippet of the crash and the subsequent remount by monit. >>> >>> >>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>> [0x7f4402b99329] >>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>> valid argument] >>> The message "I [MSGID: 108031] >>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>> The message "E [MSGID: 101191] >>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>> [2019-02-08 01:13:09.311554] >>> pending frames: >>> frame : type(1) op(LOOKUP) >>> frame : type(0) op(0) >>> patchset: git://git.gluster.org/glusterfs.git >>> signal received: 6 >>> time of crash: >>> 2019-02-08 01:13:09 >>> configuration details: >>> argp 1 >>> backtrace 1 >>> dlfcn 1 >>> libpthread 1 >>> llistxattr 1 >>> setfsid 1 >>> spinlock 1 >>> epoll.h 1 >>> xattr.h 1 >>> st_atim.tv_nsec 1 >>> package-string: glusterfs 5.3 >>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>> >>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>> >>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>> >>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>> 
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>> --------- >>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] >>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 >>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) >>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 1 >>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 2 >>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 3 >>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 4 >>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>> Final graph: >>> >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>> taken effect correctly: >>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>> >>>> Let's see if it stops crashing or not. >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> Hi Nithya, >>>>> >>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>> crashes, and no further releases have been made yet. 
>>>>> >>>>> volume info: >>>>> Type: Replicate >>>>> Volume ID: ****SNIP**** >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 1 x 4 = 4 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: ****SNIP**** >>>>> Brick2: ****SNIP**** >>>>> Brick3: ****SNIP**** >>>>> Brick4: ****SNIP**** >>>>> Options Reconfigured: >>>>> cluster.quorum-count: 1 >>>>> cluster.quorum-type: fixed >>>>> network.ping-timeout: 5 >>>>> network.remote-dio: enable >>>>> performance.rda-cache-limit: 256MB >>>>> performance.readdir-ahead: on >>>>> performance.parallel-readdir: on >>>>> network.inode-lru-limit: 500000 >>>>> performance.md-cache-timeout: 600 >>>>> performance.cache-invalidation: on >>>>> performance.stat-prefetch: on >>>>> features.cache-invalidation-timeout: 600 >>>>> features.cache-invalidation: on >>>>> cluster.readdir-optimize: on >>>>> performance.io-thread-count: 32 >>>>> server.event-threads: 4 >>>>> client.event-threads: 4 >>>>> performance.read-ahead: off >>>>> cluster.lookup-optimize: on >>>>> performance.cache-size: 1GB >>>>> cluster.self-heal-daemon: enable >>>>> transport.address-family: inet >>>>> nfs.disable: on >>>>> performance.client-io-threads: on >>>>> cluster.granular-entry-heal: enable >>>>> cluster.data-self-heal-algorithm: full >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>> nbalacha at redhat.com> wrote: >>>>> >>>>>> Hi Artem, >>>>>> >>>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>>> looking into the crashes and will update when have a fix. >>>>>> >>>>>> Also, please provide the gluster volume info for the volume in >>>>>> question. >>>>>> >>>>>> >>>>>> regards, >>>>>> Nithya >>>>>> >>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> The fuse crash happened two more times, but this time monit helped >>>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>>> >>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>> servers, and I don't know why. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>> there any mount options that could help mitigate this? >>>>>>>> >>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task >>>>>>>> to watch and restart the mount, which works and recovers the mount point >>>>>>>> within a minute. Not ideal, but a temporary workaround. >>>>>>>> >>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>> "glusterfs --process-name fuse" process. 
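For reference, locating the right FUSE client process for that kill -9 reproduction could look roughly like the following on a test mount. This is only a sketch: the grep pattern and the placeholder <PID> are illustrative, not taken from the thread.

    # list glusterfs FUSE client processes with their full command lines
    pgrep -af 'glusterfs.*--process-name fuse'
    # kill the one backing the mount under test; the mount point then returns
    # "Transport endpoint is not connected" until it is remounted
    kill -9 <PID>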
>>>>>>>> >>>>>>>> >>>>>>>> monit check: >>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>> >>>>>>>> >>>>>>>> stack trace: >>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fa0249e4329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fa0249e4329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>> The message "E [MSGID: 101191] >>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>> The message "I [MSGID: 108031] >>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>> pending frames: >>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>> frame : type(0) op(0) >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> signal received: 6 >>>>>>>> time of crash: >>>>>>>> 2019-02-01 23:22:03 >>>>>>>> configuration details: >>>>>>>> argp 1 >>>>>>>> backtrace 1 >>>>>>>> dlfcn 1 >>>>>>>> libpthread 1 >>>>>>>> llistxattr 1 >>>>>>>> setfsid 1 >>>>>>>> spinlock 1 >>>>>>>> epoll.h 1 >>>>>>>> xattr.h 1 >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> package-string: glusterfs 5.3 >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror 
>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>> mounts. >>>>>>>>> >>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads >>>>>>>>> or downloads from that server until I woke up and fixed the mount. >>>>>>>>> >>>>>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>>>>> >>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>> >>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Artem, >>>>>>>>>> >>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, >>>>>>>>>> as a clone of other bugs where recent discussions happened), and marked it >>>>>>>>>> as a blocker for glusterfs-5.4 release. >>>>>>>>>> >>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>> >>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? >>>>>>>>>> or was there any particular pattern you observed before the crash. >>>>>>>>>> >>>>>>>>>> -Amar >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> 
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>> pending frames: >>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>> signal received: 6 >>>>>>>>>>> time of crash: >>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>> configuration details: >>>>>>>>>>> argp 1 >>>>>>>>>>> backtrace 1 >>>>>>>>>>> dlfcn 1 >>>>>>>>>>> libpthread 1 >>>>>>>>>>> llistxattr 1 >>>>>>>>>>> setfsid 1 >>>>>>>>>>> spinlock 1 >>>>>>>>>>> epoll.h 1 >>>>>>>>>>> xattr.h 1 >>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>> --------- >>>>>>>>>>> >>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>> >>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>> >>>>>>>>>>> Thanks. 
>>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>> this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>> bring some 
additional eyeballs and get them both fixed. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's >>>>>>>>>>>>>> a comment from 3 days ago from someone else with 5.3 who started seeing the >>>>>>>>>>>>>> spam. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>>>>>> message is logged and send a fix? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190209/f034ee9f/attachment.html>
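The two mitigations discussed in this message (remounting with lru-limit=0 and disabling write-behind) are referred to only by option name above; on a typical setup they would look roughly like the commands below. This is a sketch only: <VOLNAME> and the mount path are placeholders for the snipped names, the lru-limit mount option is passed through by mount.glusterfs to the client, and turning off write-behind may cost some write performance.

    # remount the FUSE client with the inode LRU limit disabled
    umount /mnt/<VOLNAME>
    mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>

    # disable the write-behind translator on the volume (affects all clients)
    gluster volume set <VOLNAME> performance.write-behind off
    # confirm the current value
    gluster volume get <VOLNAME> performance.write-behind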
Artem Russakovskii
2019-Feb-09 22:17 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the next crash to see if it dumps a core for you guys to remotely debug. Then I can consider setting performance.write-behind to off and monitoring for further crashes. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:> > > On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Hi Nithya, >> >> I can try to disable write-behind as long as it doesn't heavily impact >> performance for us. Which option is it exactly? I don't see it set in my >> list of changed volume variables that I sent you guys earlier. >> > > The option is performance.write-behind > > >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> >> wrote: >> >>> Hi Artem, >>> >>> We have found the cause of one crash. Unfortunately we have not managed >>> to reproduce the one you reported so we don't know if it is the same cause. >>> >>> Can you disable write-behind on the volume and let us know if it solves >>> the problem? If yes, it is likely to be the same issue. >>> >>> >>> regards, >>> Nithya >>> >>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 >>>> didn't help. >>>> >>>> Here's the snippet of the crash and the subsequent remount by monit. 
>>>> >>>> >>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>> [0x7f4402b99329] >>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>> valid argument] >>>> The message "I [MSGID: 108031] >>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>> The message "E [MSGID: 101191] >>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>> [2019-02-08 01:13:09.311554] >>>> pending frames: >>>> frame : type(1) op(LOOKUP) >>>> frame : type(0) op(0) >>>> patchset: git://git.gluster.org/glusterfs.git >>>> signal received: 6 >>>> time of crash: >>>> 2019-02-08 01:13:09 >>>> configuration details: >>>> argp 1 >>>> backtrace 1 >>>> dlfcn 1 >>>> libpthread 1 >>>> llistxattr 1 >>>> setfsid 1 >>>> spinlock 1 >>>> epoll.h 1 >>>> xattr.h 1 >>>> st_atim.tv_nsec 1 >>>> package-string: glusterfs 5.3 >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>> >>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>> --------- >>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] >>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 >>>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) >>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 1 >>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 2 >>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 3 >>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 4 >>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.652978] I [MSGID: 
114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>> Final graph: >>>> >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>> taken effect correctly: >>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>> >>>>> Let's see if it stops crashing or not. >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>> archon810 at gmail.com> wrote: >>>>> >>>>>> Hi Nithya, >>>>>> >>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>> crashes, and no further releases have been made yet. 
>>>>>> >>>>>> volume info: >>>>>> Type: Replicate >>>>>> Volume ID: ****SNIP**** >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 1 x 4 = 4 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: ****SNIP**** >>>>>> Brick2: ****SNIP**** >>>>>> Brick3: ****SNIP**** >>>>>> Brick4: ****SNIP**** >>>>>> Options Reconfigured: >>>>>> cluster.quorum-count: 1 >>>>>> cluster.quorum-type: fixed >>>>>> network.ping-timeout: 5 >>>>>> network.remote-dio: enable >>>>>> performance.rda-cache-limit: 256MB >>>>>> performance.readdir-ahead: on >>>>>> performance.parallel-readdir: on >>>>>> network.inode-lru-limit: 500000 >>>>>> performance.md-cache-timeout: 600 >>>>>> performance.cache-invalidation: on >>>>>> performance.stat-prefetch: on >>>>>> features.cache-invalidation-timeout: 600 >>>>>> features.cache-invalidation: on >>>>>> cluster.readdir-optimize: on >>>>>> performance.io-thread-count: 32 >>>>>> server.event-threads: 4 >>>>>> client.event-threads: 4 >>>>>> performance.read-ahead: off >>>>>> cluster.lookup-optimize: on >>>>>> performance.cache-size: 1GB >>>>>> cluster.self-heal-daemon: enable >>>>>> transport.address-family: inet >>>>>> nfs.disable: on >>>>>> performance.client-io-threads: on >>>>>> cluster.granular-entry-heal: enable >>>>>> cluster.data-self-heal-algorithm: full >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>> nbalacha at redhat.com> wrote: >>>>>> >>>>>>> Hi Artem, >>>>>>> >>>>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>> looking into the crashes and will update when have a fix. >>>>>>> >>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>> question. >>>>>>> >>>>>>> >>>>>>> regards, >>>>>>> Nithya >>>>>>> >>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> The fuse crash happened two more times, but this time monit helped >>>>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>>>> >>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>> servers, and I don't know why. >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>> >>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>> point within a minute. Not ideal, but a temporary workaround. 
>>>>>>>>> >>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>> >>>>>>>>> >>>>>>>>> monit check: >>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>> >>>>>>>>> >>>>>>>>> stack trace: >>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fa0249e4329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fa0249e4329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>> pending frames: >>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>> frame : type(0) op(0) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 6 >>>>>>>>> time of crash: >>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>> 
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>> mounts. >>>>>>>>>> >>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads >>>>>>>>>> or downloads from that server until I woke up and fixed the mount. >>>>>>>>>> >>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>> trace. >>>>>>>>>> >>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>> >>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Artem, >>>>>>>>>>> >>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, >>>>>>>>>>> as a clone of other bugs where recent discussions happened), and marked it >>>>>>>>>>> as a blocker for glusterfs-5.4 release. >>>>>>>>>>> >>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>> >>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? >>>>>>>>>>> or was there any particular pattern you observed before the crash. 
>>>>>>>>>>> >>>>>>>>>>> -Amar >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>> pending frames: >>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>> signal received: 6 >>>>>>>>>>>> time of crash: >>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>> configuration details: >>>>>>>>>>>> argp 1 >>>>>>>>>>>> backtrace 1 >>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>> libpthread 1 >>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>> setfsid 1 >>>>>>>>>>>> spinlock 1 >>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>> 
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>> --------- >>>>>>>>>>>> >>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>> >>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>> this. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>>>>>>> message is logged and send a fix? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190209/e05e3ad6/attachment.html>
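The core-dump setup mentioned at the top of this message is not spelled out in the thread; on a systemd-based distribution such as the openSUSE Leap 15.0 system described above, one possible approach is sketched below. The drop-in file name, core directory, and core_pattern are illustrative assumptions, and already-running glusterfs processes keep their old limit until they are restarted or remounted.

    # raise the core file size limit for services started by systemd
    # (elsewhere, "ulimit -c unlimited" in the mount's environment is the equivalent)
    mkdir -p /etc/systemd/system.conf.d
    printf '[Manager]\nDefaultLimitCORE=infinity\n' > /etc/systemd/system.conf.d/coredump.conf
    systemctl daemon-reexec

    # write cores to a known location so they can be collected for debugging
    mkdir -p /var/crash
    sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t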