Raghavendra Gowdappa
2019-Feb-09 03:22 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> wrote:> Hi Nithya, > > I can try to disable write-behind as long as it doesn't heavily impact > performance for us. Which option is it exactly? I don't see it set in my > list of changed volume variables that I sent you guys earlier. >The option is performance.write-behind> Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> > wrote: > >> Hi Artem, >> >> We have found the cause of one crash. Unfortunately we have not managed >> to reproduce the one you reported so we don't know if it is the same cause. >> >> Can you disable write-behind on the volume and let us know if it solves >> the problem? If yes, it is likely to be the same issue. >> >> >> regards, >> Nithya >> >> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> >> wrote: >> >>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 >>> didn't help. >>> >>> Here's the snippet of the crash and the subsequent remount by monit. >>> >>> >>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>> [0x7f4402b99329] >>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>> valid argument] >>> The message "I [MSGID: 108031] >>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>> The message "E [MSGID: 101191] >>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>> [2019-02-08 01:13:09.311554] >>> pending frames: >>> frame : type(1) op(LOOKUP) >>> frame : type(0) op(0) >>> patchset: git://git.gluster.org/glusterfs.git >>> signal received: 6 >>> time of crash: >>> 2019-02-08 01:13:09 >>> configuration details: >>> argp 1 >>> backtrace 1 >>> dlfcn 1 >>> libpthread 1 >>> llistxattr 1 >>> setfsid 1 >>> spinlock 1 >>> epoll.h 1 >>> xattr.h 1 >>> st_atim.tv_nsec 1 >>> package-string: glusterfs 5.3 >>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>> >>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>> >>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>> >>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>> 
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>> --------- >>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] >>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 >>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) >>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 1 >>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 2 >>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 3 >>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 4 >>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>> on transport >>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>> Final graph: >>> >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>> taken effect correctly: >>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>> >>>> Let's see if it stops crashing or not. >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> Hi Nithya, >>>>> >>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>> crashes, and no further releases have been made yet. 
>>>>> >>>>> volume info: >>>>> Type: Replicate >>>>> Volume ID: ****SNIP**** >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 1 x 4 = 4 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: ****SNIP**** >>>>> Brick2: ****SNIP**** >>>>> Brick3: ****SNIP**** >>>>> Brick4: ****SNIP**** >>>>> Options Reconfigured: >>>>> cluster.quorum-count: 1 >>>>> cluster.quorum-type: fixed >>>>> network.ping-timeout: 5 >>>>> network.remote-dio: enable >>>>> performance.rda-cache-limit: 256MB >>>>> performance.readdir-ahead: on >>>>> performance.parallel-readdir: on >>>>> network.inode-lru-limit: 500000 >>>>> performance.md-cache-timeout: 600 >>>>> performance.cache-invalidation: on >>>>> performance.stat-prefetch: on >>>>> features.cache-invalidation-timeout: 600 >>>>> features.cache-invalidation: on >>>>> cluster.readdir-optimize: on >>>>> performance.io-thread-count: 32 >>>>> server.event-threads: 4 >>>>> client.event-threads: 4 >>>>> performance.read-ahead: off >>>>> cluster.lookup-optimize: on >>>>> performance.cache-size: 1GB >>>>> cluster.self-heal-daemon: enable >>>>> transport.address-family: inet >>>>> nfs.disable: on >>>>> performance.client-io-threads: on >>>>> cluster.granular-entry-heal: enable >>>>> cluster.data-self-heal-algorithm: full >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>> nbalacha at redhat.com> wrote: >>>>> >>>>>> Hi Artem, >>>>>> >>>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>>> looking into the crashes and will update when have a fix. >>>>>> >>>>>> Also, please provide the gluster volume info for the volume in >>>>>> question. >>>>>> >>>>>> >>>>>> regards, >>>>>> Nithya >>>>>> >>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> The fuse crash happened two more times, but this time monit helped >>>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>>> >>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>> servers, and I don't know why. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>> there any mount options that could help mitigate this? >>>>>>>> >>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task >>>>>>>> to watch and restart the mount, which works and recovers the mount point >>>>>>>> within a minute. Not ideal, but a temporary workaround. >>>>>>>> >>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>> "glusterfs --process-name fuse" process. 
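For reference, locating the right FUSE client process for that kill -9 reproduction could look roughly like the following on a test mount. This is only a sketch: the grep pattern and the placeholder <PID> are illustrative, not taken from the thread.

    # list glusterfs FUSE client processes with their full command lines
    pgrep -af 'glusterfs.*--process-name fuse'
    # kill the one backing the mount under test; the mount point then returns
    # "Transport endpoint is not connected" until it is remounted
    kill -9 <PID>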
>>>>>>>> >>>>>>>> >>>>>>>> monit check: >>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>> >>>>>>>> >>>>>>>> stack trace: >>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fa0249e4329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>> [0x7fa0249e4329] >>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>> The message "E [MSGID: 101191] >>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>> The message "I [MSGID: 108031] >>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>> pending frames: >>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>> frame : type(0) op(0) >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> signal received: 6 >>>>>>>> time of crash: >>>>>>>> 2019-02-01 23:22:03 >>>>>>>> configuration details: >>>>>>>> argp 1 >>>>>>>> backtrace 1 >>>>>>>> dlfcn 1 >>>>>>>> libpthread 1 >>>>>>>> llistxattr 1 >>>>>>>> setfsid 1 >>>>>>>> spinlock 1 >>>>>>>> epoll.h 1 >>>>>>>> xattr.h 1 >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> package-string: glusterfs 5.3 >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror 
>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>> mounts. >>>>>>>>> >>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads >>>>>>>>> or downloads from that server until I woke up and fixed the mount. >>>>>>>>> >>>>>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>>>>> >>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>> >>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Artem, >>>>>>>>>> >>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, >>>>>>>>>> as a clone of other bugs where recent discussions happened), and marked it >>>>>>>>>> as a blocker for glusterfs-5.4 release. >>>>>>>>>> >>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>> >>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? >>>>>>>>>> or was there any particular pattern you observed before the crash. >>>>>>>>>> >>>>>>>>>> -Amar >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>> 
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>> pending frames: >>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>> signal received: 6 >>>>>>>>>>> time of crash: >>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>> configuration details: >>>>>>>>>>> argp 1 >>>>>>>>>>> backtrace 1 >>>>>>>>>>> dlfcn 1 >>>>>>>>>>> libpthread 1 >>>>>>>>>>> llistxattr 1 >>>>>>>>>>> setfsid 1 >>>>>>>>>>> spinlock 1 >>>>>>>>>>> epoll.h 1 >>>>>>>>>>> xattr.h 1 >>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>> --------- >>>>>>>>>>> >>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>> >>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>> >>>>>>>>>>> Thanks. 
>>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>> this. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>> bring some 
additional eyeballs and get them both fixed. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's >>>>>>>>>>>>>> a comment from 3 days ago from someone else with 5.3 who started seeing the >>>>>>>>>>>>>> spam. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>>>>>> message is logged and send a fix? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190209/f034ee9f/attachment.html>
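The two mitigations discussed in this message (remounting with lru-limit=0 and disabling write-behind) are referred to only by option name above; on a typical setup they would look roughly like the commands below. This is a sketch only: <VOLNAME> and the mount path are placeholders for the snipped names, the lru-limit mount option is passed through by mount.glusterfs to the client, and turning off write-behind may cost some write performance.

    # remount the FUSE client with the inode LRU limit disabled
    umount /mnt/<VOLNAME>
    mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>

    # disable the write-behind translator on the volume (affects all clients)
    gluster volume set <VOLNAME> performance.write-behind off
    # confirm the current value
    gluster volume get <VOLNAME> performance.write-behind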
Artem Russakovskii
2019-Feb-09 22:17 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the next crash to see if it dumps a core for you guys to remotely debug. Then I can consider setting performance.write-behind to off and monitoring for further crashes. Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:> > > On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Hi Nithya, >> >> I can try to disable write-behind as long as it doesn't heavily impact >> performance for us. Which option is it exactly? I don't see it set in my >> list of changed volume variables that I sent you guys earlier. >> > > The option is performance.write-behind > > >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> >> wrote: >> >>> Hi Artem, >>> >>> We have found the cause of one crash. Unfortunately we have not managed >>> to reproduce the one you reported so we don't know if it is the same cause. >>> >>> Can you disable write-behind on the volume and let us know if it solves >>> the problem? If yes, it is likely to be the same issue. >>> >>> >>> regards, >>> Nithya >>> >>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 >>>> didn't help. >>>> >>>> Here's the snippet of the crash and the subsequent remount by monit. 
>>>> >>>> >>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>> [0x7f4402b99329] >>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>> valid argument] >>>> The message "I [MSGID: 108031] >>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>> The message "E [MSGID: 101191] >>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>> [2019-02-08 01:13:09.311554] >>>> pending frames: >>>> frame : type(1) op(LOOKUP) >>>> frame : type(0) op(0) >>>> patchset: git://git.gluster.org/glusterfs.git >>>> signal received: 6 >>>> time of crash: >>>> 2019-02-08 01:13:09 >>>> configuration details: >>>> argp 1 >>>> backtrace 1 >>>> dlfcn 1 >>>> libpthread 1 >>>> llistxattr 1 >>>> setfsid 1 >>>> spinlock 1 >>>> epoll.h 1 >>>> xattr.h 1 >>>> st_atim.tv_nsec 1 >>>> package-string: glusterfs 5.3 >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>> >>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>> >>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>> --------- >>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] >>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 >>>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) >>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 1 >>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 2 >>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 3 >>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>> with index 4 >>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.652978] I [MSGID: 
114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>> on transport >>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>> Final graph: >>>> >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> >>>> wrote: >>>> >>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>> taken effect correctly: >>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>> >>>>> Let's see if it stops crashing or not. >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>> archon810 at gmail.com> wrote: >>>>> >>>>>> Hi Nithya, >>>>>> >>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>> crashes, and no further releases have been made yet. 
>>>>>> >>>>>> volume info: >>>>>> Type: Replicate >>>>>> Volume ID: ****SNIP**** >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 1 x 4 = 4 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: ****SNIP**** >>>>>> Brick2: ****SNIP**** >>>>>> Brick3: ****SNIP**** >>>>>> Brick4: ****SNIP**** >>>>>> Options Reconfigured: >>>>>> cluster.quorum-count: 1 >>>>>> cluster.quorum-type: fixed >>>>>> network.ping-timeout: 5 >>>>>> network.remote-dio: enable >>>>>> performance.rda-cache-limit: 256MB >>>>>> performance.readdir-ahead: on >>>>>> performance.parallel-readdir: on >>>>>> network.inode-lru-limit: 500000 >>>>>> performance.md-cache-timeout: 600 >>>>>> performance.cache-invalidation: on >>>>>> performance.stat-prefetch: on >>>>>> features.cache-invalidation-timeout: 600 >>>>>> features.cache-invalidation: on >>>>>> cluster.readdir-optimize: on >>>>>> performance.io-thread-count: 32 >>>>>> server.event-threads: 4 >>>>>> client.event-threads: 4 >>>>>> performance.read-ahead: off >>>>>> cluster.lookup-optimize: on >>>>>> performance.cache-size: 1GB >>>>>> cluster.self-heal-daemon: enable >>>>>> transport.address-family: inet >>>>>> nfs.disable: on >>>>>> performance.client-io-threads: on >>>>>> cluster.granular-entry-heal: enable >>>>>> cluster.data-self-heal-algorithm: full >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>> nbalacha at redhat.com> wrote: >>>>>> >>>>>>> Hi Artem, >>>>>>> >>>>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>> looking into the crashes and will update when have a fix. >>>>>>> >>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>> question. >>>>>>> >>>>>>> >>>>>>> regards, >>>>>>> Nithya >>>>>>> >>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810 at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> The fuse crash happened two more times, but this time monit helped >>>>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>>>> >>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>> servers, and I don't know why. >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>> >>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>> point within a minute. Not ideal, but a temporary workaround. 
>>>>>>>>> >>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>> >>>>>>>>> >>>>>>>>> monit check: >>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>> >>>>>>>>> >>>>>>>>> stack trace: >>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fa0249e4329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fa0249e4329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>> pending frames: >>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>> frame : type(0) op(0) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 6 >>>>>>>>> time of crash: >>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>> 
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>> mounts. >>>>>>>>>> >>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads >>>>>>>>>> or downloads from that server until I woke up and fixed the mount. >>>>>>>>>> >>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>> trace. >>>>>>>>>> >>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>> >>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Artem, >>>>>>>>>>> >>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, >>>>>>>>>>> as a clone of other bugs where recent discussions happened), and marked it >>>>>>>>>>> as a blocker for glusterfs-5.4 release. >>>>>>>>>>> >>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>> >>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? >>>>>>>>>>> or was there any particular pattern you observed before the crash. 
>>>>>>>>>>> >>>>>>>>>>> -Amar >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>> pending frames: >>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>> signal received: 6 >>>>>>>>>>>> time of crash: >>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>> configuration details: >>>>>>>>>>>> argp 1 >>>>>>>>>>>> backtrace 1 >>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>> libpthread 1 >>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>> setfsid 1 >>>>>>>>>>>> spinlock 1 >>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>> 
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>> >>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>> --------- >>>>>>>>>>>> >>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>> >>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>> this. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why this >>>>>>>>>>>>> message is logged and send a fix? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190209/e05e3ad6/attachment.html>
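The core-dump setup mentioned at the top of this message is not spelled out in the thread; on a systemd-based distribution such as the openSUSE Leap 15.0 system described above, one possible approach is sketched below. The drop-in file name, core directory, and core_pattern are illustrative assumptions, and already-running glusterfs processes keep their old limit until they are restarted or remounted.

    # raise the core file size limit for services started by systemd
    # (elsewhere, "ulimit -c unlimited" in the mount's environment is the equivalent)
    mkdir -p /etc/systemd/system.conf.d
    printf '[Manager]\nDefaultLimitCORE=infinity\n' > /etc/systemd/system.conf.d/coredump.conf
    systemctl daemon-reexec

    # write cores to a known location so they can be collected for debugging
    mkdir -p /var/crash
    sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t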