João Baúto
2019-Feb-11 10:18 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Although I don't have these error messages, I'm having fuse crashes as frequent as you. I have disabled write-behind and the mount has been running over the weekend with heavy usage and no issues.

I can provide coredumps before disabling write-behind if needed. I opened a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes that I was having.

*João Baúto*
---------------
*Scientific Computing and Software Platform*
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org <https://www.fchampalimaud.org/>

Artem Russakovskii <archon810 at gmail.com> wrote on Saturday, 9/02/2019 at 22:18:

> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the next crash to see if it dumps a core for you guys to remotely debug.
>
> Then I can consider setting performance.write-behind to off and monitoring for further crashes.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>
> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> wrote:
>>
>>> Hi Nithya,
>>>
>>> I can try to disable write-behind as long as it doesn't heavily impact performance for us. Which option is it exactly? I don't see it set in my list of changed volume variables that I sent you guys earlier.
>>
>> The option is performance.write-behind
>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>
>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>
>>>> Hi Artem,
>>>>
>>>> We have found the cause of one crash. Unfortunately we have not managed to reproduce the one you reported so we don't know if it is the same cause.
>>>>
>>>> Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.
>>>>
>>>> regards,
>>>> Nithya
>>>>
>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> wrote:
>>>>
>>>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help.
>>>>>
>>>>> Here's the snippet of the crash and the subsequent remount by monit.
>>>>> >>>>> >>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>> [0x7f4402b99329] >>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>> valid argument] >>>>> The message "I [MSGID: 108031] >>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>> The message "E [MSGID: 101191] >>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>> [2019-02-08 01:13:09.311554] >>>>> pending frames: >>>>> frame : type(1) op(LOOKUP) >>>>> frame : type(0) op(0) >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> signal received: 6 >>>>> time of crash: >>>>> 2019-02-08 01:13:09 >>>>> configuration details: >>>>> argp 1 >>>>> backtrace 1 >>>>> dlfcn 1 >>>>> libpthread 1 >>>>> llistxattr 1 >>>>> setfsid 1 >>>>> spinlock 1 >>>>> epoll.h 1 >>>>> xattr.h 1 >>>>> st_atim.tv_nsec 1 >>>>> package-string: glusterfs 5.3 >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>> >>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>> >>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>> --------- >>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>> /mnt/<SNIP>_data1) >>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 1 >>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 2 >>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 3 >>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>> with index 4 >>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-0: parent translators are ready, 
attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>>> on transport >>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>> Final graph: >>>>> >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810 at gmail.com> >>>>> wrote: >>>>> >>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>>> taken effect correctly: >>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>> >>>>>> Let's see if it stops crashing or not. >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> Hi Nithya, >>>>>>> >>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>>> crashes, and no further releases have been made yet. 
>>>>>>> >>>>>>> volume info: >>>>>>> Type: Replicate >>>>>>> Volume ID: ****SNIP**** >>>>>>> Status: Started >>>>>>> Snapshot Count: 0 >>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>> Transport-type: tcp >>>>>>> Bricks: >>>>>>> Brick1: ****SNIP**** >>>>>>> Brick2: ****SNIP**** >>>>>>> Brick3: ****SNIP**** >>>>>>> Brick4: ****SNIP**** >>>>>>> Options Reconfigured: >>>>>>> cluster.quorum-count: 1 >>>>>>> cluster.quorum-type: fixed >>>>>>> network.ping-timeout: 5 >>>>>>> network.remote-dio: enable >>>>>>> performance.rda-cache-limit: 256MB >>>>>>> performance.readdir-ahead: on >>>>>>> performance.parallel-readdir: on >>>>>>> network.inode-lru-limit: 500000 >>>>>>> performance.md-cache-timeout: 600 >>>>>>> performance.cache-invalidation: on >>>>>>> performance.stat-prefetch: on >>>>>>> features.cache-invalidation-timeout: 600 >>>>>>> features.cache-invalidation: on >>>>>>> cluster.readdir-optimize: on >>>>>>> performance.io-thread-count: 32 >>>>>>> server.event-threads: 4 >>>>>>> client.event-threads: 4 >>>>>>> performance.read-ahead: off >>>>>>> cluster.lookup-optimize: on >>>>>>> performance.cache-size: 1GB >>>>>>> cluster.self-heal-daemon: enable >>>>>>> transport.address-family: inet >>>>>>> nfs.disable: on >>>>>>> performance.client-io-threads: on >>>>>>> cluster.granular-entry-heal: enable >>>>>>> cluster.data-self-heal-algorithm: full >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>> nbalacha at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Artem, >>>>>>>> >>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>> >>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>> question. >>>>>>>> >>>>>>>> >>>>>>>> regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> The fuse crash happened two more times, but this time monit helped >>>>>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>>>>> >>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>> servers, and I don't know why. >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>>> >>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>> point within a minute. Not ideal, but a temporary workaround. 
>>>>>>>>>> >>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> monit check: >>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> stack trace: >>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>> pending frames: >>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>> frame : type(0) op(0) >>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>> signal received: 6 >>>>>>>>>> time of crash: >>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>> configuration details: >>>>>>>>>> argp 1 >>>>>>>>>> backtrace 1 >>>>>>>>>> dlfcn 1 >>>>>>>>>> libpthread 1 >>>>>>>>>> llistxattr 1 >>>>>>>>>> setfsid 1 >>>>>>>>>> spinlock 1 >>>>>>>>>> epoll.h 1 >>>>>>>>>> xattr.h 1 >>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>> >>>>>>>>>> 
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>> mounts. >>>>>>>>>>> >>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for any >>>>>>>>>>> uploads or downloads from that server until I woke up and fixed the mount. >>>>>>>>>>> >>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>> trace. >>>>>>>>>>> >>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>> >>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Artem, >>>>>>>>>>>> >>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>> >>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>> >>>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade >>>>>>>>>>>> ? or was there any particular pattern you observed before the crash. 
>>>>>>>>>>>> >>>>>>>>>>>> -Amar >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had >>>>>>>>>>>>> to unmount, kill gluster, and remount: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>> pending frames: >>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>> time of crash: >>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>> configuration details: >>>>>>>>>>>>> argp 1 >>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] 
>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>> >>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>> >>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>> --------- >>>>>>>>>>>>> >>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>> >>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>>> this. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190211/aa109cf5/attachment.html>
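The write-behind toggle discussed in this thread is an ordinary volume option that can be changed online with the gluster CLI. A minimal sketch of checking and disabling it follows; the volume name SITE_data1 is taken from the log snippets above and should be replaced with your own, and turning the option off may cost some write performance:

    # check the current value of the option
    gluster volume get SITE_data1 performance.write-behind

    # disable write-behind as a stopgap until a fixed release is available
    gluster volume set SITE_data1 performance.write-behind off

    # later, restore the default (enabled) behaviour
    gluster volume reset SITE_data1 performance.write-behind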
Raghavendra Gowdappa
2019-Feb-12 03:19 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
On Mon, Feb 11, 2019 at 3:49 PM João Baúto <joao.bauto at neuro.fchampalimaud.org> wrote:

> Although I don't have these error messages, I'm having fuse crashes as frequent as you. I have disabled write-behind and the mount has been running over the weekend with heavy usage and no issues.

The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya, and I were able to identify the corruption in write-behind.

[1] https://review.gluster.org/22189

> I can provide coredumps before disabling write-behind if needed. I opened a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes that I was having.
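Since coredumps come up repeatedly in this thread, here is a hedged sketch of one way to capture a core from the fuse client and turn it into a backtrace suitable for a bug report. The crash directory and the core file name are illustrative only; the client inherits the ulimit only if it is mounted from a shell where the limit was raised, and glusterfs debuginfo packages are needed for readable symbols:

    # allow core dumps in the shell that will run the mount (run as root)
    ulimit -c unlimited
    echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern

    # after a crash, extract a full backtrace from the core file
    gdb -batch -ex 'thread apply all bt full' \
        /usr/sbin/glusterfs /var/crash/core.glusterfs.12345.1549600000 > backtrace.txt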
>>>>>> >>>>>> >>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>> [0x7f4402b99329] >>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>> valid argument] >>>>>> The message "I [MSGID: 108031] >>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>> The message "E [MSGID: 101191] >>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>> [2019-02-08 01:13:09.311554] >>>>>> pending frames: >>>>>> frame : type(1) op(LOOKUP) >>>>>> frame : type(0) op(0) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 6 >>>>>> time of crash: >>>>>> 2019-02-08 01:13:09 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.3 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>> --------- >>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>>> /mnt/<SNIP>_data1) >>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 1 >>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 2 >>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 3 >>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 4 >>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] 
[client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>>> Final graph: >>>>>> >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>>>> taken effect correctly: >>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>> >>>>>>> Let's see if it stops crashing or not. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Nithya, >>>>>>>> >>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>>>> crashes, and no further releases have been made yet. 
>>>>>>>> >>>>>>>> volume info: >>>>>>>> Type: Replicate >>>>>>>> Volume ID: ****SNIP**** >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: ****SNIP**** >>>>>>>> Brick2: ****SNIP**** >>>>>>>> Brick3: ****SNIP**** >>>>>>>> Brick4: ****SNIP**** >>>>>>>> Options Reconfigured: >>>>>>>> cluster.quorum-count: 1 >>>>>>>> cluster.quorum-type: fixed >>>>>>>> network.ping-timeout: 5 >>>>>>>> network.remote-dio: enable >>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>> performance.readdir-ahead: on >>>>>>>> performance.parallel-readdir: on >>>>>>>> network.inode-lru-limit: 500000 >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> performance.cache-invalidation: on >>>>>>>> performance.stat-prefetch: on >>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>> features.cache-invalidation: on >>>>>>>> cluster.readdir-optimize: on >>>>>>>> performance.io-thread-count: 32 >>>>>>>> server.event-threads: 4 >>>>>>>> client.event-threads: 4 >>>>>>>> performance.read-ahead: off >>>>>>>> cluster.lookup-optimize: on >>>>>>>> performance.cache-size: 1GB >>>>>>>> cluster.self-heal-daemon: enable >>>>>>>> transport.address-family: inet >>>>>>>> nfs.disable: on >>>>>>>> performance.client-io-threads: on >>>>>>>> cluster.granular-entry-heal: enable >>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount >>>>>>>>> the volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>>> >>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>> question. >>>>>>>>> >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. >>>>>>>>>> >>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>> servers, and I don't know why. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>>>> >>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>> point within a minute. 
Not ideal, but a temporary workaround. >>>>>>>>>>> >>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> monit check: >>>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> stack trace: >>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>> pending frames: >>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>> signal received: 6 >>>>>>>>>>> time of crash: >>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>> configuration details: >>>>>>>>>>> argp 1 >>>>>>>>>>> backtrace 1 >>>>>>>>>>> dlfcn 1 >>>>>>>>>>> libpthread 1 >>>>>>>>>>> llistxattr 1 >>>>>>>>>>> setfsid 1 >>>>>>>>>>> spinlock 1 >>>>>>>>>>> epoll.h 1 >>>>>>>>>>> xattr.h 1 >>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>> 
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>> mounts. >>>>>>>>>>>> >>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for any >>>>>>>>>>>> uploads or downloads from that server until I woke up and fixed the mount. >>>>>>>>>>>> >>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>> trace. >>>>>>>>>>>> >>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>> >>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>>> >>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade >>>>>>>>>>>>> ? or was there any particular pattern you observed before the crash. 
>>>>>>>>>>>>> >>>>>>>>>>>>> -Amar >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had >>>>>>>>>>>>>> to unmount, kill gluster, and remount: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>> --------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>>>> this. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190212/a2b08596/attachment-0001.html>
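A note on the write-behind workaround that recurs in this thread (and in the reply below): it is toggled through the normal volume-set interface. The snippet that follows is only a sketch; "SITE_data1" is the placeholder volume name used elsewhere in the thread, and disabling write-behind may cost some write performance, so it is best treated as a diagnostic step rather than a permanent setting.

# run on any node in the trusted storage pool; SITE_data1 is a placeholder
gluster volume set SITE_data1 performance.write-behind off
gluster volume get SITE_data1 performance.write-behind    # should now report "off"
# once testing is done, return the option to its default:
gluster volume reset SITE_data1 performance.write-behind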
Raghavendra Gowdappa
2019-Feb-12 03:32 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
On Mon, Feb 11, 2019 at 3:49 PM João Baúto < joao.bauto at neuro.fchampalimaud.org> wrote:> Although I don't have these error messages, I'm having fuse crashes as > frequent as you. I have disabled write-behind and the mount has been > running over the weekend with heavy usage and no issues. > > I can provide coredumps before disabling write-behind if needed. I opened > a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with > the crashes that I was having. >I've created a bug <https://bugzilla.redhat.com/show_bug.cgi?id=1676356> and marked it as a blocker for release-6. I've marked bz 1671014 as a duplicate of this bug report on master. If you disagree about the bug you filed being a duplicate, please reopen.> *João Baúto* > --------------- > > *Scientific Computing and Software Platform* > Champalimaud Research > Champalimaud Center for the Unknown > Av. Brasília, Doca de Pedrouços > 1400-038 Lisbon, Portugal > fchampalimaud.org <https://www.fchampalimaud.org/> > > > Artem Russakovskii <archon810 at gmail.com> wrote on Saturday, > 9/02/2019 at 22:18: > >> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for >> the next crash to see if it dumps a core for you guys to remotely debug. >> >> Then I can consider setting performance.write-behind to off and >> monitoring for further crashes. >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp at redhat.com> >> wrote: >> >>> >>> >>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810 at gmail.com> >>> wrote: >>> >>>> Hi Nithya, >>>> >>>> I can try to disable write-behind as long as it doesn't heavily impact >>>> performance for us. Which option is it exactly? I don't see it set in my >>>> list of changed volume variables that I sent you guys earlier. >>>> >>> >>> The option is performance.write-behind >>> >>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha at redhat.com> >>>> wrote: >>>> >>>>> Hi Artem, >>>>> >>>>> We have found the cause of one crash. Unfortunately we have not >>>>> managed to reproduce the one you reported so we don't know if it is the >>>>> same cause. >>>>> >>>>> Can you disable write-behind on the volume and let us know if it >>>>> solves the problem? If yes, it is likely to be the same issue. >>>>> >>>>> >>>>> regards, >>>>> Nithya >>>>> >>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810 at gmail.com> >>>>> wrote: >>>>> >>>>>> Sorry to disappoint, but the crash just happened again, so >>>>>> lru-limit=0 didn't help. >>>>>> >>>>>> Here's the snippet of the crash and the subsequent remount by monit. 
>>>>>> >>>>>> >>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>> [0x7f4402b99329] >>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>> valid argument] >>>>>> The message "I [MSGID: 108031] >>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>> The message "E [MSGID: 101191] >>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>> [2019-02-08 01:13:09.311554] >>>>>> pending frames: >>>>>> frame : type(1) op(LOOKUP) >>>>>> frame : type(0) op(0) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 6 >>>>>> time of crash: >>>>>> 2019-02-08 01:13:09 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.3 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>> --------- >>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>>> /mnt/<SNIP>_data1) >>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 1 >>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 2 >>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 3 >>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>> with index 4 >>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] 
[client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >>>>>> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >>>>>> on transport >>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>>> Final graph: >>>>>> >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's >>>>>>> taken effect correctly: >>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>> >>>>>>> Let's see if it stops crashing or not. >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>> archon810 at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Nithya, >>>>>>>> >>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>>>>>> crashes, and no further releases have been made yet. 
>>>>>>>> >>>>>>>> volume info: >>>>>>>> Type: Replicate >>>>>>>> Volume ID: ****SNIP**** >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: ****SNIP**** >>>>>>>> Brick2: ****SNIP**** >>>>>>>> Brick3: ****SNIP**** >>>>>>>> Brick4: ****SNIP**** >>>>>>>> Options Reconfigured: >>>>>>>> cluster.quorum-count: 1 >>>>>>>> cluster.quorum-type: fixed >>>>>>>> network.ping-timeout: 5 >>>>>>>> network.remote-dio: enable >>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>> performance.readdir-ahead: on >>>>>>>> performance.parallel-readdir: on >>>>>>>> network.inode-lru-limit: 500000 >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> performance.cache-invalidation: on >>>>>>>> performance.stat-prefetch: on >>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>> features.cache-invalidation: on >>>>>>>> cluster.readdir-optimize: on >>>>>>>> performance.io-thread-count: 32 >>>>>>>> server.event-threads: 4 >>>>>>>> client.event-threads: 4 >>>>>>>> performance.read-ahead: off >>>>>>>> cluster.lookup-optimize: on >>>>>>>> performance.cache-size: 1GB >>>>>>>> cluster.self-heal-daemon: enable >>>>>>>> transport.address-family: inet >>>>>>>> nfs.disable: on >>>>>>>> performance.client-io-threads: on >>>>>>>> cluster.granular-entry-heal: enable >>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount >>>>>>>>> the volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>>> >>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>> question. >>>>>>>>> >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. >>>>>>>>>> >>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>> servers, and I don't know why. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>>>>>> there any mount options that could help mitigate this? >>>>>>>>>>> >>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>> point within a minute. 
Not ideal, but a temporary workaround. >>>>>>>>>>> >>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> monit check: >>>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> stack trace: >>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>> pending frames: >>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>> signal received: 6 >>>>>>>>>>> time of crash: >>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>> configuration details: >>>>>>>>>>> argp 1 >>>>>>>>>>> backtrace 1 >>>>>>>>>>> dlfcn 1 >>>>>>>>>>> libpthread 1 >>>>>>>>>>> llistxattr 1 >>>>>>>>>>> setfsid 1 >>>>>>>>>>> spinlock 1 >>>>>>>>>>> epoll.h 1 >>>>>>>>>>> xattr.h 1 >>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>> 
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>> >>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>> mounts. >>>>>>>>>>>> >>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for any >>>>>>>>>>>> uploads or downloads from that server until I woke up and fixed the mount. >>>>>>>>>>>> >>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>> trace. >>>>>>>>>>>> >>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>> >>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>>> >>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you please tell if the crashes happened as soon as upgrade >>>>>>>>>>>>> ? or was there any particular pattern you observed before the crash. 
>>>>>>>>>>>>> >>>>>>>>>>>>> -Amar >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had >>>>>>>>>>>>>> to unmount, kill gluster, and remount: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>> --------- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have been >>>>>>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>>>>>> this. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may >>>>>>>>>>>>>>>> bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>> seeing the spam. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Amar Tumballi (amarts) >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190212/fe0e3322/attachment.html>
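For completeness, the client-side workarounds mentioned in this thread can be exercised from a shell. The following is only a sketch: the mount point /mnt/glusterfs_data1 and volume SITE_data1 are the examples used above, the pgrep pattern assumes a single fuse client per host, and lru-limit is assumed to be accepted as a mount option by the mount.glusterfs helper shipped with 5.x (the logs above show it being passed through to the client as --lru-limit=0).

# Example /etc/fstab entry that mounts the volume with the inode lru limit disabled:
# localhost:/SITE_data1  /mnt/glusterfs_data1  glusterfs  defaults,_netdev,lru-limit=0  0 0

# Reproduce the "Transport endpoint is not connected" state that monit recovers from:
pgrep -af 'glusterfs.*--process-name fuse'                 # list fuse client processes and PIDs
kill -9 "$(pgrep -of 'glusterfs.*--process-name fuse')"    # assumes a single client; verify the PID first
ls /mnt/glusterfs_data1                                    # now fails with "Transport endpoint is not connected"
mount /mnt/glusterfs_data1                                 # what the monit start program runs to recover the mount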