Artem Russakovskii
2019-Feb-12 06:14 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Awesome. But is there a release schedule and an ETA for when these will be out in the repos? On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:> > > On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii <archon810 at gmail.com> > wrote: > >> Great job identifying the issue! >> >> Any ETA on the next release with the logging and crash fixes in it? >> > > I've marked write-behind corruption as a blocker for release-6. Logging > fixes are already in the codebase. > > >> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa <rgowdapp at redhat.com> >> wrote: >> >>> >>> >>> On Mon, Feb 11, 2019 at 3:49 PM João Baúto < >>> joao.bauto at neuro.fchampalimaud.org> wrote: >>> >>>> Although I don't have these error messages, I'm having fuse crashes as >>>> frequently as you. I have disabled write-behind and the mount has been >>>> running over the weekend with heavy usage and no issues. >>>> >>> >>> The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya, >>> and I were able to identify the corruption in write-behind. >>> >>> [1] https://review.gluster.org/22189 >>> >>> >>>> I can provide coredumps before disabling write-behind if needed. I >>>> opened a BZ report >>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes >>>> that I was having. >>>> >>>> *João Baúto* >>>> --------------- >>>> >>>> *Scientific Computing and Software Platform* >>>> Champalimaud Research >>>> Champalimaud Center for the Unknown >>>> Av. Brasília, Doca de Pedrouços >>>> 1400-038 Lisbon, Portugal >>>> fchampalimaud.org <https://www.fchampalimaud.org/> >>>> >>>> >>>> Artem Russakovskii <archon810 at gmail.com> wrote on Saturday, >>>> 9/02/2019 at 22:18: >>>> >>>>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for >>>>> the next crash to see if it dumps a core for you guys to remotely debug. >>>>> >>>>> Then I can consider setting performance.write-behind to off and >>>>> monitoring for further crashes. >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa < >>>>> rgowdapp at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii < >>>>>> archon810 at gmail.com> wrote: >>>>>> >>>>>>> Hi Nithya, >>>>>>> >>>>>>> I can try to disable write-behind as long as it doesn't heavily >>>>>>> impact performance for us. Which option is it exactly? I don't see it set >>>>>>> in my list of changed volume variables that I sent you guys earlier. >>>>>>> >>>>>> >>>>>> The option is performance.write-behind >>>>>> >>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran < >>>>>>> nbalacha at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Artem, >>>>>>>> >>>>>>>> We have found the cause of one crash. Unfortunately we have not >>>>>>>> managed to reproduce the one you reported so we don't know if it is the >>>>>>>> same cause.
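For anyone following along who wants to try the same mitigation João describes, a minimal sketch of toggling the option with the stock gluster CLI (<VOLNAME> is a placeholder for your volume name):

  gluster volume set <VOLNAME> performance.write-behind off
  gluster volume get <VOLNAME> performance.write-behind    # confirm it now reports "off"

Clients should pick up the resulting graph change without a remount, though verifying on a single mount first costs nothing.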
>>>>>>>> >>>>>>>> Can you disable write-behind on the volume and let us know if it >>>>>>>> solves the problem? If yes, it is likely to be the same issue. >>>>>>>> >>>>>>>> >>>>>>>> regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii < >>>>>>>> archon810 at gmail.com> wrote: >>>>>>>> >>>>>>>>> Sorry to disappoint, but the crash just happened again, so >>>>>>>>> lru-limit=0 didn't help. >>>>>>>>> >>>>>>>>> Here's the snippet of the crash and the subsequent remount by >>>>>>>>> monit. >>>>>>>>> >>>>>>>>> >>>>>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7f4402b99329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>>>>> valid argument] >>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>>>>> [2019-02-08 01:13:09.311554] >>>>>>>>> pending frames: >>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>> frame : type(0) op(0) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 6 >>>>>>>>> time of crash: >>>>>>>>> 2019-02-08 01:13:09 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>>>>> --------- >>>>>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>>>>> --process-name fuse --volfile-server=localhost 
--volfile-id=/<SNIP>_data1 >>>>>>>>> /mnt/<SNIP>_data1) >>>>>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 1 >>>>>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 2 >>>>>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 3 >>>>>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>> with index 4 >>>>>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] >>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are >>>>>>>>> ready, attempting connect on transport >>>>>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >>>>>>>>> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >>>>>>>>> Final graph: >>>>>>>>> >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see >>>>>>>>>> it's taken effect correctly: >>>>>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>>>>> >>>>>>>>>> Let's see if it stops crashing or not. >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Nithya, >>>>>>>>>>> >>>>>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started >>>>>>>>>>> seeing crashes, and no further releases have been made yet. 
>>>>>>>>>>> >>>>>>>>>>> volume info: >>>>>>>>>>> Type: Replicate >>>>>>>>>>> Volume ID: ****SNIP**** >>>>>>>>>>> Status: Started >>>>>>>>>>> Snapshot Count: 0 >>>>>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>>>>> Transport-type: tcp >>>>>>>>>>> Bricks: >>>>>>>>>>> Brick1: ****SNIP**** >>>>>>>>>>> Brick2: ****SNIP**** >>>>>>>>>>> Brick3: ****SNIP**** >>>>>>>>>>> Brick4: ****SNIP**** >>>>>>>>>>> Options Reconfigured: >>>>>>>>>>> cluster.quorum-count: 1 >>>>>>>>>>> cluster.quorum-type: fixed >>>>>>>>>>> network.ping-timeout: 5 >>>>>>>>>>> network.remote-dio: enable >>>>>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>> performance.parallel-readdir: on >>>>>>>>>>> network.inode-lru-limit: 500000 >>>>>>>>>>> performance.md-cache-timeout: 600 >>>>>>>>>>> performance.cache-invalidation: on >>>>>>>>>>> performance.stat-prefetch: on >>>>>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>>>>> features.cache-invalidation: on >>>>>>>>>>> cluster.readdir-optimize: on >>>>>>>>>>> performance.io-thread-count: 32 >>>>>>>>>>> server.event-threads: 4 >>>>>>>>>>> client.event-threads: 4 >>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>> cluster.lookup-optimize: on >>>>>>>>>>> performance.cache-size: 1GB >>>>>>>>>>> cluster.self-heal-daemon: enable >>>>>>>>>>> transport.address-family: inet >>>>>>>>>>> nfs.disable: on >>>>>>>>>>> performance.client-io-threads: on >>>>>>>>>>> cluster.granular-entry-heal: enable >>>>>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Artem, >>>>>>>>>>>> >>>>>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mount >>>>>>>>>>>> the volume using the mount option lru-limit=0 and see if that helps. We are >>>>>>>>>>>> looking into the crashes and will update when have a fix. >>>>>>>>>>>> >>>>>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>>>>> question. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> regards, >>>>>>>>>>>> Nithya >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. >>>>>>>>>>>>> >>>>>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>>>>> servers, and I don't know why. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The fuse crash happened again yesterday, to another volume. 
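For reference, a sketch of passing the lru-limit option Nithya mentions at mount time; mount.glusterfs hands it to the client process as --lru-limit, as the process arguments quoted later in the thread show (<VOLNAME> is a placeholder):

  mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>

  # or the equivalent /etc/fstab entry
  localhost:/<VOLNAME>  /mnt/<VOLNAME>  glusterfs  defaults,_netdev,lru-limit=0  0 0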
>>>>>>>>>>>>>> Are there any mount options that could help mitigate this? >>>>>>>>>>>>>> >>>>>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>>>>> point within a minute. Not ideal, but a temporary workaround. >>>>>>>>>>>>>> >>>>>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is >>>>>>>>>>>>>> not connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> monit check: >>>>>>>>>>>>>> check filesystem glusterfs_data1 with path >>>>>>>>>>>>>> /mnt/glusterfs_data1 >>>>>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> stack trace: >>>>>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>>>>> >>>>>>>>>>>>>> 
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>>>>> >>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The first (and so far only) crash happened at 2am the next >>>>>>>>>>>>>>> day after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>>>>> mounts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty >>>>>>>>>>>>>>> busy site (apkmirror.com), and it caused a disruption for >>>>>>>>>>>>>>> any uploads or downloads from that server until I woke up and fixed the >>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>>>>> trace. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can you please tell if the crashes happened as soon as >>>>>>>>>>>>>>>> upgrade ? or was there any particular pattern you observed before the crash. 
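A sketch of the kill -9 reproduction Artem describes above, handy for testing a monit-style watchdog before a real crash hits (the pgrep pattern is an assumption; check it against your own process list first):

  pgrep -af 'glusterfs.*--process-name fuse'   # list fuse client PIDs and their mount points
  kill -9 <PID>                                # the matching mount then fails with "Transport
                                               # endpoint is not connected" until remounted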
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Amar >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, >>>>>>>>>>>>>>>>> I already got a crash which others have mentioned in >>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and >>>>>>>>>>>>>>>>> had to unmount, kill gluster, and remount: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>>>>> st_atim.tv_nsec 1 
>>>>>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already >>>>>>>>>>>>>>>>> opened a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>>>>>>>>> to create a massive problem for us since production systems are crashing. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of >>>>>>>>>>>>>>>>>>> these "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>>>>>>>>>> been commenting about this issue here >>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ >>>>>>>>>>>>>>>>>> addresses this. 
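On Artem's note about not being sure how to make it core dump: a generic Linux sketch, not specific to the openSUSE packages (the core_pattern path is an arbitrary choice, and systemd-based hosts may collect cores via coredumpctl instead):

  ulimit -c unlimited                                    # in the shell that performs the mount
  sysctl -w kernel.core_pattern='/var/tmp/core.%e.%p'    # write cores somewhere with free space
  # remount the volume so the fuse client inherits the new limit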
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list >>>>>>>>>>>>>>>>>>> may bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks. 
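To gauge how badly the two repeated messages are flooding a client, a per-file count is enough (the path is an assumption based on the default client log directory and the mnt-SITE_data1.log names above):

  grep -cE 'dict is NULL|Failed to dispatch handler' /var/log/glusterfs/mnt-*.log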
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>>>>> seeing the spam. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check why >>>>>>>>>>>>>>>>>> this message is logged and send a fix? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com> >>>>>>>>>>>>>>>>>>>> , APK Mirror <http://www.apkmirror.com/>, Illogical >>>>>>>>>>>>>>>>>>>> Robot LLC >>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Amar Tumballi (amarts)
Nithya Balachandran
2019-Feb-12 08:37 UTC
[Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
Not yet, but we are discussing an interim release. It is going to take a couple of days to review the fixes, so not before then. We will post dates on the list once we decide.

On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii <archon810 at gmail.com> wrote:

> Awesome. But is there a release schedule and an ETA for when these will be
> out in the repos?
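Once the interim release does land, a quick way to check whether the updated package has reached the repo on the openSUSE side (assuming the same home:glusterfs repository mentioned earlier in the thread):

  zypper refresh && zypper info glusterfs    # compare Version against the announced release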
>>>>>>>> >>>>>>> >>>>>>> The option is performance.write-behind >>>>>>> >>>>>>> >>>>>>>> Sincerely, >>>>>>>> Artem >>>>>>>> >>>>>>>> -- >>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>> <http://twitter.com/ArtemR> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> We have found the cause of one crash. Unfortunately we have not >>>>>>>>> managed to reproduce the one you reported so we don't know if it is the >>>>>>>>> same cause. >>>>>>>>> >>>>>>>>> Can you disable write-behind on the volume and let us know if it >>>>>>>>> solves the problem? If yes, it is likely to be the same issue. >>>>>>>>> >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii < >>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Sorry to disappoint, but the crash just happened again, so >>>>>>>>>> lru-limit=0 didn't help. >>>>>>>>>> >>>>>>>>>> Here's the snippet of the crash and the subsequent remount by >>>>>>>>>> monit. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7f4402b99329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >>>>>>>>>> valid argument] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >>>>>>>>>> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >>>>>>>>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >>>>>>>>>> [2019-02-08 01:13:09.311554] >>>>>>>>>> pending frames: >>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>> frame : type(0) op(0) >>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>> signal received: 6 >>>>>>>>>> time of crash: >>>>>>>>>> 2019-02-08 01:13:09 >>>>>>>>>> configuration details: >>>>>>>>>> argp 1 >>>>>>>>>> backtrace 1 >>>>>>>>>> dlfcn 1 >>>>>>>>>> libpthread 1 >>>>>>>>>> llistxattr 1 >>>>>>>>>> setfsid 1 >>>>>>>>>> spinlock 1 >>>>>>>>>> epoll.h 1 >>>>>>>>>> xattr.h 1 >>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >>>>>>>>>> >>>>>>>>>> 
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >>>>>>>>>> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >>>>>>>>>> --------- >>>>>>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] >>>>>>>>>> [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running >>>>>>>>>> /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 >>>>>>>>>> --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 >>>>>>>>>> /mnt/<SNIP>_data1) >>>>>>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 1 >>>>>>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 2 >>>>>>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 3 >>>>>>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >>>>>>>>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >>>>>>>>>> with index 4 >>>>>>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] >>>>>>>>>> [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are >>>>>>>>>> ready, attempting connect on transport >>>>>>>>>> [2019-02-08 01:13:35.655527] I >>>>>>>>>> [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port >>>>>>>>>> to 49153 (from 0) >>>>>>>>>> Final graph: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii < >>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see >>>>>>>>>>> it's taken effect correctly: >>>>>>>>>>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>>>>>>>>>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>>>>>>>>>> >>>>>>>>>>> Let's see if it stops crashing or not. 
>>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii < >>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Nithya, >>>>>>>>>>>> >>>>>>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started >>>>>>>>>>>> seeing crashes, and no further releases have been made yet. >>>>>>>>>>>> >>>>>>>>>>>> volume info: >>>>>>>>>>>> Type: Replicate >>>>>>>>>>>> Volume ID: ****SNIP**** >>>>>>>>>>>> Status: Started >>>>>>>>>>>> Snapshot Count: 0 >>>>>>>>>>>> Number of Bricks: 1 x 4 = 4 >>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>> Bricks: >>>>>>>>>>>> Brick1: ****SNIP**** >>>>>>>>>>>> Brick2: ****SNIP**** >>>>>>>>>>>> Brick3: ****SNIP**** >>>>>>>>>>>> Brick4: ****SNIP**** >>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>> cluster.quorum-count: 1 >>>>>>>>>>>> cluster.quorum-type: fixed >>>>>>>>>>>> network.ping-timeout: 5 >>>>>>>>>>>> network.remote-dio: enable >>>>>>>>>>>> performance.rda-cache-limit: 256MB >>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>> performance.parallel-readdir: on >>>>>>>>>>>> network.inode-lru-limit: 500000 >>>>>>>>>>>> performance.md-cache-timeout: 600 >>>>>>>>>>>> performance.cache-invalidation: on >>>>>>>>>>>> performance.stat-prefetch: on >>>>>>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>>>>>> features.cache-invalidation: on >>>>>>>>>>>> cluster.readdir-optimize: on >>>>>>>>>>>> performance.io-thread-count: 32 >>>>>>>>>>>> server.event-threads: 4 >>>>>>>>>>>> client.event-threads: 4 >>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>> cluster.lookup-optimize: on >>>>>>>>>>>> performance.cache-size: 1GB >>>>>>>>>>>> cluster.self-heal-daemon: enable >>>>>>>>>>>> transport.address-family: inet >>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>> performance.client-io-threads: on >>>>>>>>>>>> cluster.granular-entry-heal: enable >>>>>>>>>>>> cluster.data-self-heal-algorithm: full >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>> >>>>>>>>>>>>> Do you still see the crashes with 5.3? If yes, please try >>>>>>>>>>>>> mount the volume using the mount option lru-limit=0 and see if that helps. >>>>>>>>>>>>> We are looking into the crashes and will update when have a fix. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, please provide the gluster volume info for the volume in >>>>>>>>>>>>> question. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> regards, >>>>>>>>>>>>> Nithya >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii < >>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The fuse crash happened two more times, but this time monit >>>>>>>>>>>>>> helped recover within 1 minute, so it's a great workaround for now. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>>>>>>>>>> servers, and I don't know why. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>> Artem >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The fuse crash happened again yesterday, to another volume. >>>>>>>>>>>>>>> Are there any mount options that could help mitigate this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) >>>>>>>>>>>>>>> task to watch and restart the mount, which works and recovers the mount >>>>>>>>>>>>>>> point within a minute. Not ideal, but a temporary workaround. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is >>>>>>>>>>>>>>> not connected" condition for testing purposes is to kill -9 the right >>>>>>>>>>>>>>> "glusterfs --process-name fuse" process. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> monit check: >>>>>>>>>>>>>>> check filesystem glusterfs_data1 with path >>>>>>>>>>>>>>> /mnt/glusterfs_data1 >>>>>>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>>>>>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> stack trace: >>>>>>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>> [0x7fa0249e4329] >>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>>>>>>>>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>>>>>>>>>> [2019-02-01 23:21:56.164427] >>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>>>>>>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>>> frame : type(1) op(LOOKUP) >>>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>>> 2019-02-01 23:22:03 >>>>>>>>>>>>>>> 
configuration details: >>>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The first (and so far only) crash happened at 2am the next >>>>>>>>>>>>>>>> day after we upgraded, on only one of four servers and only to one of two >>>>>>>>>>>>>>>> mounts. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a >>>>>>>>>>>>>>>> pretty busy site (apkmirror.com), and it caused a >>>>>>>>>>>>>>>> disruption for any uploads or downloads from that server until I woke up >>>>>>>>>>>>>>>> and fixed the mount. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I wish I could be more helpful but all I have is that stack >>>>>>>>>>>>>>>> trace. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved >>>>>>>>>>>>>>>> soon. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>>>>>>>>>> atumball at redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Artem, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 >>>>>>>>>>>>>>>>> (ie, as a clone of other bugs where recent discussions happened), and >>>>>>>>>>>>>>>>> marked it as a blocker for glusterfs-5.4 release. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We already have fixes for log flooding - >>>>>>>>>>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>>>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can you please tell if the crashes happened as soon as >>>>>>>>>>>>>>>>> upgrade ? or was there any particular pattern you observed before the crash. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Amar >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to >>>>>>>>>>>>>>>>>> 5.3, I already got a crash which others have mentioned in >>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and >>>>>>>>>>>>>>>>>> had to unmount, kill gluster, and remount: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times between >>>>>>>>>>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>>>>>>>>>> pending frames: >>>>>>>>>>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>>>>>>>>>> frame : type(0) op(0) >>>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git 
>>>>>>>>>>>>>>>>>> signal received: 6 >>>>>>>>>>>>>>>>>> time of crash: >>>>>>>>>>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>>>>>>>>>> configuration details: >>>>>>>>>>>>>>>>>> argp 1 >>>>>>>>>>>>>>>>>> backtrace 1 >>>>>>>>>>>>>>>>>> dlfcn 1 >>>>>>>>>>>>>>>>>> libpthread 1 >>>>>>>>>>>>>>>>>> llistxattr 1 >>>>>>>>>>>>>>>>>> setfsid 1 >>>>>>>>>>>>>>>>>> spinlock 1 >>>>>>>>>>>>>>>>>> epoll.h 1 >>>>>>>>>>>>>>>>>> xattr.h 1 >>>>>>>>>>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>>>>>>>>>> --------- >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone >>>>>>>>>>>>>>>>>> already opened a ticket for the crashes that I can join and monitor? This >>>>>>>>>>>>>>>>>> is going to create a massive problem for us since production systems are >>>>>>>>>>>>>>>>>> crashing. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks. 
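On the core-dump question above: a minimal sketch of one way to let the fuse client dump core on a systemd-era distro such as openSUSE, assuming nothing like systemd-coredump is already intercepting cores, and using a placeholder mount point and volume name:

# Raise the core-size limit in the shell that restarts the mount;
# the glusterfs client process inherits this rlimit.
ulimit -c unlimited

# Write cores to a known directory, tagged with executable name, PID and timestamp.
mkdir -p /var/crash
sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t

# Remount so a fresh glusterfs process starts under the new limit
# (mount point and volume name here are placeholders, not the real ones).
umount /mnt/SITE_data1
mount -t glusterfs localhost:/SITE_data1 /mnt/SITE_data1

If a handler such as systemd-coredump already owns kernel.core_pattern, coredumpctl would be the place to look for the dump instead, and the core can then be inspected with gdb against the matching glusterfs debuginfo packages.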
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>>>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of >>>>>>>>>>>>>>>>>>>> these "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>>>>>>>>>>> been commenting about this issue here >>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ >>>>>>>>>>>>>>>>>>> addresses this. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <=>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 
2-SITE_data3-replicate-0: >>>>>>>>>>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <=>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>>>>>>>>>>>>>>>>>>> handler >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list >>>>>>>>>>>>>>>>>>>> may bring some additional eyeballs and get them both fixed. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>>>>>>>>> Artem >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com> >>>>>>>>>>>>>>>>>>>> , APK Mirror <http://www.apkmirror.com/>, Illogical >>>>>>>>>>>>>>>>>>>> Robot LLC >>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>>>>>>>>>> archon810 at gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. >>>>>>>>>>>>>>>>>>>>> There's a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>>>>>>>>>> seeing the spam. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> +Milind Changire <mchangir at redhat.com> Can you check >>>>>>>>>>>>>>>>>>> why this message is logged and send a fix? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks. 
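As a side note on the log flooding itself, a quick way to see how often the two messages repeat in a given client log, assuming the default client log location under /var/log/glusterfs/ and the mnt-SITE_data1.log file name shown above:

# Count occurrences of the two repeating messages in the fuse client log.
# Adjust the file name to the actual mount log on your system.
grep -c 'dict is NULL' /var/log/glusterfs/mnt-SITE_data1.log
grep -c 'Failed to dispatch handler' /var/log/glusterfs/mnt-SITE_data1.log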
>>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Amar Tumballi (amarts)
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users