Raghavendra Gowdappa
2018-Dec-29 23:46 UTC
[Gluster-users] java application crashes while reading a zip file
Thanks Dmitry. Can you provide the following debug info I asked earlier:

* strace -ff -v ... of the java application
* a dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
  mounting).

regards,
Raghavendra
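(A sketch of one way to collect both, assuming the volume gv0 from this
thread is served by a host called server1 and mounted at /mnt/gv0; the
hostname, paths, and the java command line are placeholders:)

    # remount the volume with FUSE traffic dumping enabled
    # (--dump-fuse is an option of the glusterfs client binary)
    umount /mnt/gv0
    glusterfs --volfile-server=server1 --volfile-id=gv0 \
        --dump-fuse=/tmp/gv0-fuse.dump /mnt/gv0

    # run the application under strace; -ff follows forks and -v prints
    # structures unabbreviated
    strace -ff -v -o /tmp/app.strace java -jar app.jar

(With -ff and -o together, strace writes one trace file per
process/thread, so expect several /tmp/app.strace.<pid> files.)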
On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:

> These 3 options seem to trigger both (reading zip file and renaming
> files) problems.
>
> Options Reconfigured:
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.parallel-readdir: off
> *performance.readdir-ahead: on*
> *performance.write-behind: on*
> *performance.read-ahead: on*
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>
>> Turning a single option on at a time still worked fine. I will keep
>> trying.
>>
>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
>> messages. Do you suppose these issues are triggered by the new
>> environment, or did they not exist in 4.1.5?
>>
>> [root at node1 ~]# glusterfs --version
>> glusterfs 4.1.5
>>
>> On AWS using
>> [root at node1 ~]# hostnamectl
>> Static hostname: node1
>> Icon name: computer-vm
>> Chassis: vm
>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>> Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>> Virtualization: kvm
>> Operating System: CentOS Linux 7 (Core)
>> CPE OS Name: cpe:/o:centos:centos:7
>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>> Architecture: x86-64
>>
>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>
>>>> Ok. I will try different options.
>>>>
>>>> This system is scheduled to go into production soon. What version
>>>> would you recommend to roll back to?
>>>
>>> These are long-standing issues, so rolling back may not make them go
>>> away. Instead, if the performance is agreeable to you, please keep
>>> these xlators off in production.
>>>
>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>>
>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>
>>>>>> Raghavendra,
>>>>>>
>>>>>> Thanks for the suggestion.
>>>>>>
>>>>>> I am using
>>>>>>
>>>>>> [root at jl-fanexoss1p glusterfs]# gluster --version
>>>>>> glusterfs 5.0
>>>>>>
>>>>>> On
>>>>>> [root at jl-fanexoss1p glusterfs]# hostnamectl
>>>>>> Icon name: computer-vm
>>>>>> Chassis: vm
>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>> Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>>> Virtualization: vmware
>>>>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>>> CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>>> Architecture: x86-64
>>>>>>
>>>>>> I have configured the following options
>>>>>>
>>>>>> [root at jl-fanexoss1p glusterfs]# gluster volume info
>>>>>> Volume Name: gv0
>>>>>> Type: Replicate
>>>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>>>> Options Reconfigured:
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> performance.quick-read: off
>>>>>> performance.parallel-readdir: off
>>>>>> performance.readdir-ahead: off
>>>>>> performance.write-behind: off
>>>>>> performance.read-ahead: off
>>>>>> performance.client-io-threads: off
>>>>>> nfs.disable: on
>>>>>> transport.address-family: inet
>>>>>>
>>>>>> I don't know if it is related, but I am seeing a lot of
>>>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address]
>>>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
>>>>>
>>>>> These msgs were introduced by patch [1]. To the best of my knowledge
>>>>> they are benign. We'll be sending a patch to fix these msgs though.
>>>>>
>>>>> +Mohit Agrawal <moagrawa at redhat.com> +Milind Changire
>>>>> <mchangir at redhat.com>. Can you try to identify why we are seeing
>>>>> these messages? If possible please send a patch to fix this.
>>>>>
>>>>> [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>>>
>>>>>> And java.io exceptions trying to rename files.
>>>>>
>>>>> When you see the errors, is it possible to collect
>>>>> * strace of the java application (strace -ff -v ...)
>>>>> * a fuse-dump of the glusterfs mount (use option --dump-fuse while
>>>>>   mounting)?
>>>>>
>>>>> I also need another favour from you. By trial and error, can you
>>>>> point out which of the many performance xlators you've turned off is
>>>>> causing the issue?
>>>>>
>>>>> The above two data-points will help us to fix the problem.
>>>>>
>>>>>> Thank You,
>>>>>> Dmitry
>>>>>>
>>>>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>>>>
>>>>>>> What version of glusterfs are you using? It might be either
>>>>>>> * a stale metadata issue.
>>>>>>> * an inconsistent ctime issue.
>>>>>>>
>>>>>>> Can you try turning off all performance xlators? If the issue is 1,
>>>>>>> that should help.
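(For reference, "turning off all performance xlators" corresponds to
volume-set commands along these lines -- a sketch assuming the volume name
gv0 used in this thread, with the option names taken from the "Options
Reconfigured" listings above:)

    gluster volume set gv0 performance.quick-read off
    gluster volume set gv0 performance.io-cache off
    gluster volume set gv0 performance.stat-prefetch off
    gluster volume set gv0 performance.read-ahead off
    gluster volume set gv0 performance.readdir-ahead off
    gluster volume set gv0 performance.parallel-readdir off
    gluster volume set gv0 performance.write-behind off
    gluster volume set gv0 performance.client-io-threads off

    # to bisect the culprit, re-enable one xlator at a time, e.g.:
    gluster volume set gv0 performance.write-behind on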
>>>>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>>>
>>>>>>>> Attempted to set 'performance.read-ahead off' according to
>>>>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>>>>> That did not help.
>>>>>>>>
>>>>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The core file generated by the JVM suggests that it happens
>>>>>>>>> because the file is changing while it is being read -
>>>>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>>>>>>> The application reads in the zipfile and goes through the zip
>>>>>>>>> entries, then reloads the file and goes through the zip entries
>>>>>>>>> again. It does so 3 times. The application never crashes on the
>>>>>>>>> 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
>>>>>>>>> The zip file is generated about 20 seconds prior to it being used
>>>>>>>>> and is not updated or even used by any other application. I have
>>>>>>>>> never seen this problem on a plain file system.
>>>>>>>>>
>>>>>>>>> I would appreciate any suggestions on how to go about debugging
>>>>>>>>> this issue. I can change the source code of the java application.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Dmitry
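(The reported access pattern can be roughly approximated from a shell,
though the real reproducer is a java application using java.util.zip, so
a JVM-level crash will not necessarily reproduce with unzip. A sketch,
assuming the volume is mounted at /mnt/gv0; the file names are
placeholders:)

    cd /mnt/gv0
    zip -q archive.zip data/*        # generate the archive on the mount
    sleep 20                         # ~20 seconds pass before first use
    for i in 1 2 3; do               # three read cycles, as described
        unzip -l archive.zip >/dev/null || echo "cycle $i failed"
    done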
Dmitry Isakbayev
2018-Dec-31 18:38 UTC
[Gluster-users] java application crashes while reading a zip file
The software ran with all of the options turned off over the weekend
without any problems.
I will try to collect the debug info for you. I have re-enabled the 3
options, but have yet to see the problem reoccur.

On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>
> * strace -ff -v ... of the java application
> * a dump of the I/O traffic seen by the mountpoint (use --dump-fuse
>   while mounting).
>
> regards,
> Raghavendra
>
> [...]
Dmitry Isakbayev
2019-Jan-02 16:28 UTC
[Gluster-users] java application crashes while reading a zip file
Still no JVM crashes. Is it possible that running glusterfs with the
performance options turned off for a couple of days cleared out the
"stale metadata issue"?

On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:

> The software ran with all of the options turned off over the weekend
> without any problems.
> I will try to collect the debug info for you. I have re-enabled the 3
> options, but have yet to see the problem reoccur.
>
> [...]