Dmitry Isakbayev
2018-Dec-28 15:24 UTC
[Gluster-users] java application crashes while reading a zip file
Turning a single option on at a time still worked fine. I will keep trying.

We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or log
messages. Do you suppose these issues are triggered by the new environment,
or did they not exist in 4.1.5?

[root@node1 ~]# glusterfs --version
glusterfs 4.1.5

On AWS using
[root@node1 ~]# hostnamectl
Static hostname: node1
Icon name: computer-vm
Chassis: vm
Machine ID: b30d0f2110ac3807b210c19ede3ce88f
Boot ID: 52bb159a0aa94043a40e7c7651967bd9
Virtualization: kvm
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-862.3.2.el7.x86_64
Architecture: x86-64

On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>
>> Ok. I will try different options.
>>
>> This system is scheduled to go into production soon. What version would
>> you recommend to roll back to?
>
> These are long-standing issues. So, rolling back may not make these issues
> go away. Instead, if you think performance is agreeable to you, please keep
> these xlators off in production.
>
>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>
>>>> Raghavendra,
>>>>
>>>> Thanks for the suggestion.
>>>>
>>>> I am using
>>>>
>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>> glusterfs 5.0
>>>>
>>>> On
>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>> Icon name: computer-vm
>>>> Chassis: vm
>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>> Boot ID: eed98992fdda4c88bdd459a89101766b
>>>> Virtualization: vmware
>>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>> CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>> Architecture: x86-64
>>>>
>>>> I have configured the following options:
>>>>
>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>> Volume Name: gv0
>>>> Type: Replicate
>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>> Options Reconfigured:
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.quick-read: off
>>>> performance.parallel-readdir: off
>>>> performance.readdir-ahead: off
>>>> performance.write-behind: off
>>>> performance.read-ahead: off
>>>> performance.client-io-threads: off
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>>
>>>> I don't know if it is related, but I am seeing a lot of
>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address]
>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
>>>
>>> These msgs were introduced by patch [1]. To the best of my knowledge
>>> they are benign. We'll be sending a patch to fix these msgs though.
>>>
>>> +Mohit Agrawal <moagrawa at redhat.com> +Milind Changire
>>> <mchangir at redhat.com>. Can you try to identify why we are seeing these
>>> messages? If possible, please send a patch to fix this.
>>>
>>> [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>
>>>> And java.io exceptions trying to rename files.
>>>
>>> When you see the errors, is it possible to collect:
>>> * strace of the java application (strace -ff -v ...)
>>> * fuse-dump of the glusterfs mount (use option --dump-fuse while
>>>   mounting)?
>>>
>>> I also need another favour from you. By trial and error, can you point
>>> out which of the many performance xlators you've turned off is causing the
>>> issue?
>>>
>>> The above two data points will help us to fix the problem.
>>>
>>>> Thank You,
>>>> Dmitry
>>>>
>>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>>
>>>>> What version of glusterfs are you using? It might be either
>>>>> * a stale metadata issue, or
>>>>> * an inconsistent ctime issue.
>>>>>
>>>>> Can you try turning off all performance xlators? If the issue is 1,
>>>>> that should help.
>>>>>
>>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>
>>>>>> Attempted to set `performance.read-ahead off` according to
>>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>>> That did not help.
>>>>>>
>>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>>
>>>>>>> The core file generated by the JVM suggests that it happens because the
>>>>>>> file is changing while it is being read (see
>>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557).
>>>>>>> The application reads in the zip file and goes through the zip
>>>>>>> entries, then reloads the file and goes through the zip entries again.
>>>>>>> It does so 3 times. The application never crashes on the 1st cycle but
>>>>>>> sometimes crashes on the 2nd or 3rd cycle.
>>>>>>> The zip file is generated about 20 seconds before it is used
>>>>>>> and is not updated or even used by any other application. I have never
>>>>>>> seen this problem on a plain file system.
>>>>>>>
>>>>>>> I would appreciate any suggestions on how to go about debugging this
>>>>>>> issue. I can change the source code of the java application.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dmitry
>>>>>>>
>>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181228/3640b2cd/attachment.html>
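The read loop described in the Dec 24 message (open the zip, walk every entry, reopen, three cycles) can be sketched as a standalone reproducer, which may help when testing volume options without the full application. This is a hypothetical sketch, not the application's code: it assumes the zip is read with `java.util.zip.ZipFile`, and the class and method names are made up. Before each cycle it also records the size and modification time the mount reports, so a change between cycles (which would be consistent with the stale-metadata or inconsistent-ctime explanations suggested above) is logged even if the JVM does not crash.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical reproducer (not the application's real code): mimics the
// "open zip, walk all entries, reopen" loop and reports whether the size or
// mtime seen through the mount changes between cycles, even though nothing
// should be writing to the file.
public class ZipReloadCheck {

    // Opens the zip, reads every entry to the end and returns the total
    // number of uncompressed bytes read.
    static long readAllEntries(Path zip) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (InputStream in = zf.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }

    // Runs the read loop `cycles` times and returns how many cycles after the
    // first saw a size or mtime different from the first cycle's values.
    static int runCycles(Path zip, int cycles) throws IOException {
        long firstSize = -1;
        FileTime firstMtime = null;
        int changed = 0;
        for (int i = 1; i <= cycles; i++) {
            long size = Files.size(zip);
            FileTime mtime = Files.getLastModifiedTime(zip);
            if (i == 1) {
                firstSize = size;
                firstMtime = mtime;
            } else if (size != firstSize || !mtime.equals(firstMtime)) {
                changed++;
                System.err.printf("cycle %d: size %d -> %d, mtime %s -> %s%n",
                        i, firstSize, size, firstMtime, mtime);
            }
            System.out.printf("cycle %d: read %d uncompressed bytes%n", i, readAllEntries(zip));
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ZipReloadCheck <path-to-zip-on-gluster-mount>");
            return;
        }
        int changed = runCycles(Paths.get(args[0]), 3);
        System.out.println("cycles with changed metadata: " + changed);
    }
}
```

Pass the path of a freshly generated zip on the glusterfs mount as the only argument. A non-zero count would point at metadata seen through the mount, rather than at the application itself.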
Dmitry Isakbayev
2018-Dec-28 20:37 UTC
[Gluster-users] java application crashes while reading a zip file
These 3 options seem to trigger both problems (reading the zip file and
renaming files):

Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
*performance.readdir-ahead: on*
*performance.write-behind: on*
*performance.read-ahead: on*
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:

> Turning a single option on at a time still worked fine. I will keep
> trying.
>
> We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or log
> messages. Do you suppose these issues are triggered by the new environment,
> or did they not exist in 4.1.5?
>
> [root@node1 ~]# glusterfs --version
> glusterfs 4.1.5
>
> On AWS using
> [root@node1 ~]# hostnamectl
> Static hostname: node1
> Icon name: computer-vm
> Chassis: vm
> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
> Boot ID: 52bb159a0aa94043a40e7c7651967bd9
> Virtualization: kvm
> Operating System: CentOS Linux 7 (Core)
> CPE OS Name: cpe:/o:centos:centos:7
> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
> Architecture: x86-64
>
> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>
>>> Ok. I will try different options.
>>>
>>> This system is scheduled to go into production soon. What version would
>>> you recommend to roll back to?
>>
>> These are long-standing issues. So, rolling back may not make these
>> issues go away. Instead, if you think performance is agreeable to you,
>> please keep these xlators off in production.
>>
>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>
>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>
>>>>> Raghavendra,
>>>>>
>>>>> Thanks for the suggestion.
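Turning each option on by itself worked, while turning on readdir-ahead, write-behind and read-ahead together triggered both problems, so the cases not yet tested are the pairs. As a sketch, the Java below enumerates all 2^3 = 8 on/off combinations of those three options and builds the `gluster volume set <volume> <option> <value>` commands for each. The class name is hypothetical, the volume name `gv0` comes from the volume info in this thread, and it assumes every other option stays as configured above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of glusterfs): enumerates every on/off
// combination of the three options reported above and builds the matching
// "gluster volume set" commands. All other options are assumed to stay as
// configured in the volume info above.
public class XlatorCombos {

    static final String VOLUME = "gv0";
    static final String[] OPTIONS = {
        "performance.readdir-ahead",
        "performance.write-behind",
        "performance.read-ahead",
    };

    // Returns one command list per combination; bit i of the mask turns OPTIONS[i] on.
    static List<List<String>> combinations() {
        List<List<String>> all = new ArrayList<>();
        for (int mask = 0; mask < (1 << OPTIONS.length); mask++) {
            List<String> commands = new ArrayList<>();
            for (int i = 0; i < OPTIONS.length; i++) {
                String value = ((mask >> i) & 1) == 1 ? "on" : "off";
                commands.add("gluster volume set " + VOLUME + " " + OPTIONS[i] + " " + value);
            }
            all.add(commands);
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> all = combinations();
        for (int c = 0; c < all.size(); c++) {
            System.out.println("# combination " + c);
            all.get(c).forEach(System.out::println);
        }
    }
}
```

Combinations 0 (all off) and 7 (all on) match the two states already tested. Combinations 1, 2 and 4 match the one-at-a-time runs reported as working. That leaves 3, 5 and 6 (the pairs) as the informative runs.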