thr3ads.net - Gluster users - [Gluster-users] java application crushes while reading a zip file [Dec 2018]


Raghavendra Gowdappa

2018-Dec-28 13:56 UTC

[Gluster-users] java application crushes while reading a zip file

On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim at gmail.com>
wrote:
> Ok. I will try different options.
>
> This system is scheduled to go into production soon.  What version would
> you recommend to roll back to?
>
These are long-standing issues, so rolling back may not make them go away.
Instead, if the performance is acceptable to you, please keep these xlators
off in production.

> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>>
>>
>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim at gmail.com>
>> wrote:
>>
>>> Raghavendra,
>>>
>>> Thanks for the suggestion.
>>>
>>>
>>> I am using
>>>
>>> [root at jl-fanexoss1p glusterfs]# gluster --version
>>> glusterfs 5.0
>>>
>>> On
>>> [root at jl-fanexoss1p glusterfs]# hostnamectl
>>>          Icon name: computer-vm
>>>            Chassis: vm
>>>         Machine ID: e44b8478ef7a467d98363614f4e50535
>>>            Boot ID: eed98992fdda4c88bdd459a89101766b
>>>     Virtualization: vmware
>>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>        CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>             Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>       Architecture: x86-64
>>>
>>>
>>> I have configured the following options
>>>
>>> [root at jl-fanexoss1p glusterfs]# gluster volume info
>>> Volume Name: gv0
>>> Type: Replicate
>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>> Options Reconfigured:
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.quick-read: off
>>> performance.parallel-readdir: off
>>> performance.readdir-ahead: off
>>> performance.write-behind: off
>>> performance.read-ahead: off
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> transport.address-family: inet
>>>
>>> I don't know if it is related, but I am seeing a lot of
>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
>>> operation failed [No such device or address]
>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>> handler
>>>
>>
>> These messages were introduced by patch [1]. To the best of my knowledge
>> they are benign. We'll be sending a patch to fix them, though.
>>
>> +Mohit Agrawal <moagrawa at redhat.com> +Milind Changire
>> <mchangir at redhat.com>. Can you try to identify why we are seeing these
>> messages? If possible, please send a patch to fix this.
>>
>> [1]
>> https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>
>>
>>> And java.io exceptions trying to rename files.
>>>
>>
>> When you see the errors, is it possible to collect:
>> * an strace of the java application (strace -ff -v ...)
>> * a fuse-dump of the glusterfs mount (use option --dump-fuse while
>>   mounting)?
>>
>> I also need another favour from you. By trial and error, can you point
>> out which of the many performance xlators you've turned off is causing the
>> issue?
>>
>> The above two data-points will help us to fix the problem.
>>
>>
>>> Thank You,
>>> Dmitry
>>>
>>>
>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <
>>> rgowdapp at redhat.com> wrote:
>>>
>>>> What version of glusterfs are you using? It might be either
>>>> * a stale metadata issue, or
>>>> * an inconsistent ctime issue.
>>>>
>>>> Can you try turning off all performance xlators? If the issue is the
>>>> first one (stale metadata), that should help.
>>>>
>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isakdim at gmail.com>
>>>> wrote:
>>>>
>>>>> I attempted to set 'performance.read-ahead off' as suggested in
>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>> That did not help.
>>>>>
>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isakdim at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The core file generated by the JVM suggests that it happens because the
>>>>>> file is changing while it is being read -
>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>>>> The application reads in the zip file and goes through the zip
>>>>>> entries, then reloads the file and goes through the zip entries again.
>>>>>> It does so 3 times.  The application never crashes on the 1st cycle but
>>>>>> sometimes crashes on the 2nd or 3rd cycle.
>>>>>> The zip file is generated about 20 seconds prior to it being used and
>>>>>> is not updated or even used by any other application.  I have never
>>>>>> seen this problem on a plain file system.
>>>>>>
>>>>>> I would appreciate any suggestions on how to go about debugging this
>>>>>> issue.  I can change the source code of the java application.
>>>>>>
>>>>>> Regards,
>>>>>> Dmitry
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>

Dmitry Isakbayev

2018-Dec-28 15:24 UTC


[Gluster-users] java application crushes while reading a zip file

Turning on a single option at a time still worked fine.  I will keep trying.

We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or log
messages.  Do you suppose these issues are triggered by the new environment,
or did they not exist in 4.1.5?

[root at node1 ~]# glusterfs --version
glusterfs 4.1.5

On AWS using
[root at node1 ~]# hostnamectl
   Static hostname: node1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: b30d0f2110ac3807b210c19ede3ce88f
           Boot ID: 52bb159a0aa94043a40e7c7651967bd9
    Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-862.3.2.el7.x86_64
      Architecture: x86-64
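
For anyone trying to reproduce the read pattern described in the original
report (open the zip, walk every entry, reopen it, three cycles in total), a
minimal Java sketch follows. It is not the application's code: the class name
ZipRereadCheck, its methods, and the choice of what to record per cycle (file
size and mtime from the basic attribute view, plus entry count and bytes read)
are assumptions made for illustration. It only logs what each cycle observed,
so a change between cycles becomes visible; it may well not trigger the crash
itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the application discussed in this thread.
final class ZipRereadCheck {

    /** What one read cycle observed: file metadata plus the zip contents. */
    static final class Snapshot {
        final long size;
        final long mtimeMillis;
        final int entryCount;
        final long totalBytes;

        Snapshot(long size, long mtimeMillis, int entryCount, long totalBytes) {
            this.size = size;
            this.mtimeMillis = mtimeMillis;
            this.entryCount = entryCount;
            this.totalBytes = totalBytes;
        }

        boolean sameAs(Snapshot o) {
            return size == o.size && mtimeMillis == o.mtimeMillis
                    && entryCount == o.entryCount && totalBytes == o.totalBytes;
        }

        @Override
        public String toString() {
            return "size=" + size + " mtime=" + mtimeMillis
                    + " entries=" + entryCount + " bytes=" + totalBytes;
        }
    }

    /** Opens the zip, reads every non-directory entry to the end, closes it. */
    static Snapshot readOnce(Path zip) throws IOException {
        // Metadata as seen just before this cycle opens the file.
        BasicFileAttributes attrs = Files.readAttributes(zip, BasicFileAttributes.class);
        int entries = 0;
        long bytes = 0;
        byte[] buf = new byte[8192];
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> e = zf.entries();
            while (e.hasMoreElements()) {
                ZipEntry entry = e.nextElement();
                entries++;
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = zf.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        bytes += n;
                    }
                }
            }
        }
        return new Snapshot(attrs.size(), attrs.lastModifiedTime().toMillis(), entries, bytes);
    }

    /** Repeats the open/walk/close cycle and returns one snapshot per cycle. */
    static List<Snapshot> rereadCycles(Path zip, int cycles) throws IOException {
        List<Snapshot> out = new ArrayList<>();
        for (int i = 0; i < cycles; i++) {
            out.add(readOnce(zip));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZipRereadCheck <zip-file>");
            return;
        }
        List<Snapshot> snaps = rereadCycles(Paths.get(args[0]), 3);
        for (int i = 0; i < snaps.size(); i++) {
            boolean differs = i > 0 && !snaps.get(i).sameAs(snaps.get(0));
            System.out.println("cycle " + (i + 1) + ": " + snaps.get(i)
                    + (differs ? "  <-- differs from cycle 1" : ""));
        }
    }
}
```

Running the same class against the file on the Gluster mount and against a
copy on a local file system gives a side-by-side comparison. If the size or
mtime reported on the mount differs between cycles for a file that nothing is
writing to, that would be consistent with the stale-metadata theory raised
earlier in the thread, though it would not prove it.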




