thr3ads.net - Gluster users - [Gluster-users] java application crushes while reading a zip file [Dec 2018]


Dmitry Isakbayev

2018-Dec-28 15:24 UTC

[Gluster-users] java application crushes while reading a zip file

Turning a single option on at a time still worked fine.  I will keep trying.

We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or log
messages.  Do you suppose these issues are triggered by the new environment,
or did they simply not exist in 4.1.5?

[root@node1 ~]# glusterfs --version
glusterfs 4.1.5

On AWS, using:
[root@node1 ~]# hostnamectl
   Static hostname: node1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: b30d0f2110ac3807b210c19ede3ce88f
           Boot ID: 52bb159a0aa94043a40e7c7651967bd9
    Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-862.3.2.el7.x86_64
      Architecture: x86-64




On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>
> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>
>> Ok. I will try different options.
>>
>> This system is scheduled to go into production soon.  What version would
>> you recommend rolling back to?
>>
>
> These are long-standing issues, so rolling back may not make them go away.
> Instead, if the performance is acceptable to you, please keep these xlators
> off in production.
>
>
>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>
>>>> Raghavendra,
>>>>
>>>> Thanks for the suggestion.
>>>>
>>>>
>>>> I am using
>>>>
>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>> glusterfs 5.0
>>>>
>>>> on
>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>          Icon name: computer-vm
>>>>            Chassis: vm
>>>>         Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>            Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>     Virtualization: vmware
>>>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>        CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>             Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>       Architecture: x86-64
>>>>
>>>>
>>>> I have configured the following options:
>>>>
>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>> Volume Name: gv0
>>>> Type: Replicate
>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>> Options Reconfigured:
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.quick-read: off
>>>> performance.parallel-readdir: off
>>>> performance.readdir-ahead: off
>>>> performance.write-behind: off
>>>> performance.read-ahead: off
>>>> performance.client-io-threads: off
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>>
>>>> I don't know if it is related, but I am seeing a lot of these messages:
>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address]
>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
>>>>
>>>
>>> These messages were introduced by patch [1]. To the best of my knowledge
>>> they are benign, though we will be sending a patch to fix them.
>>>
>>> +Mohit Agrawal <moagrawa at redhat.com> +Milind Changire
>>> <mchangir at redhat.com>, can you try to identify why we are seeing these
>>> messages? If possible, please send a patch to fix this.
>>>
>>> [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>
>>>
>>>> I am also seeing java.io exceptions when trying to rename files.
>>>>
>>>
>>> When you see the errors, is it possible to collect
>>> * an strace of the java application (strace -ff -v ...) and
>>> * a fuse-dump of the glusterfs mount (use the --dump-fuse option while
>>>   mounting)?
>>>
>>> I also need another favour from you. By trial and error, can you point
>>> out which of the many performance xlators you've turned off is causing the
>>> issue?
>>>
>>> The above two data points will help us fix the problem.
>>>
>>>
>>>> Thank You,
>>>> Dmitry
>>>>
>>>>
>>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>>
>>>>> What version of glusterfs are you using? It might be either
>>>>> * a stale metadata issue, or
>>>>> * an inconsistent ctime issue.
>>>>>
>>>>> Can you try turning off all performance xlators? If it is the stale
>>>>> metadata issue, that should help.
>>>>>
>>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>
>>>>>> I tried setting 'performance.read-ahead off', as suggested in
>>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>>> That did not help.
>>>>>>
>>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>>
>>>>>>> The core file generated by the JVM suggests that it happens because the
>>>>>>> file is changing while it is being read:
>>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557
>>>>>>> The application reads in the zip file and goes through the zip entries,
>>>>>>> then reloads the file and goes through the zip entries again.  It does
>>>>>>> so 3 times.  The application never crashes on the 1st cycle but
>>>>>>> sometimes crashes on the 2nd or 3rd cycle.
>>>>>>> The zip file is generated about 20 seconds before it is used and is not
>>>>>>> updated, or even used, by any other application.  I have never seen
>>>>>>> this problem on a plain file system.
>>>>>>>
>>>>>>> I would appreciate any suggestions on how to go about debugging this
>>>>>>> issue.  I can change the source code of the java application.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dmitry
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
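
Since the application source can be changed, one way to act on the request for debugging suggestions is a small standalone check that repeats the same read cycle and records what the mount reports each time. The sketch below is hypothetical: the class `ZipRereadCheck` and its methods are illustrative, not from the thread, and it assumes Java 7 or later. Each cycle records the size and mtime reported by stat, the number of bytes actually read, a SHA-256 of those bytes, and the number of entries parsed from them. If only the stat fields differ between cycles, that would fit the stale-metadata or ctime explanation above. If the digest differs, the bytes served by the mount changed. Because entries are parsed from an in-memory copy with `ZipInputStream` rather than with `java.util.zip.ZipFile`, this check may avoid the JVM crash itself. That is an assumption, and this is a diagnostic, not a fix for the GlusterFS behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical diagnostic: read a zip file several times, as the application does, and
 * report whether the size, mtime or bytes seen through the mount change between cycles.
 */
public class ZipRereadCheck {

    static final class Snapshot {
        final long size;        // size reported by stat (Files.size)
        final long mtimeMillis; // last-modified time reported by stat
        final long bytesRead;   // number of bytes actually returned by reading the file
        final String sha256;    // hex SHA-256 of the bytes actually read
        final int entries;      // zip entries parsed from those bytes

        Snapshot(long size, long mtimeMillis, long bytesRead, String sha256, int entries) {
            this.size = size;
            this.mtimeMillis = mtimeMillis;
            this.bytesRead = bytesRead;
            this.sha256 = sha256;
            this.entries = entries;
        }

        boolean sameAs(Snapshot o) {
            return size == o.size && mtimeMillis == o.mtimeMillis && bytesRead == o.bytesRead
                    && sha256.equals(o.sha256) && entries == o.entries;
        }

        @Override
        public String toString() {
            return "size=" + size + " mtime=" + mtimeMillis + " read=" + bytesRead
                    + " entries=" + entries + " sha256=" + sha256;
        }
    }

    /** One read cycle: stat the file, read it fully into memory, then walk its entries. */
    static Snapshot readOnce(Path zip) throws IOException {
        long size = Files.size(zip);
        long mtime = Files.getLastModifiedTime(zip).toMillis();
        byte[] bytes = Files.readAllBytes(zip);

        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java SE platform must provide SHA-256
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(bytes)) {
            hex.append(String.format("%02x", b));
        }

        int entries = 0;
        byte[] buf = new byte[8192];
        // Parse the in-memory copy; reading each entry to the end makes ZipInputStream
        // verify its CRC and throw ZipException (an IOException) on a mismatch.
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            while (in.getNextEntry() != null) {
                while (in.read(buf) != -1) {
                    // discard entry data; only the integrity check matters here
                }
                entries++;
            }
        }
        return new Snapshot(size, mtime, bytes.length, hex.toString(), entries);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ZipRereadCheck <zip-file> [cycles]");
            System.exit(2);
        }
        Path zip = Paths.get(args[0]);
        int cycles = args.length > 1 ? Integer.parseInt(args[1]) : 3;
        Snapshot first = readOnce(zip);
        System.out.println("cycle 1: " + first);
        for (int i = 2; i <= cycles; i++) {
            Snapshot s = readOnce(zip);
            System.out.println("cycle " + i + ": " + s
                    + (s.sameAs(first) ? "" : "  <-- differs from cycle 1"));
        }
    }
}
```

For example, `java ZipRereadCheck /path/to/file.zip 3` (the path is a placeholder for a zip on the GlusterFS mount) prints one line per cycle and marks any cycle whose snapshot differs from the first.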

Dmitry Isakbayev

2018-Dec-28 20:37 UTC


[Gluster-users] java application crushes while reading a zip file

These three options together seem to trigger both problems (reading the zip
file and renaming files):

Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
*performance.readdir-ahead: on*
*performance.write-behind: on*
*performance.read-ahead: on*
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
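
Each of these options worked fine when it was the only one turned on, but the three together trigger both problems. A possible next trial-and-error step is therefore to turn them on in pairs. The sketch below is a hypothetical helper (the class `XlatorPairs` is not from the thread) that only prints commands. It uses the standard `gluster volume set <VOLNAME> <KEY> <VALUE>` form, assumes the volume is `gv0` as in the thread, and leaves every other option in the listing unchanged.

```java
import java.util.ArrayList;
import java.util.List;

public class XlatorPairs {
    // The three options that, per the thread, trigger the problem when all are on.
    static final String[] OPTIONS = {
        "performance.readdir-ahead",
        "performance.write-behind",
        "performance.read-ahead"
    };

    /** One list of commands per pair: the two options of the pair on, the rest off. */
    static List<List<String>> pairConfigs(String volume, String[] options) {
        List<List<String>> configs = new ArrayList<>();
        for (int i = 0; i < options.length; i++) {
            for (int j = i + 1; j < options.length; j++) {
                List<String> cmds = new ArrayList<>();
                for (int k = 0; k < options.length; k++) {
                    String value = (k == i || k == j) ? "on" : "off";
                    cmds.add("gluster volume set " + volume + " " + options[k] + " " + value);
                }
                configs.add(cmds);
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        String volume = args.length > 0 ? args[0] : "gv0";
        List<List<String>> configs = pairConfigs(volume, OPTIONS);
        for (int n = 0; n < configs.size(); n++) {
            System.out.println("# test " + (n + 1));
            configs.get(n).forEach(System.out::println);
        }
    }
}
```

Apply one block of commands, rerun the application, and repeat for the next block. If exactly one pair reproduces the crash, that narrows down which interaction between xlators to report.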


