Raghavendra Gowdappa
2018-Dec-29 23:46 UTC
[Gluster-users] java application crashes while reading a zip file
Thanks Dmitry. Can you provide the following debug info I asked earlier:

* strace -ff -v ... of the java application
* a dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
  mounting).

regards,
Raghavendra
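(A sketch of one way to collect both, assuming the volume gv0 from this
thread is served by a host called server1 and mounted at /mnt/gv0; the
hostname, paths, and the java command line are placeholders:)

    # remount the volume with FUSE traffic dumping enabled
    # (--dump-fuse is an option of the glusterfs client binary)
    umount /mnt/gv0
    glusterfs --volfile-server=server1 --volfile-id=gv0 \
        --dump-fuse=/tmp/gv0-fuse.dump /mnt/gv0

    # run the application under strace; -ff follows forks and -v prints
    # structures unabbreviated
    strace -ff -v -o /tmp/app.strace java -jar app.jar

(With -ff and -o together, strace writes one trace file per
process/thread, so expect several /tmp/app.strace.<pid> files.)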
On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:

> These 3 options seem to trigger both (reading zip file and renaming
> files) problems.
>
> Options Reconfigured:
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.parallel-readdir: off
> *performance.readdir-ahead: on*
> *performance.write-behind: on*
> *performance.read-ahead: on*
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>
>> Turning a single option on at a time still worked fine. I will keep
>> trying.
>>
>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
>> messages. Do you suppose these issues are triggered by the new
>> environment, or did they not exist in 4.1.5?
>>
>> [root at node1 ~]# glusterfs --version
>> glusterfs 4.1.5
>>
>> On AWS using
>> [root at node1 ~]# hostnamectl
>> Static hostname: node1
>> Icon name: computer-vm
>> Chassis: vm
>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>> Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>> Virtualization: kvm
>> Operating System: CentOS Linux 7 (Core)
>> CPE OS Name: cpe:/o:centos:centos:7
>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>> Architecture: x86-64
>>
>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>
>>>> Ok. I will try different options.
>>>>
>>>> This system is scheduled to go into production soon. What version
>>>> would you recommend to roll back to?
>>>
>>> These are long-standing issues, so rolling back may not make them go
>>> away. Instead, if the performance is agreeable to you, please keep
>>> these xlators off in production.
>>>
>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>>
>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>
>>>>>> Raghavendra,
>>>>>>
>>>>>> Thanks for the suggestion.
>>>>>>
>>>>>> I am using
>>>>>>
>>>>>> [root at jl-fanexoss1p glusterfs]# gluster --version
>>>>>> glusterfs 5.0
>>>>>>
>>>>>> On
>>>>>> [root at jl-fanexoss1p glusterfs]# hostnamectl
>>>>>> Icon name: computer-vm
>>>>>> Chassis: vm
>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>> Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>>> Virtualization: vmware
>>>>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>>> CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>>> Architecture: x86-64
>>>>>>
>>>>>> I have configured the following options
>>>>>>
>>>>>> [root at jl-fanexoss1p glusterfs]# gluster volume info
>>>>>> Volume Name: gv0
>>>>>> Type: Replicate
>>>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>>>> Options Reconfigured:
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> performance.quick-read: off
>>>>>> performance.parallel-readdir: off
>>>>>> performance.readdir-ahead: off
>>>>>> performance.write-behind: off
>>>>>> performance.read-ahead: off
>>>>>> performance.client-io-threads: off
>>>>>> nfs.disable: on
>>>>>> transport.address-family: inet
>>>>>>
>>>>>> I don't know if it is related, but I am seeing a lot of
>>>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address]
>>>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
>>>>>
>>>>> These msgs were introduced by patch [1]. To the best of my knowledge
>>>>> they are benign. We'll be sending a patch to fix these msgs though.
>>>>>
>>>>> +Mohit Agrawal <moagrawa at redhat.com> +Milind Changire
>>>>> <mchangir at redhat.com>. Can you try to identify why we are seeing
>>>>> these messages? If possible please send a patch to fix this.
>>>>>
>>>>> [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>>>
>>>>>> And java.io exceptions trying to rename files.
>>>>>
>>>>> When you see the errors, is it possible to collect
>>>>> * strace of the java application (strace -ff -v ...)
>>>>> * a fuse-dump of the glusterfs mount (use option --dump-fuse while
>>>>>   mounting)?
>>>>>
>>>>> I also need another favour from you. By trial and error, can you
>>>>> point out which of the many performance xlators you've turned off is
>>>>> causing the issue?
>>>>>
>>>>> The above two data-points will help us to fix the problem.
>>>>>
>>>>>> Thank You,
>>>>>> Dmitry
>>>>>>
>>>>>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>>>>
>>>>>>> What version of glusterfs are you using? It might be either
>>>>>>> * a stale metadata issue.
>>>>>>> * an inconsistent ctime issue.
>>>>>>>
>>>>>>> Can you try turning off all performance xlators? If the issue is 1,
>>>>>>> that should help.
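(For reference, "turning off all performance xlators" corresponds to
volume-set commands along these lines -- a sketch assuming the volume name
gv0 used in this thread, with the option names taken from the "Options
Reconfigured" listings above:)

    gluster volume set gv0 performance.quick-read off
    gluster volume set gv0 performance.io-cache off
    gluster volume set gv0 performance.stat-prefetch off
    gluster volume set gv0 performance.read-ahead off
    gluster volume set gv0 performance.readdir-ahead off
    gluster volume set gv0 performance.parallel-readdir off
    gluster volume set gv0 performance.write-behind off
    gluster volume set gv0 performance.client-io-threads off

    # to bisect the culprit, re-enable one xlator at a time, e.g.:
    gluster volume set gv0 performance.write-behind on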
>>>>>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>>>
>>>>>>>> Attempted to set 'performance.read-ahead off' according to
>>>>>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>>>>>> That did not help.
>>>>>>>>
>>>>>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The core file generated by the JVM suggests that it happens
>>>>>>>>> because the file is changing while it is being read -
>>>>>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>>>>>>> The application reads in the zipfile and goes through the zip
>>>>>>>>> entries, then reloads the file and goes through the zip entries
>>>>>>>>> again. It does so 3 times. The application never crashes on the
>>>>>>>>> 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
>>>>>>>>> The zip file is generated about 20 seconds prior to it being used
>>>>>>>>> and is not updated or even used by any other application. I have
>>>>>>>>> never seen this problem on a plain file system.
>>>>>>>>>
>>>>>>>>> I would appreciate any suggestions on how to go about debugging
>>>>>>>>> this issue. I can change the source code of the java application.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Dmitry
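(The reported access pattern can be roughly approximated from a shell,
though the real reproducer is a java application using java.util.zip, so
a JVM-level crash will not necessarily reproduce with unzip. A sketch,
assuming the volume is mounted at /mnt/gv0; the file names are
placeholders:)

    cd /mnt/gv0
    zip -q archive.zip data/*        # generate the archive on the mount
    sleep 20                         # ~20 seconds pass before first use
    for i in 1 2 3; do               # three read cycles, as described
        unzip -l archive.zip >/dev/null || echo "cycle $i failed"
    done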
Dmitry Isakbayev
2018-Dec-31 18:38 UTC
[Gluster-users] java application crashes while reading a zip file
The software ran with all of the options turned off over the weekend
without any problems.
I will try to collect the debug info for you. I have re-enabled the 3
options, but have yet to see the problem reoccur.

On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>
> * strace -ff -v ... of the java application
> * a dump of the I/O traffic seen by the mountpoint (use --dump-fuse
>   while mounting).
>
> regards,
> Raghavendra
>
> [...]
Dmitry Isakbayev
2019-Jan-02 16:28 UTC
[Gluster-users] java application crashes while reading a zip file
Still no JVM crashes. Is it possible that running glusterfs with the
performance options turned off for a couple of days cleared out the
"stale metadata issue"?

On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev <isakdim at gmail.com> wrote:

> The software ran with all of the options turned off over the weekend
> without any problems.
> I will try to collect the debug info for you. I have re-enabled the 3
> options, but have yet to see the problem reoccur.
>
> [...]