Mohammed Rafi K C
2017-Feb-14 08:19 UTC
[Gluster-users] File operation failure on simple distributed volume
Hi Yonex,

Are you still hitting this issue?

Regards

Rafi KC

On 01/16/2017 10:36 AM, yonex wrote:
> Hi
>
> I noticed that there is a high throughput degradation while the gdb
> script is attached to a glusterfs client process. Write speed becomes
> 2% or less of normal, which cannot be sustained in production.
>
> Could you provide the custom build that you mentioned before? I am
> going to keep trying to reproduce the problem outside of the
> production environment.
>
> Regards
>
> On 2017-01-08 at 21:54, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>
>> Is there any update on this?
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/24/2016 03:53 PM, yonex wrote:
>>> Rafi,
>>>
>>> Thanks again. I will try that and get back to you.
>>>
>>> Regards.
>>>
>>> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>> Hi Yonex,
>>>>
>>>> As we discussed on IRC in #gluster-devel, I have attached the gdb
>>>> script to this mail.
>>>>
>>>> Procedure to run the gdb script:
>>>>
>>>> 1) Install gdb.
>>>>
>>>> 2) Download and install the gluster debuginfo packages for your
>>>> machine. Package location:
>>>> https://cbs.centos.org/koji/buildinfo?buildID=12757
>>>>
>>>> 3) Find the process ID and attach gdb to the process using the
>>>> command: gdb attach <pid> -x <path_to_script>
>>>>
>>>> 4) Keep the script running until you hit the problem.
>>>>
>>>> 5) Stop gdb.
>>>>
>>>> 6) You will see a file called mylog.txt in the location where you
>>>> ran gdb.
>>>>
>>>> Please keep an eye on the attached process. If you have any doubt,
>>>> please feel free to get back to me.
>>>>
>>>> Regards
>>>>
>>>> Rafi KC
>>>>
>>>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>>>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>>>>> Client 0-glusterfs01-client-2 has disconnected from bricks around
>>>>>> 2016-12-15 11:21:17.854249. Can you look at and/or paste the
>>>>>> brick logs around that time?
>>>>> You can find the brick name and hostname for 0-glusterfs01-client-2
>>>>> from the client graph.
>>>>>
>>>>> Rafi
>>>>>
>>>>>> Are you in any of the gluster IRC channels? If so, do you have a
>>>>>> nickname that I can search for?
>>>>>>
>>>>>> Regards
>>>>>> Rafi KC
>>>>>>
>>>>>> On 12/19/2016 04:28 PM, yonex wrote:
>>>>>>> Rafi,
>>>>>>>
>>>>>>> OK. Thanks for your guidance. I found the debug log and pasted
>>>>>>> the lines around that:
>>>>>>> http://pastebin.com/vhHR6PQN
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>>>>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>>>>>>> Rafi,
>>>>>>>>>
>>>>>>>>> Thanks, the .meta feature I didn't know about is very nice. I
>>>>>>>>> finally have captured debug logs from a client and bricks.
>>>>>>>>>
>>>>>>>>> A mount log:
>>>>>>>>> - http://pastebin.com/Tjy7wGGj
>>>>>>>>>
>>>>>>>>> FYI, rickdom126 is my client's hostname.
>>>>>>>>>
>>>>>>>>> Brick logs around that time:
>>>>>>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>>>>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>>>>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>>>>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>>>>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>>>>>>
>>>>>>>>> However, I could not find any message like "EOF on socket". I
>>>>>>>>> hope there is some helpful information in the logs above.
>>>>>>>> Indeed. I understand that the connections are in disconnected
>>>>>>>> state, but what I'm particularly looking for is the cause of the
>>>>>>>> disconnect. Can you paste the debug logs from when the
>>>>>>>> disconnects start, and around that time? You may see a debug log
>>>>>>>> that says "disconnecting now".
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Rafi KC
>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C
>>>>>>>>> <rkavunga at redhat.com>:
>>>>>>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>>>>>>> Hi Rafi,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your response. OK, I think it is possible to
>>>>>>>>>>> capture debug logs, since the error seems to be reproduced a
>>>>>>>>>>> few times per day. I will try that. However, since I want to
>>>>>>>>>>> avoid redundant debug output if possible, is there a way to
>>>>>>>>>>> enable debug logging only on specific client nodes?
>>>>>>>>>> If you are using a fuse mount, there is a proc-like feature
>>>>>>>>>> called .meta. You can set the log level through that for a
>>>>>>>>>> particular client [1]. But I also want logs from the bricks,
>>>>>>>>>> because I suspect the brick processes of initiating the
>>>>>>>>>> disconnects.
>>>>>>>>>>
>>>>>>>>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Yonex
>>>>>>>>>>>
>>>>>>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C
>>>>>>>>>>> <rkavunga at redhat.com>:
>>>>>>>>>>>> Hi Yonex,
>>>>>>>>>>>>
>>>>>>>>>>>> Is this consistently reproducible? If so, can you enable
>>>>>>>>>>>> debug logging [1] and check for any message similar to [2]?
>>>>>>>>>>>> Basically you can even search for "EOF on socket".
>>>>>>>>>>>>
>>>>>>>>>>>> You can set your log level back to the default (INFO) after
>>>>>>>>>>>> capturing for some time.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>>>>>>>>>>> and gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>>>>>>>
>>>>>>>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>> Rafi KC
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> When my application moves a file from its local disk to a
>>>>>>>>>>>>> FUSE-mounted GlusterFS volume, the client outputs many
>>>>>>>>>>>>> warnings and errors, not always but occasionally. The
>>>>>>>>>>>>> volume is a simple distributed volume.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A sample of the logs: http://pastebin.com/axkTCRJX
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a glance it looks like a network disconnection
>>>>>>>>>>>>> ("Transport endpoint is not connected"), but other
>>>>>>>>>>>>> networking applications on the same machine don't observe
>>>>>>>>>>>>> any such thing. So I guess there may be a problem somewhere
>>>>>>>>>>>>> in the GlusterFS stack.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It ended in failing to rename a file, logging PHP warnings
>>>>>>>>>>>>> like the below:
>>>>>>>>>>>>>
>>>>>>>>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>>>>>>>>> failed to open stream: Input/output error in [snipped].php
>>>>>>>>>>>>> on line 278
>>>>>>>>>>>>> PHP Warning:
>>>>>>>>>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>>>>>>>>> Input/output error in [snipped].php on line 278
>>>>>>>>>>>>>
>>>>>>>>>>>>> Conditions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - GlusterFS 3.8.5 installed via yum (CentOS-Gluster-3.8.repo).
>>>>>>>>>>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>>>>>>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>>>>>>>>> - Server machines' OS: CentOS 6.
>>>>>>>>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all
>>>>>>>>>>>>>   machines.
>>>>>>>>>>>>> - The number of connected FUSE clients is 260.
>>>>>>>>>>>>> - No firewall between connected machines.
>>>>>>>>>>>>> - Neither remounting volumes nor rebooting client machines
>>>>>>>>>>>>>   has any effect.
>>>>>>>>>>>>> - It is caused not only by rename() but also by copy() and
>>>>>>>>>>>>>   filesize() operations.
>>>>>>>>>>>>> - No output in the brick logs when it happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas? I'd appreciate any help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards.
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
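[Editor's note: steps 3-6 of the attach procedure above boil down to a single gdb invocation. The sketch below builds it as a dry run; the pid 12345 and the script name mylog-capture.gdb are placeholders, since the real gdb script was an attachment to the original mail and is not reproduced here.]

```shell
#!/bin/sh
# Dry-run sketch of steps 3-6 above: build the gdb command for a given
# glusterfs client pid. "mylog-capture.gdb" is a placeholder name for
# the gdb script attached to the original mail.

gdb_attach_cmd() {
    # $1: pid of the glusterfs client process, $2: path to the gdb script
    printf 'gdb -p %s -x %s\n' "$1" "$2"
}

# Step 3: attach gdb with the script. Steps 4-5: let it run until the
# problem reproduces, then stop gdb. Step 6: read mylog.txt from the
# directory where gdb was started.
gdb_attach_cmd 12345 ./mylog-capture.gdb
```

As yonex reports in the follow-up, attaching gdb this way cut client write throughput to 2% or less, so it is best attempted outside production.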
yonex
2017-Feb-16 15:43 UTC
[Gluster-users] File operation failure on simple distributed volume
Hi Rafi,

I'm still on this issue, but reproduction has not yet been achieved
outside of production. In the production environment, I have made the
applications stop writing data to the glusterfs volume; only read
operations are going on.

P.S. It seems that I have corrupted the email thread.. ;-(
http://lists.gluster.org/pipermail/gluster-users/2017-January/029679.html

2017-02-14 17:19 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
> Hi Yonex,
>
> Are you still hitting this issue?
>
> Regards
>
> Rafi KC
>
> [...]
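[Editor's note: for reference, the log-level commands discussed across the thread can be collected into one dry-run helper that prints the commands instead of executing them. This is a sketch only; the volume name glusterfs01 and mount point /mnt/glusterfs are examples, not confirmed by the mails.]

```shell
#!/bin/sh
# Dry-run sketch of the debug-logging steps from the thread: print the
# commands rather than run them. Arguments are example values.

debug_log_cmds() {
    volname="$1"    # gluster volume name
    mount="$2"      # fuse mount point on one client

    # Cluster-wide: DEBUG on bricks and clients (set both back to INFO
    # after capturing for some time, per the thread).
    echo "gluster volume set $volname diagnostics.brick-log-level DEBUG"
    echo "gluster volume set $volname diagnostics.client-log-level DEBUG"

    # One fuse client only: level 8 (DEBUG) via the .meta interface.
    echo "echo 8 > $mount/.meta/logging/loglevel"
}

debug_log_cmds glusterfs01 /mnt/glusterfs
```

The .meta route avoids flooding all 260 clients with debug output when only one client needs it.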