yonex
2017-Jan-16 05:06 UTC
[Gluster-users] File operation failure on simple distributed volume
Hi,

I noticed a severe throughput degradation while the gdb script is attached to a glusterfs client process. Write speed drops to 2% of normal or less. That is not sustainable in production.

Could you provide the custom build that you mentioned before? I am going to keep trying to reproduce the problem outside of the production environment.

Regards

On Jan 8, 2017, at 21:54, Mohammed Rafi K C <rkavunga at redhat.com> wrote:

> Is there any update on this?
>
> Regards
> Rafi KC
>
> On 12/24/2016 03:53 PM, yonex wrote:
>> Rafi,
>>
>> Thanks again. I will try that and get back to you.
>>
>> Regards.
>>
>> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>> Hi Yonex,
>>>
>>> As we discussed on IRC in #gluster-devel, I have attached the gdb script along with this mail.
>>>
>>> Procedure to run the gdb script:
>>>
>>> 1) Install gdb.
>>> 2) Download and install the gluster debuginfo packages for your machine; package location: https://cbs.centos.org/koji/buildinfo?buildID=12757
>>> 3) Find the process ID and attach gdb to the process using the command: gdb attach <pid> -x <path_to_script>
>>> 4) Keep the script running until you hit the problem.
>>> 5) Stop gdb.
>>> 6) You will find a file called mylog.txt in the directory where you ran gdb.
>>>
>>> Please keep an eye on the attached process. If you have any doubts, please feel free to get back to me.
>>>
>>> Regards
>>> Rafi KC
>>>
>>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>>>> Client 0-glusterfs01-client-2 has disconnected from bricks around 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs from around that time?
>>>>
>>>> You can find the brick name and hostname for 0-glusterfs01-client-2 from the client graph.
>>>>
>>>> Rafi
>>>>
>>>>> Are you in any of the gluster IRC channels? If so, do you have a nickname that I can search for?
>>>>>
>>>>> Regards
>>>>> Rafi KC
>>>>>
>>>>> On 12/19/2016 04:28 PM, yonex wrote:
>>>>>> Rafi,
>>>>>>
>>>>>> OK. Thanks for your guidance. I found the debug log and pasted the lines around it:
>>>>>> http://pastebin.com/vhHR6PQN
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>>>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>>>>>> Rafi,
>>>>>>>>
>>>>>>>> Thanks, the .meta feature, which I didn't know about, is very nice. I have finally captured debug logs from a client and from the bricks.
>>>>>>>>
>>>>>>>> A mount log:
>>>>>>>> - http://pastebin.com/Tjy7wGGj
>>>>>>>>
>>>>>>>> FYI, rickdom126 is my client's hostname.
>>>>>>>>
>>>>>>>> Brick logs around that time:
>>>>>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>>>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>>>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>>>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>>>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>>>>>
>>>>>>>> However, I could not find any message like "EOF on socket". I hope there is some helpful information in the logs above.
>>>>>>>
>>>>>>> Indeed. I understand that the connections are in a disconnected state, but what I'm particularly looking for is the cause of the disconnect. Can you paste the debug logs from when the disconnects start, and from around that time? You may see a debug log that says "disconnecting now".
>>>>>>>
>>>>>>> Regards
>>>>>>> Rafi KC
>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>>>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>>>>>> Hi Rafi,
>>>>>>>>>>
>>>>>>>>>> Thanks for your response. OK, I think it is possible to capture debug logs, since the error seems to be reproduced a few times per day. I will try that. However, since I want to avoid redundant debug output if possible, is there a way to enable debug logging only on specific client nodes?
>>>>>>>>>
>>>>>>>>> If you are using a FUSE mount, there is a proc-like feature called .meta. You can set the log level for a particular client through it [1]. But I also want logs from the bricks, because I suspect the brick processes of initiating the disconnects.
>>>>>>>>>
>>>>>>>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Yonex
>>>>>>>>>>
>>>>>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>>>>>>>>> Hi Yonex,
>>>>>>>>>>>
>>>>>>>>>>> Is this consistently reproducible? If so, can you enable debug logging [1] and check for any message similar to [2]? Basically, you can even search for "EOF on socket".
>>>>>>>>>>>
>>>>>>>>>>> You can set your log level back to the default (INFO) after capturing for some time.
>>>>>>>>>>>
>>>>>>>>>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG and gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>>>>>>
>>>>>>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Rafi KC
>>>>>>>>>>>
>>>>>>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> When my application moves a file from its local disk to a FUSE-mounted GlusterFS volume, the client occasionally (not always) outputs many warnings and errors. The volume is a simple distributed volume.
>>>>>>>>>>>>
>>>>>>>>>>>> A sample of the logs is pasted here: http://pastebin.com/axkTCRJX
>>>>>>>>>>>>
>>>>>>>>>>>> At a glance it looks like a network disconnection ("Transport endpoint is not connected"), but other networking applications on the same machine don't observe any such thing, so I guess there may be a problem somewhere in the GlusterFS stack.
>>>>>>>>>>>>
>>>>>>>>>>>> It ends with a failure to rename the file, logging PHP warnings like the ones below:
>>>>>>>>>>>>
>>>>>>>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed to open stream: Input/output error in [snipped].php on line 278
>>>>>>>>>>>> PHP Warning: rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0): Input/output error in [snipped].php on line 278
>>>>>>>>>>>>
>>>>>>>>>>>> Conditions:
>>>>>>>>>>>>
>>>>>>>>>>>> - GlusterFS 3.8.5 installed via yum (CentOS-Gluster-3.8.repo).
>>>>>>>>>>>> - Volume info and status pasted here: http://pastebin.com/JPt2KeD8
>>>>>>>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>>>>>>>> - Server machines' OS: CentOS 6.
>>>>>>>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>>>>>>>>> - The number of connected FUSE clients is 260.
>>>>>>>>>>>> - No firewall between connected machines.
>>>>>>>>>>>> - Neither remounting the volume nor rebooting the client machines has any effect.
>>>>>>>>>>>> - It is triggered not only by rename() but also by copy() and filesize() operations.
>>>>>>>>>>>> - No output in the brick logs when it happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas? I'd appreciate any help.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards.
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
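[Editor's note: the six-step attach procedure quoted above can be condensed into a short shell sketch. The process-match pattern, the script path, and the output location below are illustrative assumptions; the actual gdb script was sent as a mail attachment and is not reproduced here.]

    #!/bin/bash
    # Sketch of steps 1-6 above (assumed names/paths; the gdb script itself is not shown).

    # 1-2) gdb and the matching glusterfs debuginfo packages must already be installed
    #      (packages: https://cbs.centos.org/koji/buildinfo?buildID=12757).

    # 3) Find the glusterfs client process for the mount and attach gdb with the script.
    #    "glusterfs01" is assumed from the volume/mount names seen in this thread.
    PID=$(pgrep -f 'glusterfs.*glusterfs01' | head -n 1)
    gdb -p "$PID" -x /path/to/gdb_script

    # 4-5) Let the client run under gdb until the rename/copy failure reproduces,
    #      then quit gdb (which detaches from the process).

    # 6) The script writes mylog.txt into the directory where gdb was started.
    ls -l mylog.txt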
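[Editor's note: the two ways of raising the log level discussed in the thread (per-client via .meta, and volume-wide via the diagnostics options) might look roughly as follows. The mount point /glusterfs01 and the volume name glusterfs01 are assumptions based on the paths and client-graph names quoted above.]

    #!/bin/bash
    # Per-client: raise only this FUSE client's log level through the .meta
    # interface (8 = DEBUG), then drop it back to the default (7 = INFO).
    echo 8 > /glusterfs01/.meta/logging/loglevel
    # ... reproduce the problem and collect the mount log ...
    echo 7 > /glusterfs01/.meta/logging/loglevel

    # Volume-wide: enable DEBUG on brick and client logs, then return to INFO.
    gluster volume set glusterfs01 diagnostics.brick-log-level DEBUG
    gluster volume set glusterfs01 diagnostics.client-log-level DEBUG
    # ... capture for some time ...
    gluster volume set glusterfs01 diagnostics.brick-log-level INFO
    gluster volume set glusterfs01 diagnostics.client-log-level INFO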
Mohammed Rafi K C
2017-Feb-14 08:19 UTC
[Gluster-users] File operation failure on simple distributed volume
Hi Yonex,

Are you still hitting this issue?

Regards
Rafi KC

On 01/16/2017 10:36 AM, yonex wrote:
> Hi,
>
> I noticed a severe throughput degradation while the gdb script is attached to a glusterfs client process. Write speed drops to 2% of normal or less. That is not sustainable in production.
>
> Could you provide the custom build that you mentioned before? I am going to keep trying to reproduce the problem outside of the production environment.
>
> Regards
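[Editor's note: since the plan in the quoted message is to reproduce the problem outside of production, one possible harness is a loop that mimics the application's move onto the FUSE mount and timestamps every failure, so it can be matched against "Transport endpoint is not connected" entries in the client log. This is a hypothetical sketch, not from the thread; the file names and paths are placeholders patterned after the PHP warnings quoted earlier.]

    #!/bin/bash
    # Hypothetical reproduction loop: repeatedly move a file onto the
    # FUSE-mounted volume and log the time of any failure.
    SRC=/var/stack/testfile                  # placeholder local path
    DST=/glusterfs01/db1/stack/f0/testfile   # placeholder path on the mount
    LOG=/tmp/rename-errors.log

    while true; do
        head -c 1M /dev/urandom > "$SRC"
        if ! mv "$SRC" "$DST" 2>>"$LOG"; then
            # Record when the move failed; mv's own error text is appended above.
            date '+%Y-%m-%d %H:%M:%S rename failed' >> "$LOG"
        fi
        rm -f "$DST"
        sleep 1
    done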