Mohammed Rafi K C
2016-Dec-14 06:20 UTC
[Gluster-users] File operation failure on simple distributed volume
On 12/13/2016 09:56 PM, yonex wrote:
> Hi Rafi,
>
> Thanks for your response. OK, I think it is possible to capture debug
> logs, since the error seems to be reproduced a few times per day. I
> will try that. However, since I want to avoid redundant debug output if
> possible, is there a way to enable debug logging only on specific client
> nodes?

If you are using a FUSE mount, there is a proc-like feature called .meta.
You can set the log level through it for a particular client [1]. But I
also want logs from the bricks, because I suspect the brick processes are
initiating the disconnects.

[1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel

> Regards
>
> Yonex
>
> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>> Hi Yonex,
>>
>> Is this consistently reproducible? If so, can you enable debug logging [1]
>> and check for any message similar to [2]? Basically, you can even search
>> for "EOF on socket".
>>
>> You can set your log level back to the default (INFO) after capturing for
>> some time.
>>
>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG and
>> gluster volume set <volname> diagnostics.client-log-level DEBUG
>>
>> [2] : http://pastebin.com/xn8QHXWa
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/12/2016 09:35 PM, yonex wrote:
>>> Hi,
>>>
>>> When my application moves a file from its local disk to a FUSE-mounted
>>> GlusterFS volume, the client occasionally (not always) outputs many
>>> warnings and errors. The volume is a simple distributed volume.
>>>
>>> A sample of logs pasted: http://pastebin.com/axkTCRJX
>>>
>>> At a glance it seems to come from something like a network disconnection
>>> ("Transport endpoint is not connected"), but other networking
>>> applications on the same machine don't observe such a thing. So I guess
>>> there may be a problem somewhere in the GlusterFS stack.
>>>
>>> It ended in a failure to rename a file, logging PHP warnings like below:
>>>
>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>> to open stream: Input/output error in [snipped].php on line 278
>>> PHP Warning:
>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>> Input/output error in [snipped].php on line 278
>>>
>>> Conditions:
>>>
>>> - GlusterFS 3.8.5 installed via yum CentOS-Gluster-3.8.repo
>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>> - Server machines' OS: CentOS 6.
>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>> - The number of connected FUSE clients is 260.
>>> - No firewall between connected machines.
>>> - Neither remounting volumes nor rebooting client machines has any effect.
>>> - It is triggered not only by rename() but also by copy() and filesize() operations.
>>> - No output in brick logs when it happens.
>>>
>>> Any ideas? I'd appreciate any help.
>>>
>>> Regards.
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
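The per-client approach described in the reply above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name set_client_loglevel is hypothetical, the mount point /mnt/glusterfs is the one used in the example from the thread, and the mapping of 8 to DEBUG and 7 to INFO is assumed from GlusterFS's numeric log levels rather than confirmed in the thread.

```shell
# Hypothetical helper: write a numeric log level to the .meta control
# file of a single FUSE mount, leaving all other clients untouched.
# Assumption: 8 means DEBUG (as in the thread) and 7 means INFO.
set_client_loglevel() {
    mount_point=$1
    level=$2
    ctl="$mount_point/.meta/logging/loglevel"

    # Refuse to create the file ourselves: if it is missing, this is
    # probably not a GlusterFS FUSE mount with .meta available.
    if [ ! -e "$ctl" ]; then
        echo "no .meta control file at $ctl" >&2
        return 1
    fi

    printf '%s\n' "$level" > "$ctl"
}
```

A possible usage would be `set_client_loglevel /mnt/glusterfs 8` on the affected client while waiting for the error to recur, then `set_client_loglevel /mnt/glusterfs 7` to go back to the default afterwards. Brick logs still have to be raised separately with the `gluster volume set <volname> diagnostics.brick-log-level DEBUG` command quoted above.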
yonex
2016-Dec-16 15:40 UTC
[Gluster-users] File operation failure on simple distributed volume
Rafi,

Thanks, the .meta feature, which I didn't know about, is very nice. I have
finally captured debug logs from a client and the bricks.

A mount log:
- http://pastebin.com/Tjy7wGGj

FYI rickdom126 is my client's hostname.

Brick logs around that time:
- Brick1: http://pastebin.com/qzbVRSF3
- Brick2: http://pastebin.com/j3yMNhP3
- Brick3: http://pastebin.com/m81mVj6L
- Brick4: http://pastebin.com/JDAbChf6
- Brick5: http://pastebin.com/7saP6rsm

However, I could not find any message like "EOF on socket". I hope there
is some helpful information in the logs above.

Regards.

2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
> On 12/13/2016 09:56 PM, yonex wrote:
>> Hi Rafi,
>>
>> Thanks for your response. OK, I think it is possible to capture debug
>> logs, since the error seems to be reproduced a few times per day. I
>> will try that. However, since I want to avoid redundant debug output if
>> possible, is there a way to enable debug logging only on specific client
>> nodes?
>
> If you are using a FUSE mount, there is a proc-like feature called .meta.
> You can set the log level through it for a particular client [1]. But I
> also want logs from the bricks, because I suspect the brick processes are
> initiating the disconnects.
>
> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
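Searching the captured logs for the two messages discussed in this thread can be sketched as follows. This is a minimal illustration, not a diagnostic tool: the function name find_disconnects is hypothetical, and the log paths in the usage note are an assumption about the default GlusterFS log location that the thread itself does not state.

```shell
# Hypothetical helper: print every line (prefixed with file name and
# line number) that contains one of the two disconnect-related messages
# mentioned in this thread. Takes one or more log files as arguments.
find_disconnects() {
    grep -n -H -E 'EOF on socket|Transport endpoint is not connected' "$@"
}
```

Assuming the logs live under /var/log/glusterfs, it could be run on a client as `find_disconnects /var/log/glusterfs/*.log` and on a server as `find_disconnects /var/log/glusterfs/bricks/*.log`. Like grep itself, it returns a non-zero status when nothing matches.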