yonex
2016-Dec-16 15:40 UTC
[Gluster-users] File operation failure on simple distributed volume
Rafi,

Thanks, I didn't know about the .meta feature; it is very nice. I have
finally captured debug logs from a client and the bricks.

A mount log:
- http://pastebin.com/Tjy7wGGj

FYI, rickdom126 is my client's hostname.

Brick logs around that time:
- Brick1: http://pastebin.com/qzbVRSF3
- Brick2: http://pastebin.com/j3yMNhP3
- Brick3: http://pastebin.com/m81mVj6L
- Brick4: http://pastebin.com/JDAbChf6
- Brick5: http://pastebin.com/7saP6rsm

However, I could not find any message like "EOF on socket". I hope
there is some helpful information in the logs above.

Regards.

2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>
> On 12/13/2016 09:56 PM, yonex wrote:
>> Hi Rafi,
>>
>> Thanks for your response. OK, I think it is possible to capture debug
>> logs, since the error seems to be reproduced a few times per day. I
>> will try that. However, I want to avoid redundant debug output if
>> possible. Is there a way to enable debug logging only on specific
>> client nodes?
>
> If you are using a FUSE mount, there is a proc-like feature called
> .meta. You can set the log level through it for a particular client [1].
> But I also want logs from the bricks, because I suspect the brick
> processes of initiating the disconnects.
>
> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>
>> Regards
>>
>> Yonex
>>
>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>> Hi Yonex,
>>>
>>> Is this consistently reproducible? If so, can you enable debug
>>> logging [1] and check for any message similar to [2]? Basically, you
>>> can even search for "EOF on socket".
>>>
>>> You can set your log level back to the default (INFO) after capturing
>>> for some time.
>>>
>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>> and
>>> gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>
>>> [2] : http://pastebin.com/xn8QHXWa
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>> Hi,
>>>>
>>>> When my application moves a file from its local disk to a
>>>> FUSE-mounted GlusterFS volume, the client occasionally (not always)
>>>> outputs many warnings and errors. The volume is a simple distributed
>>>> volume.
>>>>
>>>> A sample of logs pasted: http://pastebin.com/axkTCRJX
>>>>
>>>> At a glance it seems to come from something like a network
>>>> disconnection ("Transport endpoint is not connected"), but other
>>>> networking applications on the same machine don't observe such a
>>>> thing. So I guess there may be a problem somewhere in the GlusterFS
>>>> stack.
>>>>
>>>> It ended in a failure to rename a file, logging PHP warnings like
>>>> the ones below:
>>>>
>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>>> to open stream: Input/output error in [snipped].php on line 278
>>>> PHP Warning:
>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>> Input/output error in [snipped].php on line 278
>>>>
>>>> Conditions:
>>>>
>>>> - GlusterFS 3.8.5 installed via yum CentOS-Gluster-3.8.repo
>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>> - Server machines' OS: CentOS 6.
>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>> - The number of connected FUSE clients is 260.
>>>> - No firewall between connected machines.
>>>> - Neither remounting volumes nor rebooting client machines has any
>>>>   effect.
>>>> - It is caused not only by rename() but also by copy() and
>>>>   filesize() operations.
>>>> - No output in brick logs when it happens.
>>>>
>>>> Any ideas? I'd appreciate any help.
>>>>
>>>> Regards.
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
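
The per-client approach suggested in this thread (writing a level into .meta/logging/loglevel on a single FUSE mount) could be wrapped so that the level is always set back after capturing. The following is only a sketch under stated assumptions: the name client_log_level is hypothetical, 8 is the value from the thread's echo example, and 7 as the value for INFO is an assumption that is not stated in the thread and should be verified for your GlusterFS version.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def client_log_level(mount_point, level=8, restore=7):
    """Temporarily change the log level of one FUSE client.

    level=8 is the value used in the thread's example.
    restore=7 is an ASSUMPTION for the default (INFO) level.
    """
    loglevel_file = Path(mount_point) / ".meta" / "logging" / "loglevel"
    # Same effect as: echo 8 > <mount_point>/.meta/logging/loglevel
    loglevel_file.write_text(f"{level}\n")
    try:
        yield loglevel_file
    finally:
        # Always put the level back, even if the body raises.
        loglevel_file.write_text(f"{restore}\n")
```

It would be used as `with client_log_level("/mnt/glusterfs"): ...` around the window in which the failure is expected. This only affects the client side; brick logging would still have to be raised separately with the `gluster volume set` commands quoted above.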
Mohammed Rafi K C
2016-Dec-19 05:58 UTC
[Gluster-users] File operation failure on simple distributed volume
On 12/16/2016 09:10 PM, yonex wrote:
> Rafi,
>
> Thanks, the .meta feature I didn't know is very nice. I finally have
> captured debug logs from a client and bricks.
>
> A mount log:
> - http://pastebin.com/Tjy7wGGj
>
> FYI rickdom126 is my client's hostname.
>
> Brick logs around that time:
> - Brick1: http://pastebin.com/qzbVRSF3
> - Brick2: http://pastebin.com/j3yMNhP3
> - Brick3: http://pastebin.com/m81mVj6L
> - Brick4: http://pastebin.com/JDAbChf6
> - Brick5: http://pastebin.com/7saP6rsm
>
> However I could not find any message like "EOF on socket". I hope
> there is any helpful information in the logs above.

Indeed. I understand that the connections are in a disconnected state.
But what I'm particularly looking for is the cause of the disconnect.
Can you paste the debug logs from when the disconnects start, and from
around that time? You may see a debug log that says "disconnecting now".

Regards
Rafi KC
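
To pull out the log lines around the start of a disconnect, as requested above, something like the following could be used on a downloaded mount or brick log. It is a minimal sketch, not a GlusterFS tool: the function name disconnect_context is hypothetical, the marker strings are simply the ones quoted in this thread, and the window sizes are arbitrary defaults.

```python
# Marker strings quoted in this thread; extend as needed.
MARKERS = (
    "disconnecting now",
    "EOF on socket",
    "Transport endpoint is not connected",
)


def disconnect_context(lines, before=3, after=3, markers=MARKERS):
    """Return (line_number, text) pairs for every line containing a
    marker, plus `before`/`after` neighbouring lines for context.

    Line numbers are 1-based; overlapping windows are merged.
    """
    lines = list(lines)
    keep = set()
    for i, line in enumerate(lines):
        if any(marker in line for marker in markers):
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    return [(i + 1, lines[i].rstrip("\n")) for i in sorted(keep)]
```

Passing an open file object works because the function only iterates over lines, for example `disconnect_context(open(path))` with `path` pointing at whichever log file was captured at DEBUG level.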