Mohammed Rafi K C
2016-Dec-19 12:02 UTC
[Gluster-users] File operation failure on simple distributed volume
Client 0-glusterfs01-client-2 has disconnected from its brick around
2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs
from around that time?

Are you on any of the Gluster IRC channels? If so, what nickname should
I search for?

Regards
Rafi KC

On 12/19/2016 04:28 PM, yonex wrote:
> Rafi,
>
> OK. Thanks for your guidance. I found the debug log and pasted the lines around that point.
> http://pastebin.com/vhHR6PQN
>
> Regards
>
>
> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>
>> On 12/16/2016 09:10 PM, yonex wrote:
>>> Rafi,
>>>
>>> Thanks; the .meta feature, which I didn't know about, is very nice. I have
>>> finally captured debug logs from a client and the bricks.
>>>
>>> A mount log:
>>> - http://pastebin.com/Tjy7wGGj
>>>
>>> FYI, rickdom126 is my client's hostname.
>>>
>>> Brick logs around that time:
>>> - Brick1: http://pastebin.com/qzbVRSF3
>>> - Brick2: http://pastebin.com/j3yMNhP3
>>> - Brick3: http://pastebin.com/m81mVj6L
>>> - Brick4: http://pastebin.com/JDAbChf6
>>> - Brick5: http://pastebin.com/7saP6rsm
>>>
>>> However, I could not find any message like "EOF on socket". I hope
>>> there is some helpful information in the logs above.
>> Indeed. I understand that the connections are in a disconnected state, but
>> what I'm looking for in particular is the cause of the disconnect. Can
>> you paste the debug logs from when the disconnects start, and around that
>> time? You may see a debug log that says "disconnecting now".
>>
>>
>> Regards
>> Rafi KC
>>
>>
>>> Regards.
>>>
>>>
>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>> Hi Rafi,
>>>>>
>>>>> Thanks for your response. OK, I think it is possible to capture debug
>>>>> logs, since the error seems to be reproduced a few times per day. I
>>>>> will try that. However, since I want to avoid redundant debug output if
>>>>> possible, is there a way to enable debug logging only on specific client
>>>>> nodes?
>>>> If you are using a FUSE mount, there is a proc-like feature called .meta.
>>>> You can set the log level through it for a particular client [1]. But I
>>>> also want logs from the bricks, because I suspect the brick processes are
>>>> initiating the disconnects.
>>>>
>>>>
>>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>
>>>>> Regards
>>>>>
>>>>> Yonex
>>>>>
>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>>>> Hi Yonex,
>>>>>>
>>>>>> Is this consistently reproducible? If so, can you enable debug logging [1]
>>>>>> and check for any message similar to [2]? You can even simply search
>>>>>> for "EOF on socket".
>>>>>>
>>>>>> You can set your log level back to the default (INFO) after capturing for
>>>>>> some time.
>>>>>>
>>>>>>
>>>>>> [1]: gluster volume set <volname> diagnostics.brick-log-level DEBUG and
>>>>>> gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>
>>>>>> [2]: http://pastebin.com/xn8QHXWa
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Rafi KC
>>>>>>
>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> When my application moves a file from its local disk to a FUSE-mounted
>>>>>>> GlusterFS volume, the client occasionally (though not always) outputs
>>>>>>> many warnings and errors. The volume is a simple distributed volume.
>>>>>>>
>>>>>>> A sample of the logs is pasted here: http://pastebin.com/axkTCRJX
>>>>>>>
>>>>>>> At a glance it looks like a network disconnection ("Transport endpoint
>>>>>>> is not connected"), but other networking applications on the same
>>>>>>> machine don't observe any such thing, so I guess there may be a problem
>>>>>>> somewhere in the GlusterFS stack.
>>>>>>>
>>>>>>> It ends in a failure to rename a file, logging PHP warnings like the
>>>>>>> ones below:
>>>>>>>
>>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0): failed
>>>>>>> to open stream: Input/output error in [snipped].php on line 278
>>>>>>> PHP Warning:
>>>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>>> Input/output error in [snipped].php on line 278
>>>>>>>
>>>>>>> Conditions:
>>>>>>>
>>>>>>> - GlusterFS 3.8.5, installed via yum from CentOS-Gluster-3.8.repo
>>>>>>> - Volume info and status pasted here: http://pastebin.com/JPt2KeD8
>>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>>> - Server machines' OS: CentOS 6.
>>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all machines.
>>>>>>> - The number of connected FUSE clients is 260.
>>>>>>> - No firewall between the connected machines.
>>>>>>> - Neither remounting the volume nor rebooting the client machines has any effect.
>>>>>>> - It is triggered not only by rename() but also by copy() and filesize() operations.
>>>>>>> - Nothing appears in the brick logs when it happens.
>>>>>>>
>>>>>>> Any ideas? I'd appreciate any help.
>>>>>>>
>>>>>>> Regards.
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
Mohammed Rafi K C
2016-Dec-19 12:03 UTC
[Gluster-users] File operation failure on simple distributed volume
On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
> Client 0-glusterfs01-client-2 has disconnected from bricks around
> 2016-12-15 11:21:17.854249. Can you look at and/or paste the brick logs
> around that time.

You can find the brick name and hostname for 0-glusterfs01-client-2 in
the client graph.

Rafi

[Remainder of the quoted thread, identical to the previous message, snipped.]
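[Editor's note: once DEBUG logs are captured, the messages Rafi asks for can be pulled out with a small grep sketch. The function name and the example log path are hypothetical; point it at your actual client mount log or brick log under /var/log/glusterfs/.]

```shell
# Sketch: list disconnect-related messages from a GlusterFS client or
# brick log, with five lines of leading context so the events just
# before each disconnect are visible.
find_disconnects() {
    grep -n -B5 -E 'disconnecting now|EOF on socket|Transport endpoint is not connected' "$1"
}

# Example invocation (path is an illustration):
# find_disconnects /var/log/glusterfs/mnt-glusterfs01.log
```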