Mohammed Rafi K C
2017-Feb-14 08:19 UTC
[Gluster-users] File operation failure on simple distributed volume
Hi Yonex,

Are you still hitting this issue?

Regards

Rafi KC

On 01/16/2017 10:36 AM, yonex wrote:
> Hi
>
> I noticed that there is a high throughput degradation while the gdb
> script is attached to a glusterfs client process. Write speed becomes
> 2% or less of normal, which cannot be sustained in production.
>
> Could you provide the custom build that you mentioned before? I am
> going to keep trying to reproduce the problem outside of the
> production environment.
>
> Regards
>
> On 2017-01-08 at 21:54, Mohammed Rafi K C <rkavunga at redhat.com> wrote:
>
>> Is there any update on this?
>>
>> Regards
>>
>> Rafi KC
>>
>> On 12/24/2016 03:53 PM, yonex wrote:
>>> Rafi,
>>>
>>> Thanks again. I will try that and get back to you.
>>>
>>> Regards.
>>>
>>> 2016-12-23 18:03 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>> Hi Yonex,
>>>>
>>>> As we discussed on IRC in #gluster-devel, I have attached the gdb
>>>> script to this mail.
>>>>
>>>> Procedure to run the gdb script:
>>>>
>>>> 1) Install gdb.
>>>>
>>>> 2) Download and install the gluster debuginfo packages for your
>>>> machine. Package location:
>>>> https://cbs.centos.org/koji/buildinfo?buildID=12757
>>>>
>>>> 3) Find the process ID and attach gdb to the process using the
>>>> command: gdb attach <pid> -x <path_to_script>
>>>>
>>>> 4) Keep the script running until you hit the problem.
>>>>
>>>> 5) Stop gdb.
>>>>
>>>> 6) You will see a file called mylog.txt in the location where you
>>>> ran gdb.
>>>>
>>>> Please keep an eye on the attached process. If you have any doubt,
>>>> please feel free to get back to me.
>>>>
>>>> Regards
>>>>
>>>> Rafi KC
>>>>
>>>> On 12/19/2016 05:33 PM, Mohammed Rafi K C wrote:
>>>>> On 12/19/2016 05:32 PM, Mohammed Rafi K C wrote:
>>>>>> Client 0-glusterfs01-client-2 has disconnected from bricks around
>>>>>> 2016-12-15 11:21:17.854249. Can you look at and/or paste the
>>>>>> brick logs around that time?
>>>>> You can find the brick name and hostname for 0-glusterfs01-client-2
>>>>> from the client graph.
>>>>>
>>>>> Rafi
>>>>>
>>>>>> Are you in any of the gluster IRC channels? If so, do you have a
>>>>>> nickname that I can search for?
>>>>>>
>>>>>> Regards
>>>>>> Rafi KC
>>>>>>
>>>>>> On 12/19/2016 04:28 PM, yonex wrote:
>>>>>>> Rafi,
>>>>>>>
>>>>>>> OK. Thanks for your guidance. I found the debug log and pasted
>>>>>>> the lines around that:
>>>>>>> http://pastebin.com/vhHR6PQN
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> 2016-12-19 14:58 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
>>>>>>>> On 12/16/2016 09:10 PM, yonex wrote:
>>>>>>>>> Rafi,
>>>>>>>>>
>>>>>>>>> Thanks, the .meta feature I didn't know about is very nice. I
>>>>>>>>> finally have captured debug logs from a client and bricks.
>>>>>>>>>
>>>>>>>>> A mount log:
>>>>>>>>> - http://pastebin.com/Tjy7wGGj
>>>>>>>>>
>>>>>>>>> FYI, rickdom126 is my client's hostname.
>>>>>>>>>
>>>>>>>>> Brick logs around that time:
>>>>>>>>> - Brick1: http://pastebin.com/qzbVRSF3
>>>>>>>>> - Brick2: http://pastebin.com/j3yMNhP3
>>>>>>>>> - Brick3: http://pastebin.com/m81mVj6L
>>>>>>>>> - Brick4: http://pastebin.com/JDAbChf6
>>>>>>>>> - Brick5: http://pastebin.com/7saP6rsm
>>>>>>>>>
>>>>>>>>> However, I could not find any message like "EOF on socket". I
>>>>>>>>> hope there is some helpful information in the logs above.
>>>>>>>> Indeed. I understand that the connections are in disconnected
>>>>>>>> state, but what I'm particularly looking for is the cause of the
>>>>>>>> disconnect. Can you paste the debug logs from when the
>>>>>>>> disconnects start, and around that time? You may see a debug log
>>>>>>>> that says "disconnecting now".
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Rafi KC
>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> 2016-12-14 15:20 GMT+09:00 Mohammed Rafi K C
>>>>>>>>> <rkavunga at redhat.com>:
>>>>>>>>>> On 12/13/2016 09:56 PM, yonex wrote:
>>>>>>>>>>> Hi Rafi,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your response. OK, I think it is possible to
>>>>>>>>>>> capture debug logs, since the error seems to be reproduced a
>>>>>>>>>>> few times per day. I will try that. However, since I want to
>>>>>>>>>>> avoid redundant debug output if possible, is there a way to
>>>>>>>>>>> enable debug logging only on specific client nodes?
>>>>>>>>>> If you are using a fuse mount, there is a proc-like feature
>>>>>>>>>> called .meta. You can set the log level through that for a
>>>>>>>>>> particular client [1]. But I also want logs from the bricks,
>>>>>>>>>> because I suspect the brick processes of initiating the
>>>>>>>>>> disconnects.
>>>>>>>>>>
>>>>>>>>>> [1] e.g.: echo 8 > /mnt/glusterfs/.meta/logging/loglevel
>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Yonex
>>>>>>>>>>>
>>>>>>>>>>> 2016-12-13 23:33 GMT+09:00 Mohammed Rafi K C
>>>>>>>>>>> <rkavunga at redhat.com>:
>>>>>>>>>>>> Hi Yonex,
>>>>>>>>>>>>
>>>>>>>>>>>> Is this consistently reproducible? If so, can you enable
>>>>>>>>>>>> debug logging [1] and check for any message similar to [2]?
>>>>>>>>>>>> Basically you can even search for "EOF on socket".
>>>>>>>>>>>>
>>>>>>>>>>>> You can set your log level back to the default (INFO) after
>>>>>>>>>>>> capturing for some time.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] : gluster volume set <volname> diagnostics.brick-log-level DEBUG
>>>>>>>>>>>> and gluster volume set <volname> diagnostics.client-log-level DEBUG
>>>>>>>>>>>>
>>>>>>>>>>>> [2] : http://pastebin.com/xn8QHXWa
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>> Rafi KC
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/12/2016 09:35 PM, yonex wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> When my application moves a file from its local disk to a
>>>>>>>>>>>>> FUSE-mounted GlusterFS volume, the client outputs many
>>>>>>>>>>>>> warnings and errors, not always but occasionally. The
>>>>>>>>>>>>> volume is a simple distributed volume.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A sample of the logs: http://pastebin.com/axkTCRJX
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a glance it looks like a network disconnection
>>>>>>>>>>>>> ("Transport endpoint is not connected"), but other
>>>>>>>>>>>>> networking applications on the same machine don't observe
>>>>>>>>>>>>> any such thing. So I guess there may be a problem somewhere
>>>>>>>>>>>>> in the GlusterFS stack.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It ended in failing to rename a file, logging PHP warnings
>>>>>>>>>>>>> like the below:
>>>>>>>>>>>>>
>>>>>>>>>>>>> PHP Warning: rename(/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>>>>>>>>> failed to open stream: Input/output error in [snipped].php
>>>>>>>>>>>>> on line 278
>>>>>>>>>>>>> PHP Warning:
>>>>>>>>>>>>> rename(/var/stack/13a9a2f0,/glusterfs01/db1/stack/f0/13a9a2f0):
>>>>>>>>>>>>> Input/output error in [snipped].php on line 278
>>>>>>>>>>>>>
>>>>>>>>>>>>> Conditions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - GlusterFS 3.8.5 installed via yum (CentOS-Gluster-3.8.repo).
>>>>>>>>>>>>> - Volume info and status pasted: http://pastebin.com/JPt2KeD8
>>>>>>>>>>>>> - Client machines' OS: Scientific Linux 6 or CentOS 6.
>>>>>>>>>>>>> - Server machines' OS: CentOS 6.
>>>>>>>>>>>>> - Kernel version is 2.6.32-642.6.2.el6.x86_64 on all
>>>>>>>>>>>>>   machines.
>>>>>>>>>>>>> - The number of connected FUSE clients is 260.
>>>>>>>>>>>>> - No firewall between connected machines.
>>>>>>>>>>>>> - Neither remounting volumes nor rebooting client machines
>>>>>>>>>>>>>   has any effect.
>>>>>>>>>>>>> - It is caused not only by rename() but also by copy() and
>>>>>>>>>>>>>   filesize() operations.
>>>>>>>>>>>>> - No output in the brick logs when it happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas? I'd appreciate any help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards.
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
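[Editor's note: steps 3-6 of the attach procedure above boil down to a single gdb invocation. The sketch below builds it as a dry run; the pid 12345 and the script name mylog-capture.gdb are placeholders, since the real gdb script was an attachment to the original mail and is not reproduced here.]

```shell
#!/bin/sh
# Dry-run sketch of steps 3-6 above: build the gdb command for a given
# glusterfs client pid. "mylog-capture.gdb" is a placeholder name for
# the gdb script attached to the original mail.

gdb_attach_cmd() {
    # $1: pid of the glusterfs client process, $2: path to the gdb script
    printf 'gdb -p %s -x %s\n' "$1" "$2"
}

# Step 3: attach gdb with the script. Steps 4-5: let it run until the
# problem reproduces, then stop gdb. Step 6: read mylog.txt from the
# directory where gdb was started.
gdb_attach_cmd 12345 ./mylog-capture.gdb
```

As yonex reports in the follow-up, attaching gdb this way cut client write throughput to 2% or less, so it is best attempted outside production.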
yonex
2017-Feb-16 15:43 UTC
[Gluster-users] File operation failure on simple distributed volume
Hi Rafi,

I'm still on this issue, but reproduction has not yet been achieved
outside of production. In the production environment, I have made the
applications stop writing data to the glusterfs volume; only read
operations are going on.

P.S. It seems that I have corrupted the email thread.. ;-(
http://lists.gluster.org/pipermail/gluster-users/2017-January/029679.html

2017-02-14 17:19 GMT+09:00 Mohammed Rafi K C <rkavunga at redhat.com>:
> Hi Yonex,
>
> Are you still hitting this issue?
>
> Regards
>
> Rafi KC
>
> [...]
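[Editor's note: for reference, the log-level commands discussed across the thread can be collected into one dry-run helper that prints the commands instead of executing them. This is a sketch only; the volume name glusterfs01 and mount point /mnt/glusterfs are examples, not confirmed by the mails.]

```shell
#!/bin/sh
# Dry-run sketch of the debug-logging steps from the thread: print the
# commands rather than run them. Arguments are example values.

debug_log_cmds() {
    volname="$1"    # gluster volume name
    mount="$2"      # fuse mount point on one client

    # Cluster-wide: DEBUG on bricks and clients (set both back to INFO
    # after capturing for some time, per the thread).
    echo "gluster volume set $volname diagnostics.brick-log-level DEBUG"
    echo "gluster volume set $volname diagnostics.client-log-level DEBUG"

    # One fuse client only: level 8 (DEBUG) via the .meta interface.
    echo "echo 8 > $mount/.meta/logging/loglevel"
}

debug_log_cmds glusterfs01 /mnt/glusterfs
```

The .meta route avoids flooding all 260 clients with debug output when only one client needs it.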