Hi,

We've been using GlusterFS 3.4.0 on Debian 7 for around 11 months. Every so often (once every 6 weeks or so) a server running the client and nginx accessing the Gluster storage would suffer a hard lockup of nginx and require a reboot. This wasn't such a big deal, as there were several clients, so there was never a loss to production.

However, in the last few days we've seen a huge increase in these lockups: perhaps 2-3 times a day, on multiple machines.

The bug appears to be very similar to https://bugzilla.redhat.com/show_bug.cgi?id=764743 from Gluster 3.2, and the suggestion there of first killing the Gluster process does allow a restart of nginx without a reboot (rough commands below).

The only changes to our usage of late have been stopping a rebalance process (as we're at the point where we need to add extra bricks) and breaching 80% usage on the volume (total volume size is 400TB).

Has anyone else had any experience of this bug re-emerging in Gluster 3.4?
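For reference, the workaround looks roughly like this (a rough sketch; the PID, server, volume name, and mount point below are placeholders, not our real values):

    # find the FUSE client process behind the hung mount
    ps aux | grep glusterfs
    # kill it (forcibly, since the process is wedged); nginx can then be restarted
    kill -9 <pid>
    service nginx restart
    # the old mount point may need a lazy unmount before remounting
    umount -l /mnt/gluster
    mount -t glusterfs <server>:/<volname> /mnt/gluster

Thanks,

Mark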
----- Original Message -----
> We've been using GlusterFS 3.4.0 on Debian 7 for around 11 months.
> [...]
> Has anyone else had any experience of this bug re-emerging in Gluster 3.4?

As a side thought, how practical would it be to upgrade to GlusterFS 3.4.5? There have been a lot of bug fixes (some pretty important) between 3.4.0 and 3.4.5. I have no idea if there's a fix for the problem you mention in there btw, I'm just asking in general. :)
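If you do try it, the upgrade itself should be something like this on Debian 7 (a sketch: I haven't checked the exact repo path for the 3.4.5 Debian packages, so verify it on download.gluster.org first):

    # check what's installed now
    glusterfs --version
    # point apt at the 3.4.5 wheezy repo (verify this path on download.gluster.org)
    echo "deb http://download.gluster.org/pub/gluster/glusterfs/3.4/3.4.5/Debian/apt wheezy main" \
        > /etc/apt/sources.list.d/gluster.list
    apt-get update && apt-get install glusterfs-client glusterfs-server

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift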
Could you take a statedump of the mount process when the hang happens, using the steps at https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md?
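In short, it's something like this (assuming the default statedump path; the doc above has the details):

    # find the PID of the glusterfs client (mount) process
    ps aux | grep glusterfs
    # SIGUSR1 asks the process to write out a statedump
    kill -USR1 <pid>
    # the dump lands under /var/run/gluster by default,
    # named glusterdump.<pid>.dump.<timestamp>

Pranith

On 08/05/2014 02:10 AM, Mark Harrigan wrote:
> We've been using GlusterFS 3.4.0 on Debian 7 for around 11 months.
> [...]
> Has anyone else had any experience of this bug re-emerging in Gluster 3.4?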