Micha Ober
2016-Nov-14 23:03 UTC
[Gluster-users] After upgrade from 3.4.2 to 3.8.5 - High CPU usage resulting in disconnects and split-brain
Hi,

I upgraded an installation of GlusterFS on Ubuntu 14.04.3 from version
3.4.2 to 3.8.5. A few hours after the upgrade, I noticed files in
"split-brain" state. I never had split-brain files in months of operation
before, with the old version.

Using htop, I observed the "glusterfs" process jumping from 0% to 100+%
CPU usage every now and then. Using iostat, I confirmed there is no
bottleneck on the local disks (utilization is well below 10%).

Inspecting the logfiles, it looks like clients are losing their
connection quite often:

[2016-11-14 16:34:56.685349] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-1: server X.X.X.62:49152 has not responded in the last 42 seconds, disconnecting.
[2016-11-14 16:35:47.690348] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-8: server X.X.X.219:49153 has not responded in the last 42 seconds, disconnecting.
[2016-11-14 17:09:33.903096] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-7: server X.X.X.62:49153 has not responded in the last 42 seconds, disconnecting.

There are a total of 6 servers with 2 bricks each (Distribute-Replicate).

The result of a 60-second "gluster volume profile" run can be seen here:
http://pastebin.com/5WN5S63B

After upgrading, I set:

cluster.granular-entry-heal: yes
cluster.locking-scheme: granular

I have now reverted these to no/full to see if files still go
"split-brain".

Best regards
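[For reference, a minimal sketch of how the options above are set and
reverted with the standard gluster CLI. The volume name gv0 is taken from
the 0-gv0-client-N prefix in the log lines; "no" and "full" are the stock
defaults for these two options in 3.8:]

    # enable the new heal options (as done after the upgrade)
    gluster volume set gv0 cluster.granular-entry-heal yes
    gluster volume set gv0 cluster.locking-scheme granular

    # revert to the defaults (no/full) while debugging
    gluster volume set gv0 cluster.granular-entry-heal no
    gluster volume set gv0 cluster.locking-scheme full

    # list any files currently flagged as split-brain
    gluster volume heal gv0 info split-brain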
Pranith Kumar Karampuri
2016-Nov-15 18:06 UTC
[Gluster-users] After upgrade from 3.4.2 to 3.8.5 - High CPU usage resulting in disconnects and split-brain
On Tue, Nov 15, 2016 at 4:33 AM, Micha Ober <micha2k at gmail.com> wrote:

> [...]
> Inspecting the logfiles, it looks like clients are losing their
> connection quite often:
>
> [2016-11-14 16:34:56.685349] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-1: server X.X.X.62:49152 has not responded in the last 42 seconds, disconnecting.
> [2016-11-14 16:35:47.690348] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-8: server X.X.X.219:49153 has not responded in the last 42 seconds, disconnecting.
> [2016-11-14 17:09:33.903096] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-7: server X.X.X.62:49153 has not responded in the last 42 seconds, disconnecting.
> [...]

You need to find out what is leading to the disconnects above. These
could be the reason for the split-brain files.

--
Pranith
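[A first pass at narrowing down the disconnects might look like the
following. This is a sketch, not a procedure from the thread: the volume
name gv0 and the server addresses come from the log lines above, and the
brick log path is the stock location on Ubuntu:]

    # verify every brick process is online and on its expected port
    gluster volume status gv0

    # the 42-second window in the messages is network.ping-timeout
    # (42s is the default); "Options Reconfigured" shows any overrides
    gluster volume info gv0

    # on the servers named in the messages (X.X.X.62, X.X.X.219), check
    # the brick logs around the disconnect timestamps; the file name
    # mirrors the brick path
    grep -i "disconnect" /var/log/glusterfs/bricks/*.log

    # rule out basic network trouble between client and brick
    ping -c 100 X.X.X.62

[Given the 100+% CPU spikes reported above, it is also worth checking
whether the disconnects coincide with those spikes: a process too busy to
answer the RPC ping can trip the timeout even on a healthy network.]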