Pranith Kumar Karampuri
2015-May-29 08:16 UTC
[Gluster-users] 100% cpu on brick replication
Could you give gluster volume info output?

Pranith

On 05/29/2015 01:18 PM, Pedro Oriani wrote:
> I've set
>
> cluster.entry-self-heal: off
>
> Maybe I had missed it, and when the service started on srv02 it seemed
> to do the job.
> Then I restarted the service.
>
> On srv02:
>
> 11607 ?  Ssl  0:00 /usr/sbin/glusterfs -s localhost
> --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/eb93ca526d4559069efc40da9c71b3a4.socket
> --xlator-option *replicate*.node-uuid=7207ea30-41e9-4344-8fc3-47743b83629e
> 11612 ?  Ssl  0:03 /usr/sbin/glusterfsd -s 172.16.0.2
> --volfile-id vol1.172.16.0.2.data-glusterfs-vol1-brick1-brick -p
> /var/lib/glusterd/vols/vol1/run/172.16.0.2-data-glusterfs-vol1-brick1-brick.pid
> -S /var/run/gluster/09285d60c2c8c9aa546602147a99a347.socket
> --brick-name /data/glusterfs/vol1/brick1/brick -l
> /var/log/glusterfs/bricks/data-glusterfs-vol1-brick1-brick.log
> --xlator-option
> *-posix.glusterd-uuid=7207ea30-41e9-4344-8fc3-47743b83629e
> --brick-port 49154 --xlator-option vol1-server.listen-port=49154
>
> It seems like self-healing starts and brings down srv01, with 600% load.
>
> thanks,
> Pedro
>
> ------------------------------------------------------------------------
> Date: Fri, 29 May 2015 12:37:19 +0530
> From: pkarampu at redhat.com
> To: sgunfio at hotmail.com
> CC: Gluster-users at gluster.org
> Subject: Re: [Gluster-users] 100% cpu on brick replication
>
> On 05/29/2015 12:34 PM, Pedro Oriani wrote:
> > Hi Pranith,
> > It is for sure related to a replication / healing task, because it
> > occurs when you create a new replicated brick or when you bring an
> > old one back online.
> > The problem is that the CPU load on the online brick is so high
> > that I cannot do normal operations.
> > In my case, when a replication / healing occurs, the cluster cannot
> > serve content.
> > I'm asking if there is a way to limit CPU usage in this case, or to
> > set a less aggressive mode, because otherwise I have to rethink
> > the image repository.
>
> Disable self-heal. I see that you already did that for the self-heal
> daemon. Let's do that even for mounts:
>
> gluster volume set <volname> cluster.entry-self-heal off
>
> Let me know how that goes.
>
> Pranith
>
> > thanks,
> > Pedro
> >
> > ------------------------------------------------------------------------
> > Date: Fri, 29 May 2015 11:14:29 +0530
> > From: pkarampu at redhat.com
> > To: sgunfio at hotmail.com; gluster-users at gluster.org
> > Subject: Re: [Gluster-users] 100% cpu on brick replication
> >
> > On 05/27/2015 08:48 PM, Pedro Oriani wrote:
> > > Hi All,
> > > I'm writing because I'm experiencing an issue with Gluster's
> > > replication feature.
> > > I have a brick on srv1 with about 2TB of mixed-size files,
> > > ranging from 10k to 300k.
> > > When I add a new replication brick on srv2, the glusterfs
> > > process takes all the CPU.
> > > This is unsuitable because the volume does not respond to
> > > normal r/w queries.
> > >
> > > Glusterfs version is 3.7.0.
> >
> > Is it because of self-heals? Was the brick offline until then?
> >
> > Pranith
> >
> > > The underlying volume is xfs.
> > > Volume Name: vol1
> > > Type: Replicate
> > > Volume ID:
> > > Status: Started
> > > Number of Bricks: 1 x 2 = 2
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: 172.16.0.1:/data/glusterfs/vol1/brick1/brick
> > > Brick2: 172.16.0.2:/data/glusterfs/vol1/brick1/brick
> > > Options Reconfigured:
> > > performance.cache-size: 1gb
> > > cluster.self-heal-daemon: off
> > > cluster.data-self-heal-algorithm: full
> > > cluster.metadata-self-heal: off
> > > performance.cache-max-file-size: 2MB
> > > performance.cache-refresh-timeout: 1
> > > performance.stat-prefetch: off
> > > performance.read-ahead: on
> > > performance.quick-read: off
> > > performance.write-behind-window-size: 4MB
> > > performance.flush-behind: on
> > > performance.write-behind: on
> > > performance.io-thread-count: 32
> > > performance.io-cache: on
> > > network.ping-timeout: 2
> > > nfs.addr-namelookup: off
> > > performance.strict-write-ordering: on
> > >
> > > Is there any parameter or hint that I can follow to limit CPU
> > > usage, so that replication causes little lag on normal
> > > operations?
> > >
> > > thanks
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
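The volume info above shows cluster.data-self-heal still at its default of
on: the client-side heals have three separate toggles (data, metadata,
entry), so a mount can keep triggering data heals even with the other two
turned off. A minimal sketch of the approach Pranith suggests, assuming
the volume name vol1 from the output above, and deferring the heal backlog
to the self-heal daemon in a quiet window:

    # keep heals out of the mounts' I/O path entirely
    gluster volume set vol1 cluster.data-self-heal off
    gluster volume set vol1 cluster.metadata-self-heal off
    gluster volume set vol1 cluster.entry-self-heal off

    # in an off-peak window, let the self-heal daemon work the backlog
    gluster volume set vol1 cluster.self-heal-daemon on
    gluster volume heal vol1 full

    # check what is still pending
    gluster volume heal vol1 info

Lowering cluster.background-self-heal-count is another knob that may help:
fewer files healed in parallel trades the CPU spike for a longer heal.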