This was raised earlier but I don't believe it was ever resolved, and it is becoming a serious issue for me.

I'm doing rolling upgrades on our three node cluster (Replica 3, Sharded, VM Workload).

I update one node, reboot it, wait for healing to complete, then do the next one.

Only the heal count does not change, it just does not seem to start. It can take hours before it shifts, but once it does, it's quite rapid. Node 1 has restarted and the heal count has been static at 511 shards for 45 minutes now. Nodes 1 & 2 have low CPU load, node 3 has glusterfsd pegged at 800% CPU.

This was *not* the case in earlier versions of gluster (3.7.11 I think), healing would start almost right away. I think it started doing this when the afr locking improvements were made.

I have experimented with full & diff heal modes; it doesn't make any difference.

Current:

Gluster Version 4.8.4

Volume Name: datastore4
Type: Replicate
Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4
Options Reconfigured:
cluster.self-heal-window-size: 1024
cluster.locking-scheme: granular
cluster.granular-entry-heal: on
performance.readdir-ahead: on
cluster.data-self-heal: on
features.shard: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
performance.strict-write-ordering: off
performance.stat-prefetch: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard-block-size: 64MB
cluster.background-self-heal-count: 16

Thanks,

--
Lindsay Mathieson
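(For context, the per-node check during the rolling upgrade is roughly the loop below, using the datastore4 volume name from the info above; the exact heal-count output format can differ between gluster versions.)

# after updating and rebooting a node, wait for the pending heal
# count to drain before touching the next node
watch -n 60 'gluster volume heal datastore4 statistics heal-count'

# or just list how many entries each brick still has queued for heal
gluster volume heal datastore4 info | grep -E 'Brick|Number of entries'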
Any errors/warnings in the glustershd logs?

-Krutika

On Sat, Oct 1, 2016 at 8:18 PM, Lindsay Mathieson <lindsay.mathieson at gmail.com> wrote:
> Only the heal count does not change, it just does not seem to start. It
> can take hours before it shifts, but once it does, it's quite rapid. Node 1
> has restarted and the heal count has been static at 511 shards for 45
> minutes now.
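(For anyone checking this on their own cluster: the self-heal daemon writes to glustershd.log under the glusterfs log directory on each node, /var/log/glusterfs/glustershd.log by default, though the path depends on how logging is configured. A quick scan would be something like:)

# recent errors (E) and warnings (W) from the self-heal daemon
tail -n 200 /var/log/glusterfs/glustershd.log | grep -E '\] [EW] \['

# and confirm the Self-heal Daemon shows as online on every node
gluster volume status datastore4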
On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:
> 511 shards for 45 minutes

At (roughly) the one hour mark it started ticking over; the heal completed at 1.5 hours.

--
Lindsay Mathieson
On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:
> Only the heal count does not change, it just does not seem to start.
> It can take hours before it shifts, but once it does, it's quite rapid.
> Node 1 has restarted and the heal count has been static at 511 shards
> for 45 minutes now. Nodes 1 & 2 have low CPU load, node 3 has
> glusterfsd pegged at 800% CPU.

Ok, had a try at systematically reproducing it this morning and was actually unable to do so, which is quite weird. Testing was the same as last night: move all the VMs off a server, reboot it, and wait for the healing to finish. This time I tried it with various different settings (commands used are sketched below).

Test 1
------
cluster.granular-entry-heal: no
cluster.locking-scheme: full
Shards / Min: 350 / 8

Test 2
------
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular
Shards / Min: 391 / 10

Test 3
------
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular
heal command issued
Shards / Min: 358 / 11

Test 4
------
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular
heal full command issued
Shards / Min: 358 / 27

Best results were with cluster.granular-entry-heal=yes and cluster.locking-scheme=granular, but they were all quite good. Don't know why it was so much worse last night - I/O load, CPU and memory were the same.

However, one thing that is different, and which I can't easily reproduce, is that the cluster had been running for several weeks before last night, when I rebooted all the nodes. Could gluster be developing an issue after running for some time?

--
Lindsay Mathieson
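(For anyone who wants to repeat the comparison, these are roughly the commands behind each test case, using the datastore4 volume name as above; syntax is from the 3.8-era CLI and may differ slightly on other versions.)

gluster volume set datastore4 cluster.granular-entry-heal on
gluster volume set datastore4 cluster.locking-scheme granular

# "heal command issued" = index heal of just the entries marked as pending
gluster volume heal datastore4

# "heal full command issued" = crawl and compare everything on the bricks
gluster volume heal datastore4 full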
On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:
> This was raised earlier but I don't believe it was ever resolved and
> it is becoming a serious issue for me.
>
> I'm doing rolling upgrades on our three node cluster (Replica 3,
> Sharded, VM Workload).
>
> I update one node, reboot it, wait for healing to complete, do the
> next one.

Recently, I decided to remove all the heal optimisations I had made to the volume settings:

- cluster.self-heal-window-size: 1024
- cluster.background-self-heal-count: 16

I just reset them to their defaults. Since then I've done several rolling reboots and a lot of test kills of glusterfsd. Each time the volume heals have been prompt and fast, so I'm going to assume my settings were part of the problem. Perhaps the large heal window size was responsible for the long delays before healing started?

Cheers,

--
Lindsay Mathieson
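(For the record, putting those two options back to their defaults is just a volume reset per option; afterwards they should drop out of the "Options Reconfigured" list.)

gluster volume reset datastore4 cluster.self-heal-window-size
gluster volume reset datastore4 cluster.background-self-heal-count

# the two options should no longer appear under "Options Reconfigured"
gluster volume info datastore4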