Joel Young
2013-Oct-25 18:49 UTC
[Gluster-users] Extra work in gluster volume rebalance and odd reporting
Folks,

With gluster 1.4.0 on Fedora 19:

I have a four-node gluster peer group (ir0, ir1, ir2, ir3) and two
distributed filesystems on the cluster.

One (work) is distributed, with bricks on ir0, ir1, and ir2. The other
(home) is replicated and distributed, with replication across the pairs
(ir0, ir3) and (ir1, ir2).

When I run "gluster volume rebalance home start" and "gluster volume
rebalance work start", rebalance operations are started on every node in
the peer group. For work, it ran a rebalance on ir3 even though there is
no brick for work on ir3. For home, it ran a rebalance on ir1 and ir3 and
did no work on those nodes.

[root@ir0]# gluster volume rebalance home status; gluster volume rebalance work status
Node        Rebalanced-files   size      scanned   failures   status        run time in secs
---------   ----------------   -------   -------   --------   -----------   ----------------
localhost   33441              2.3GB     120090    0          in progress   67154.00
ir2         12878              32.7GB    234395    0          completed     29569.00
ir3         0                  0Bytes    234367    0          completed     1581.00
ir1         0                  0Bytes    234367    0          completed     1569.00
volume rebalance: home: success:
Node        Rebalanced-files   size      scanned   failures   status        run time in secs
---------   ----------------   -------   -------   --------   -----------   ----------------
localhost   0                  0Bytes    1862936   0          completed     4444.00
ir2         417                10.4GB    1862936   417        completed     4466.00
ir3         0                  0Bytes    1862936   0          completed     4454.00
ir1         4                  282.8MB   1862936   4          completed     4438.00

Sometimes I would get:

volume rebalance: work: success:
[root@ir0 ghenders]# gluster volume rebalance home status; gluster volume rebalance work status
Node        Rebalanced-files   size      scanned   failures   status        run time in secs
---------   ----------------   -------   -------   --------   -----------   ----------------
localhost   31466              2.3GB     114290    0          in progress   63194.00
localhost   31466              2.3GB     114290    0          in progress   63194.00
localhost   31466              2.3GB     114290    0          in progress   63194.00
localhost   31466              2.3GB     114290    0          in progress   63194.00
ir3         0                  0Bytes    234367    0          completed     1581.00
volume rebalance: home: success:
Node        Rebalanced-files   size      scanned   failures   status        run time in secs
---------   ----------------   -------   -------   --------   -----------   ----------------
localhost   0                  0Bytes    1862936   0          completed     4444.00
localhost   0                  0Bytes    1862936   0          completed     4444.00
localhost   0                  0Bytes    1862936   0          completed     4444.00
localhost   0                  0Bytes    1862936   0          completed     4444.00
ir1         4                  282.8MB   1862936   4          completed     4438.00

Here it only reports progress on one node besides localhost, and the
localhost line is repeated.

Should I file bugs on these?

Joel
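
A minimal sketch of one way to check whether a peer that holds no brick
for a volume still spawns a rebalance process (assuming the standard
gluster CLI; /var/log/glusterfs/<volname>-rebalance.log is the usual
rebalance log location, but the path may differ between builds):

    # Kick off the rebalance on the work volume and check its status from ir0.
    gluster volume rebalance work start
    gluster volume rebalance work status

    # ir3 holds no brick for work; see whether a rebalance process and a
    # rebalance log were created there anyway.
    ssh ir3 'ps aux | grep -i rebalance | grep -v grep'
    ssh ir3 'ls -l /var/log/glusterfs/work-rebalance.log'

If the process and log show up on ir3, that matches the behaviour
described above, where the rebalance is started on every peer rather
than only on peers with bricks for the volume.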
Joel Young
2013-Oct-25 21:29 UTC
[Gluster-users] Extra work in gluster volume rebalance and odd reporting
A couple more things:

1. For the work volume, the failures are caused by hard links that can't
   be rebalanced. It is odd, though, that the hard links show up in the
   rebalanced-files count even though they failed. (See the find sketch
   after the volume info below.)

2. Output of gluster volume info:

Volume Name: home
Type: Distributed-Replicate
Volume ID: 83fa39a6-6e68-4e1c-8fae-3c3e30b1bd66
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ir0:/lhome/gluster_home
Brick2: ir1:/lhome/gluster_home
Brick3: ir2:/lhome/gluster_home
Brick4: ir3:/raid/gluster_home
Options Reconfigured:
cluster.lookup-unhashed: no
performance.client-io-threads: on
performance.cache-size: 512MB
server.statedump-path: /tmp

Volume Name: work
Type: Distribute
Volume ID: 823816bb-2e60-4b37-a142-ba464a77bfdc
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: ir0:/raid/gluster_work
Brick2: ir1:/raid/gluster_work
Brick3: ir2:/raid/gluster_work
Options Reconfigured:
performance.client-io-threads: on
performance.cache-size: 1GB
performance.write-behind-window-size: 3MB
performance.flush-behind: on
server.statedump-path: /tmp
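
For the hard-link point above, a rough sketch of how the affected files
can be listed directly on a work brick (brick path taken from the volume
info above; assumes GNU find, and that this gluster version keeps one
internal hard link per regular file under .glusterfs on the brick, so an
ordinary file already shows a link count of 2 there):

    # On ir0: list regular files on the work brick with more than 2 links,
    # i.e. files carrying at least one user-created hard link on top of
    # the internal .glusterfs link. Stay on this filesystem and skip the
    # .glusterfs directory itself.
    find /raid/gluster_work -xdev -path '*/.glusterfs' -prune -o -type f -links +2 -print

These would be the files the work rebalance reports as failures, if the
hard-link explanation in point 1 is right.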
Thanks,

Joel

On Fri, Oct 25, 2013 at 11:49 AM, Joel Young <jdy at cryregarder.com> wrote:
> Folks,
>
> With gluster 1.4.0 on Fedora 19:
>
> I have a four-node gluster peer group (ir0, ir1, ir2, ir3) and two
> distributed filesystems on the cluster.
>
> One (work) is distributed, with bricks on ir0, ir1, and ir2. The other
> (home) is replicated and distributed, with replication across the pairs
> (ir0, ir3) and (ir1, ir2).
>
> When I run "gluster volume rebalance home start" and "gluster volume
> rebalance work start", rebalance operations are started on every node in
> the peer group. For work, it ran a rebalance on ir3 even though there is
> no brick for work on ir3. For home, it ran a rebalance on ir1 and ir3 and
> did no work on those nodes.
>
> [root@ir0]# gluster volume rebalance home status; gluster volume rebalance work status
> Node        Rebalanced-files   size      scanned   failures   status        run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost   33441              2.3GB     120090    0          in progress   67154.00
> ir2         12878              32.7GB    234395    0          completed     29569.00
> ir3         0                  0Bytes    234367    0          completed     1581.00
> ir1         0                  0Bytes    234367    0          completed     1569.00
> volume rebalance: home: success:
> Node        Rebalanced-files   size      scanned   failures   status        run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost   0                  0Bytes    1862936   0          completed     4444.00
> ir2         417                10.4GB    1862936   417        completed     4466.00
> ir3         0                  0Bytes    1862936   0          completed     4454.00
> ir1         4                  282.8MB   1862936   4          completed     4438.00
>
> Sometimes I would get:
>
> volume rebalance: work: success:
> [root@ir0 ghenders]# gluster volume rebalance home status; gluster volume rebalance work status
> Node        Rebalanced-files   size      scanned   failures   status        run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost   31466              2.3GB     114290    0          in progress   63194.00
> localhost   31466              2.3GB     114290    0          in progress   63194.00
> localhost   31466              2.3GB     114290    0          in progress   63194.00
> localhost   31466              2.3GB     114290    0          in progress   63194.00
> ir3         0                  0Bytes    234367    0          completed     1581.00
> volume rebalance: home: success:
> Node        Rebalanced-files   size      scanned   failures   status        run time in secs
> ---------   ----------------   -------   -------   --------   -----------   ----------------
> localhost   0                  0Bytes    1862936   0          completed     4444.00
> localhost   0                  0Bytes    1862936   0          completed     4444.00
> localhost   0                  0Bytes    1862936   0          completed     4444.00
> localhost   0                  0Bytes    1862936   0          completed     4444.00
> ir1         4                  282.8MB   1862936   4          completed     4438.00
>
> Here it only reports progress on one node besides localhost, and the
> localhost line is repeated.
>
> Should I file bugs on these?
>
> Joel