Hey there,

I've been madly hacking on cool new puppet-gluster features... In my lack
of sleep, I've put together some comments about gluster add/remove brick
features. Hopefully they are useful and make sense. These are sort of
"bugs". Have a look, and let me know if I should formally report any of
these...

Cheers...
James

PS: this is also mirrored here: http://paste.fedoraproject.org/50402/12956713
because email has destroyed formatting :P

All tests are done on gluster 3.4.1, using CentOS 6.4 on VMs.
The firewall has been disabled for testing purposes.

gluster --version
glusterfs 3.4.1 built on Sep 27 2013 13:13:58

### 1) simple operations shouldn't fail
# running the following commands in succession, without files:
# gluster volume add-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9
# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 start ... status

shows a failure:

[root at vmx1 ~]# gluster volume add-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9
volume add-brick: success
[root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 status
                Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
           ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
           localhost                 0      0Bytes           0           0              not started               0.00
    vmx2.example.com                 0      0Bytes           0           0              not started               0.00
[root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 start
volume remove-brick start: success
ID: ecbcc2b6-4351-468a-8f53-3a09159e4059
[root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 status
                Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
           ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
           localhost                 0      0Bytes           8           0                completed               0.00
    vmx2.example.com                 0      0Bytes           0           1                   failed               0.00
[root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
[root at vmx1 ~]#

### 1b) on the other node, the output shows an extra row (also including the failure)

[root at vmx2 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 status
                Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
           ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
           localhost                 0      0Bytes           0           0                completed               0.00
           localhost                 0      0Bytes           0           0                completed               0.00
    vmx1.example.com                 0      0Bytes           0           1                   failed               0.00
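In case a reproducer helps for 1), here is roughly the sequence scripted up.
This is only a rough sketch, not what puppet-gluster actually runs; it just
shells out to the same commands shown above, on the same volume and bricks:

```python
#!/usr/bin/env python
# Rough reproducer sketch for issue 1: add two empty bricks, start a
# remove-brick on them, and poll the status. Assumes the 'examplevol'
# volume from above already exists, and that this runs on vmx1 as root.
import subprocess
import time

VOLUME = 'examplevol'
BRICKS = ['vmx1.example.com:/tmp/foo9', 'vmx2.example.com:/tmp/foo9']

def gluster(args):
    """Shell out to the gluster CLI and echo what it says."""
    cmd = ['gluster', 'volume'] + args
    print('+ ' + ' '.join(cmd))
    out = subprocess.check_output(cmd).decode()
    print(out)
    return out

gluster(['add-brick', VOLUME] + BRICKS)
gluster(['remove-brick', VOLUME] + BRICKS + ['start'])

# Poll the status a few times; on my setup the second node eventually
# reports "failed" even though the bricks never held any files.
for _ in range(10):
    gluster(['remove-brick', VOLUME] + BRICKS + ['status'])
    time.sleep(2)
```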
### 2) formatting:
# the "skipped" column doesn't seem to have any data; as a result the
# formatting is broken...
# this problem is obviously not seen in the more useful --xml output below,
# where the 'skipped' column isn't present at all.

[root at vmx1 examplevol]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo3 vmx2.example.com:/tmp/foo3 status
                Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
           ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
           localhost                 0      0Bytes           8           0                completed               0.00
    vmx2.example.com                 0      0Bytes           8           0                completed               0.00

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>115</opErrno>
  <opErrstr/>
  <volRemoveBrick>
    <task-id>d99cab76-cd7d-4579-80ae-c1e6faff3d1d</task-id>
    <nodeCount>2</nodeCount>
    <node>
      <nodeName>localhost</nodeName>
      <files>0</files>
      <size>0</size>
      <lookups>8</lookups>
      <failures>0</failures>
      <status>3</status>
      <statusStr>completed</statusStr>
    </node>
    <node>
      <nodeName>vmx2.example.com</nodeName>
      <files>0</files>
      <size>0</size>
      <lookups>8</lookups>
      <failures>0</failures>
      <status>3</status>
      <statusStr>completed</statusStr>
    </node>
    <aggregate>
      <files>0</files>
      <size>0</size>
      <lookups>16</lookups>
      <failures>0</failures>
      <status>3</status>
      <statusStr>completed</statusStr>
    </aggregate>
  </volRemoveBrick>
</cliOutput>

### 3)
[root at vmx1 examplevol]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo3 vmx2.example.com:/tmp/foo3 status
                Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
           ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
           localhost                 0      0Bytes           8           0                completed               0.00
    vmx2.example.com                 0      0Bytes           8           0                completed               0.00
[root at vmx1 examplevol]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo3 vmx2.example.com:/tmp/foo3 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success

This shouldn't warn you that you might experience data loss. If the
rebalance has worked successfully, and the bricks shouldn't be accepting
new files, then gluster should know this and just let you commit safely.
I guess you can consider this a UI bug, as long as it checks beforehand
that it's safe.

### 4) Aggregate "totals", aka the <aggregate> </aggregate> data, aren't shown in the normal command line output.

### 5) the volume shouldn't have to be "started" for a rebalance to work...
we might want to do a rebalance, but keep it "stopped" so that clients
can't mount it.
This is probably due to gluster needing it "online" to rebalance, but
nonetheless, it doesn't match what users/sysadmins expect.

### 6) in the commands: gluster volume rebalance myvolume status ; gluster volume rebalance myvolume status --xml && echo t
Nowhere does it mention the volume, or the specific bricks which are being
[re-]balanced. In particular, a volume name would be especially useful in
the --xml output.
This would be useful if multiple rebalances are going on... I realize this
is because the rebalance command only allows you to specify one volume at
a time, but to be consistent with other commands, a volume rebalance
status command should let you get info on many volumes.
Also, per-brick information is still missing.
                Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
           ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
           localhost                 0      0Bytes           2           0           0   in progress               1.00
    vmx2.example.com                 0      0Bytes           7           0           0   in progress               1.00
volume rebalance: examplevol: success:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>115</opErrno>
  <opErrstr/>
  <volRebalance>
    <task-id>c5e9970b-f96a-4a28-af14-5477cf90d638</task-id>
    <op>3</op>
    <nodeCount>2</nodeCount>
    <node>
      <nodeName>localhost</nodeName>
      <files>0</files>
      <size>0</size>
      <lookups>2</lookups>
      <failures>0</failures>
      <status>1</status>
      <statusStr>in progress</statusStr>
    </node>
    <node>
      <nodeName>vmx2.example.com</nodeName>
      <files>0</files>
      <size>0</size>
      <lookups>7</lookups>
      <failures>0</failures>
      <status>1</status>
      <statusStr>in progress</statusStr>
    </node>
    <aggregate>
      <files>0</files>
      <size>0</size>
      <lookups>9</lookups>
      <failures>0</failures>
      <status>1</status>
      <statusStr>in progress</statusStr>
    </aggregate>
  </volRebalance>
</cliOutput>
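For a bit more context on why 2), 3) and 4) matter to me: what I'm really
after in puppet-gluster is to only run 'remove-brick ... commit' once every
node reports a clean, completed run. Roughly this kind of check (a sketch
in Python just for illustration; the element names come from the --xml
dumps above, but the helper function and its arguments are mine):

```python
#!/usr/bin/env python
# Sketch only: decide whether 'remove-brick ... commit' looks safe by
# parsing the 'status --xml' output shown above. Element names are taken
# from the <volRemoveBrick> dump; this is not anything gluster ships.
import subprocess
import xml.etree.ElementTree as ET

def remove_brick_safe_to_commit(volume, bricks):
    """Return True only if every node reports 'completed' with 0 failures."""
    cmd = (['gluster', 'volume', 'remove-brick', volume] + bricks +
           ['status', '--xml'])
    root = ET.fromstring(subprocess.check_output(cmd))
    nodes = root.findall('./volRemoveBrick/node')
    if not nodes:
        return False
    for node in nodes:
        name = node.findtext('nodeName')
        status = node.findtext('statusStr')
        failures = int(node.findtext('failures') or 0)
        print('%s: %s (%d failures)' % (name, status, failures))
        if status != 'completed' or failures != 0:
            return False
    return True

if __name__ == '__main__':
    bricks = ['vmx1.example.com:/tmp/foo9', 'vmx2.example.com:/tmp/foo9']
    if remove_brick_safe_to_commit('examplevol', bricks):
        print('all nodes completed cleanly; ok to commit')
    else:
        print('not committing yet')
```

That still wouldn't silence the data-loss prompt from 3), but at least the
automation side can refuse to commit after a run that failed like the one
in 1).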
Kaushal M
2013-Nov-11 07:20 UTC
[Gluster-users] A bunch of comments about brick management
Hey James,

I'm replying inline about your observations. Some of them are valid bugs,
but most concern things that simply haven't been implemented yet, so they
are feature requests.

On Wed, Oct 30, 2013 at 4:10 PM, James <purpleidea at gmail.com> wrote:
> Hey there,
>
> I've been madly hacking on cool new puppet-gluster features... In my lack
> of sleep, I've put together some comments about gluster add/remove brick
> features. Hopefully they are useful and make sense. These are sort of
> "bugs". Have a look, and let me know if I should formally report any of
> these...
>
> Cheers...
> James
>
> PS: this is also mirrored here:
> http://paste.fedoraproject.org/50402/12956713
> because email has destroyed formatting :P
>
> All tests are done on gluster 3.4.1, using CentOS 6.4 on VMs.
> The firewall has been disabled for testing purposes.
>
> gluster --version
> glusterfs 3.4.1 built on Sep 27 2013 13:13:58
>
> ### 1) simple operations shouldn't fail
> # running the following commands in succession, without files:
> # gluster volume add-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9
> # gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 start ... status
>
> shows a failure:
>
> [root at vmx1 ~]# gluster volume add-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9
> volume add-brick: success
> [root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 status
>                 Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
>            ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
>            localhost                 0      0Bytes           0           0              not started               0.00
>     vmx2.example.com                 0      0Bytes           0           0              not started               0.00
> [root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 start
> volume remove-brick start: success
> ID: ecbcc2b6-4351-468a-8f53-3a09159e4059
> [root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 status
>                 Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
>            ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
>            localhost                 0      0Bytes           8           0                completed               0.00
>     vmx2.example.com                 0      0Bytes           0           1                   failed               0.00

I don't know why the process failed on one node. This is a rebalance issue,
not a CLI one, and is a valid bug. If you can reproduce it consistently,
then please file a bug for it.

> [root at vmx1 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 commit
> Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
> volume remove-brick commit: success
> [root at vmx1 ~]#
>
> ### 1b) on the other node, the output shows an extra row (also including the failure)
>
> [root at vmx2 ~]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo9 vmx2.example.com:/tmp/foo9 status
>                 Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
>            ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
>            localhost                 0      0Bytes           0           0                completed               0.00
>            localhost                 0      0Bytes           0           0                completed               0.00
>     vmx1.example.com                 0      0Bytes           0           1                   failed               0.00

This is a bug which I've seen a few other times as well, but I haven't
gotten around to filing or fixing it.
So please file a bug for this as well.

> ### 2) formatting:
> # the "skipped" column doesn't seem to have any data; as a result the
> # formatting is broken...
> # this problem is obviously not seen in the more useful --xml output below,
> # where the 'skipped' column isn't present at all.
>
> [root at vmx1 examplevol]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo3 vmx2.example.com:/tmp/foo3 status
>                 Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
>            ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
>            localhost                 0      0Bytes           8           0                completed               0.00
>     vmx2.example.com                 0      0Bytes           8           0                completed               0.00
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <cliOutput>
>   <opRet>0</opRet>
>   <opErrno>115</opErrno>
>   <opErrstr/>
>   <volRemoveBrick>
>     <task-id>d99cab76-cd7d-4579-80ae-c1e6faff3d1d</task-id>
>     <nodeCount>2</nodeCount>
>     <node>
>       <nodeName>localhost</nodeName>
>       <files>0</files>
>       <size>0</size>
>       <lookups>8</lookups>
>       <failures>0</failures>
>       <status>3</status>
>       <statusStr>completed</statusStr>
>     </node>
>     <node>
>       <nodeName>vmx2.example.com</nodeName>
>       <files>0</files>
>       <size>0</size>
>       <lookups>8</lookups>
>       <failures>0</failures>
>       <status>3</status>
>       <statusStr>completed</statusStr>
>     </node>
>     <aggregate>
>       <files>0</files>
>       <size>0</size>
>       <lookups>16</lookups>
>       <failures>0</failures>
>       <status>3</status>
>       <statusStr>completed</statusStr>
>     </aggregate>
>   </volRemoveBrick>
> </cliOutput>

The skipped count wasn't available in the rebalance status XML output in
3.4.0, but it has been added recently and should be available in 3.4.2
(http://review.gluster.org/6000).

Regarding the missing skipped count in the remove-brick status output: it
might be because, for remove-brick, a skipped file is treated as a failure.
This may already have been fixed in 3.4.2, but that needs to be checked to
confirm.

> ### 3)
> [root at vmx1 examplevol]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo3 vmx2.example.com:/tmp/foo3 status
>                 Node  Rebalanced-files        size     scanned    failures     skipped        status  run-time in secs
>            ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
>            localhost                 0      0Bytes           8           0                completed               0.00
>     vmx2.example.com                 0      0Bytes           8           0                completed               0.00
> [root at vmx1 examplevol]# gluster volume remove-brick examplevol vmx1.example.com:/tmp/foo3 vmx2.example.com:/tmp/foo3 commit
> Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
> volume remove-brick commit: success
>
> This shouldn't warn you that you might experience data loss. If the
> rebalance has worked successfully, and the bricks shouldn't be accepting
> new files, then gluster should know this and just let you commit safely.
> I guess you can consider this a UI bug, as long as it checks beforehand
> that it's safe.

The warning is given by the CLI before it even contacts glusterd. The CLI
doesn't have any information about the status of a remove-brick command's
rebalance process, so it gives the warning for all 'remove-brick commit'
commands, whether it's a forceful removal or a clean removal with
rebalancing. This would be a nice feature to have, though it requires
changes to the way the CLI and glusterd communicate.

> ### 4) Aggregate "totals", aka the <aggregate> </aggregate> data, aren't shown in the normal command line output.

The aggregate totals were added to the XML output because the oVirt team,
which was the driver behind having XML outputs, requested them. It should
be simple enough to add them to the normal output as well. If this is
desired, please raise an RFE bug.
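Until then, since the aggregate values are just the per-node values summed
(the 16 lookups in your dump above are 8 + 8), something along these lines
would recover the same totals from the --xml output. This is only an
untested sketch on my part, not anything gluster ships:

```python
# Untested sketch: sum the per-node counters from 'status --xml' to get the
# same totals that the <aggregate> element reports. The same idea works for
# 'rebalance status --xml', with volRebalance as the parent element.
import subprocess
import xml.etree.ElementTree as ET

def remove_brick_totals(volume, bricks):
    cmd = (['gluster', 'volume', 'remove-brick', volume] + bricks +
           ['status', '--xml'])
    root = ET.fromstring(subprocess.check_output(cmd))
    totals = {'files': 0, 'size': 0, 'lookups': 0, 'failures': 0}
    for node in root.findall('./volRemoveBrick/node'):
        for key in totals:
            totals[key] += int(node.findtext(key) or 0)
    return totals

# For the example above this would print something like:
# {'files': 0, 'size': 0, 'lookups': 16, 'failures': 0}
print(remove_brick_totals('examplevol',
                          ['vmx1.example.com:/tmp/foo3',
                           'vmx2.example.com:/tmp/foo3']))
```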
> ### 5) the volume shouldn't have to be "started" for a rebalance to work...
> we might want to do a rebalance, but keep it "stopped" so that clients
> can't mount it.
> This is probably due to gluster needing it "online" to rebalance, but
> nonetheless, it doesn't match what users/sysadmins expect.

Gluster requires the bricks to be running for a rebalance to happen, so we
cannot start a rebalance with the volume stopped. But we could introduce a
mechanism to barrier client access to the volume during a rebalance. This
kind of barriering is being considered for the volume-level snapshot
feature planned for 3.6. However, since a rebalance is a long-running
process compared to a snapshot, there might be certain difficulties in
barriering for its duration.

> ### 6) in the commands: gluster volume rebalance myvolume status ; gluster volume rebalance myvolume status --xml && echo t
> Nowhere does it mention the volume, or the specific bricks which are being
> [re-]balanced. In particular, a volume name would be especially useful in
> the --xml output.
> This would be useful if multiple rebalances are going on... I realize this
> is because the rebalance command only allows you to specify one volume at
> a time, but to be consistent with other commands, a volume rebalance
> status command should let you get info on many volumes.
> Also, per-brick information is still missing.
>
>                 Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
>            ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
>            localhost                 0      0Bytes           2           0           0   in progress               1.00
>     vmx2.example.com                 0      0Bytes           7           0           0   in progress               1.00
> volume rebalance: examplevol: success:
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <cliOutput>
>   <opRet>0</opRet>
>   <opErrno>115</opErrno>
>   <opErrstr/>
>   <volRebalance>
>     <task-id>c5e9970b-f96a-4a28-af14-5477cf90d638</task-id>
>     <op>3</op>
>     <nodeCount>2</nodeCount>
>     <node>
>       <nodeName>localhost</nodeName>
>       <files>0</files>
>       <size>0</size>
>       <lookups>2</lookups>
>       <failures>0</failures>
>       <status>1</status>
>       <statusStr>in progress</statusStr>
>     </node>
>     <node>
>       <nodeName>vmx2.example.com</nodeName>
>       <files>0</files>
>       <size>0</size>
>       <lookups>7</lookups>
>       <failures>0</failures>
>       <status>1</status>
>       <statusStr>in progress</statusStr>
>     </node>
>     <aggregate>
>       <files>0</files>
>       <size>0</size>
>       <lookups>9</lookups>
>       <failures>0</failures>
>       <status>1</status>
>       <statusStr>in progress</statusStr>
>     </aggregate>
>   </volRebalance>
> </cliOutput>

Having the volume name in the XML output is a valid enhancement; go ahead
and open an RFE bug for it.

The rebalance process on each node crawls the whole volume to find files
that need to be migrated and that are present on bricks of the volume
belonging to that node, so the rebalance status of a node can be considered
the status of its brick. But if a node contains more than one brick of the
volume being rebalanced, we don't have a way to differentiate between them,
and I'm not sure we could.