Stephen Remde
2018-Dec-12 01:05 UTC
[Gluster-users] distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)
I requested a brick be removed from a distribute-only volume, and it seems to be migrating data from the wrong brick... unless I am reading this wrong, which I doubt, because disk usage is definitely decreasing on the wrong brick.

gluster> volume status
Status of volume: video-backup
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.41:/export/md0/brick          49172     0          Y       5306
Brick 10.0.0.42:/export/md0/brick          49172     0          Y       3651
Brick 10.0.0.43:/export/md0/brick          49155     0          Y       2826
Brick 10.0.0.41:/export/md1/brick          49173     0          Y       5311
Brick 10.0.0.42:/export/md1/brick          49173     0          Y       3656
Brick 10.0.0.41:/export/md2/brick          49174     0          Y       5316
Brick 10.0.0.42:/export/md2/brick          49174     0          Y       3662
Brick 10.0.0.41:/export/md3/brick          49175     0          Y       5322
Brick 10.0.0.42:/export/md3/brick          49175     0          Y       3667
Brick 10.0.0.43:/export/md1/brick          49156     0          Y       4836

Task Status of Volume video-backup
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 7895be7c-4ab9-440d-a301-c11dae0dd9e1
Status               : completed

gluster> volume remove-brick video-backup 10.0.0.43:/export/md1/brick start
volume remove-brick start: success
ID: f666a196-03c2-4940-bd38-45d8383345a4

gluster> volume status
Status of volume: video-backup
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.41:/export/md0/brick          49172     0          Y       5306
Brick 10.0.0.42:/export/md0/brick          49172     0          Y       3651
Brick 10.0.0.43:/export/md0/brick          49155     0          Y       2826
Brick 10.0.0.41:/export/md1/brick          49173     0          Y       5311
Brick 10.0.0.42:/export/md1/brick          49173     0          Y       3656
Brick 10.0.0.41:/export/md2/brick          49174     0          Y       5316
Brick 10.0.0.42:/export/md2/brick          49174     0          Y       3662
Brick 10.0.0.41:/export/md3/brick          49175     0          Y       5322
Brick 10.0.0.42:/export/md3/brick          49175     0          Y       3667
Brick 10.0.0.43:/export/md1/brick          49156     0          Y       4836

Task Status of Volume video-backup
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : f666a196-03c2-4940-bd38-45d8383345a4
Removed bricks:
10.0.0.43:/export/md1/brick
Status               : in progress

But when I check the rebalance log on the host with the brick being removed, it is actually migrating data from the other brick on the same host, 10.0.0.43:/export/md0/brick:

.....
[2018-12-11 11:59:52.572657] I [MSGID: 109086] [dht-shared.c:297:dht_parse_decommissioned_bricks] 0-video-backup-dht: *decommissioning subvolume video-backup-client-9*
....
 29: volume video-backup-client-2
 30: type protocol/client
 31: option clnt-lk-version 1
 32: option volfile-checksum 0
 33: option volfile-key rebalance/video-backup
 34: option client-version 3.8.15
 35: option process-uuid node-dc4-03-25536-2018/12/11-11:59:47:551328-video-backup-client-2-0-0
 36: option fops-version 1298437
 37: option ping-timeout 42
 38: option remote-host 10.0.0.43
 39: option remote-subvolume /export/md0/brick
 40: option transport-type socket
 41: option transport.address-family inet
 42: option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
 43: option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
 44: end-volume
...
112: volume video-backup-client-9
113: type protocol/client
114: option ping-timeout 42
115: option remote-host 10.0.0.43
116: option remote-subvolume /export/md1/brick
117: option transport-type socket
118: option transport.address-family inet
119: option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
120: option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
121: end-volume
...
[2018-12-11 11:59:52.608698] I [dht-rebalance.c:3668:gf_defrag_start_crawl] 0-video-backup-dht: gf_defrag_start_crawl using commit hash 3766302106
[2018-12-11 11:59:52.609478] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /
[2018-12-11 11:59:52.615348] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-2
[2018-12-11 11:59:52.615378] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-9
...
[2018-12-11 11:59:52.616554] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /
[2018-12-11 11:59:54.000363] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /symlinks.txt: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:55.110549] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-video-backup-dht: completed migration of /symlinks.txt from subvolume video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.100931] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6
[2018-12-11 11:59:58.107389] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6
[2018-12-11 11:59:58.132138] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6 took 0.02 secs
[2018-12-11 11:59:58.330393] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6/2017
[2018-12-11 11:59:58.337601] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6/2017
[2018-12-11 11:59:58.493906] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908101048: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.706068] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908120734132317: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.783952] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124091841: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.843315] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124135453: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:58.951637] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161122111252: attempting to move from video-backup-client-2 to video-backup-client-4
[2018-12-11 11:59:59.005324] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6/2017 took 0.67 secs
[2018-12-11 11:59:59.005362] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/58906aaaaca0515f5994104d20170213154555: attempting to move from video-backup-client-2 to video-backup-client-4

etc...

Can I stop/cancel it without data loss? How can I make gluster remove the correct brick?

Thanks
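For reference, the subvolume name in the "decommissioning subvolume" log line can be mapped back to a brick via the volfile dump the rebalance process prints at startup, as above. A small shell sketch of that mapping; it parses an inline excerpt of the log (real logs would live somewhere like /var/log/glusterfs/, which varies by installation), so the paths here are sample data, not a live query:

```shell
#!/bin/sh
# Map a DHT client-subvolume name back to its backing brick using the
# volfile dump that the rebalance process writes into its log.
# An inline excerpt is used so this sketch is self-contained; point
# the pipeline at your actual rebalance log instead.

log_excerpt='112: volume video-backup-client-9
113: type protocol/client
115: option remote-host 10.0.0.43
116: option remote-subvolume /export/md1/brick
121: end-volume'

subvol=video-backup-client-9

# Select the volume block for $subvol, then join remote-host and
# remote-subvolume into the familiar host:path brick notation.
brick=$(printf '%s\n' "$log_excerpt" \
  | sed -n "/volume ${subvol}\$/,/end-volume/p" \
  | awk '/remote-host/ {h=$NF} /remote-subvolume/ {p=$NF} END {print h ":" p}')

echo "$brick"    # 10.0.0.43:/export/md1/brick
```

Here client-9 resolves to 10.0.0.43:/export/md1/brick, which is the brick that was asked to be removed, even though the migration lines name client-2 (10.0.0.43:/export/md0/brick) as a source.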
Nithya Balachandran
2018-Dec-12 03:07 UTC
[Gluster-users] distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)
This is the current behaviour of rebalance and nothing to be concerned about: it will migrate data on all bricks of the nodes which host the bricks being removed. The data on the removed bricks will be moved to other bricks; some of the data on the node's other bricks will simply be moved to other bricks based on the new directory layouts. I will fix this in the near future, but you don't need to stop the remove-brick operation.

Regards,
Nithya

On Wed, 12 Dec 2018 at 06:36, Stephen Remde <stephen.remde at gaist.co.uk> wrote:

> I requested a brick be removed from a distribute only volume and it seems
> to be migrating data from the wrong brick... unless I am reading this
> wrong which I doubt because the disk usage is definitely decreasing on the
> wrong brick.
>
> [snip]
>
> Can I stop/cancel it without data loss? How can I make gluster remove the
> correct brick?
>
> Thanks
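Since the operation can be left running, per-node progress can be watched with `gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick status`, and the brick is only detached once `... commit` is run (`... stop` aborts without removing it). A sketch that tallies the failures column from that status output before committing; the column layout below is an assumption based on 3.8-era releases, and an inline sample is parsed so the sketch stands alone:

```shell
#!/bin/sh
# Check `gluster volume remove-brick <vol> <brick> status` output for
# failures before running `... commit`.
# The sample below stands in for real CLI output; the column order
# (failures in field 5) is assumed and may differ between releases.

status_output='Node       Rebalanced-files  size   scanned  failures  skipped  status       run-time
10.0.0.43  152163            2.1TB  504332   0         0        in progress  36000.00'

# Sum the failures column across all data rows (skip the header).
failures=$(printf '%s\n' "$status_output" \
  | awk 'NR > 1 { sum += $5 } END { print sum + 0 }')

echo "failures: $failures"
```

Only commit the remove-brick once the status reads "completed" and the failure count is zero; otherwise files still on the brick at commit time are no longer part of the volume.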