Stephen Remde
2018-Dec-18 09:26 UTC
[Gluster-users] distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)
Nithya,

I've realised I will not have enough space on the other bricks in my cluster to migrate data off the server so I can remove the single brick - is there a workaround?

As you can see below, the new brick was created with the wrong RAID configuration, so I want to remove it, recreate the RAID, and re-add it.

xxxxxx  Filesystem  Size  Used  Avail  Use%  Mounted on
dc4-01  /dev/md0     95T   87T   8.0T   92%  /export/md0
dc4-01  /dev/md1     95T   87T   8.4T   92%  /export/md1
dc4-01  /dev/md2     95T   86T   9.3T   91%  /export/md2
dc4-01  /dev/md3     95T   86T   8.9T   91%  /export/md3
dc4-02  /dev/md0     95T   89T   6.5T   94%  /export/md0
dc4-02  /dev/md1     95T   87T   8.4T   92%  /export/md1
dc4-02  /dev/md2     95T   87T   8.6T   91%  /export/md2
dc4-02  /dev/md3     95T   86T   8.8T   91%  /export/md3
dc4-03  /dev/md0     95T   74T    21T   78%  /export/md0
dc4-03  /dev/md1    102T  519G   102T    1%  /export/md1

This is the backup storage, so if I HAVE to lose the 519GB and resync, that's an acceptable worst case.

gluster> v info video-backup

Volume Name: video-backup
Type: Distribute
Volume ID: 887bdc2a-ca5e-4ca2-b30d-86831839ed04
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Bricks:
Brick1: 10.0.0.41:/export/md0/brick
Brick2: 10.0.0.42:/export/md0/brick
Brick3: 10.0.0.43:/export/md0/brick
Brick4: 10.0.0.41:/export/md1/brick
Brick5: 10.0.0.42:/export/md1/brick
Brick6: 10.0.0.41:/export/md2/brick
Brick7: 10.0.0.42:/export/md2/brick
Brick8: 10.0.0.41:/export/md3/brick
Brick9: 10.0.0.42:/export/md3/brick
Brick10: 10.0.0.43:/export/md1/brick
Options Reconfigured:
cluster.rebal-throttle: aggressive
cluster.min-free-disk: 1%
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Best,

Steve


On Wed, 12 Dec 2018 at 03:07, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
> This is the current behaviour of rebalance and nothing to be concerned
> about - it will migrate data on all bricks on the nodes which host the
> bricks being removed. The data on the removed bricks will be moved to other
> bricks; some of the data on the other bricks on the node will just be
> moved to other bricks based on the new directory layouts.
> I will fix this in the near future, but you don't need to stop the
> remove-brick operation.
>
> Regards,
> Nithya
>
> On Wed, 12 Dec 2018 at 06:36, Stephen Remde <stephen.remde at gaist.co.uk>
> wrote:
>
>> I requested a brick be removed from a distribute-only volume and it seems
>> to be migrating data from the wrong brick... unless I am reading this
>> wrong, which I doubt, because the disk usage is definitely decreasing on
>> the wrong brick.
>>
>> gluster> volume status
>> Status of volume: video-backup
>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.41:/export/md0/brick          49172     0          Y       5306
>> Brick 10.0.0.42:/export/md0/brick          49172     0          Y       3651
>> Brick 10.0.0.43:/export/md0/brick          49155     0          Y       2826
>> Brick 10.0.0.41:/export/md1/brick          49173     0          Y       5311
>> Brick 10.0.0.42:/export/md1/brick          49173     0          Y       3656
>> Brick 10.0.0.41:/export/md2/brick          49174     0          Y       5316
>> Brick 10.0.0.42:/export/md2/brick          49174     0          Y       3662
>> Brick 10.0.0.41:/export/md3/brick          49175     0          Y       5322
>> Brick 10.0.0.42:/export/md3/brick          49175     0          Y       3667
>> Brick 10.0.0.43:/export/md1/brick          49156     0          Y       4836
>>
>> Task Status of Volume video-backup
>> ------------------------------------------------------------------------------
>> Task   : Rebalance
>> ID     : 7895be7c-4ab9-440d-a301-c11dae0dd9e1
>> Status : completed
>>
>> gluster> volume remove-brick video-backup 10.0.0.43:/export/md1/brick start
>> volume remove-brick start: success
>> ID: f666a196-03c2-4940-bd38-45d8383345a4
>>
>> gluster> volume status
>> Status of volume: video-backup
>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.41:/export/md0/brick          49172     0          Y       5306
>> Brick 10.0.0.42:/export/md0/brick          49172     0          Y       3651
>> Brick 10.0.0.43:/export/md0/brick          49155     0          Y       2826
>> Brick 10.0.0.41:/export/md1/brick          49173     0          Y       5311
>> Brick 10.0.0.42:/export/md1/brick          49173     0          Y       3656
>> Brick 10.0.0.41:/export/md2/brick          49174     0          Y       5316
>> Brick 10.0.0.42:/export/md2/brick          49174     0          Y       3662
>> Brick 10.0.0.41:/export/md3/brick          49175     0          Y       5322
>> Brick 10.0.0.42:/export/md3/brick          49175     0          Y       3667
>> Brick 10.0.0.43:/export/md1/brick          49156     0          Y       4836
>>
>> Task Status of Volume video-backup
>> ------------------------------------------------------------------------------
>> Task   : Remove brick
>> ID     : f666a196-03c2-4940-bd38-45d8383345a4
>> Removed bricks:
>> 10.0.0.43:/export/md1/brick
>> Status : in progress
>>
>>
>> But when I check the rebalance log on the host with the brick being removed, it is actually migrating data from the other brick on the same host 10.0.0.43:/export/md0/brick
>>
>>
>> .....
>> [2018-12-11 11:59:52.572657] I [MSGID: 109086] [dht-shared.c:297:dht_parse_decommissioned_bricks] 0-video-backup-dht: *decommissioning subvolume video-backup-client-9*
>> ....
>>  29: volume video-backup-client-2
>>  30:     type protocol/client
>>  31:     option clnt-lk-version 1
>>  32:     option volfile-checksum 0
>>  33:     option volfile-key rebalance/video-backup
>>  34:     option client-version 3.8.15
>>  35:     option process-uuid node-dc4-03-25536-2018/12/11-11:59:47:551328-video-backup-client-2-0-0
>>  36:     option fops-version 1298437
>>  37:     option ping-timeout 42
>>  38:     option remote-host 10.0.0.43
>>  39:     option remote-subvolume /export/md0/brick
>>  40:     option transport-type socket
>>  41:     option transport.address-family inet
>>  42:     option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
>>  43:     option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
>>  44: end-volume
>> ...
>> 112: volume video-backup-client-9
>> 113:     type protocol/client
>> 114:     option ping-timeout 42
>> 115:     option remote-host 10.0.0.43
>> 116:     option remote-subvolume /export/md1/brick
>> 117:     option transport-type socket
>> 118:     option transport.address-family inet
>> 119:     option username 9e7fe743-ecd7-40aa-b3db-e112086b2fc7
>> 120:     option password dab178d6-ecb4-4293-8c1d-6281ec2cafc2
>> 121: end-volume
>> ...
>> [2018-12-11 11:59:52.608698] I [dht-rebalance.c:3668:gf_defrag_start_crawl] 0-video-backup-dht: gf_defrag_start_crawl using commit hash 3766302106
>> [2018-12-11 11:59:52.609478] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /
>> [2018-12-11 11:59:52.615348] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-2
>> [2018-12-11 11:59:52.615378] I [MSGID: 0] [dht-rebalance.c:3746:gf_defrag_start_crawl] 0-video-backup-dht: local subvols are video-backup-client-9
>> ...
>> [2018-12-11 11:59:52.616554] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /
>> [2018-12-11 11:59:54.000363] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /symlinks.txt: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:55.110549] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-video-backup-dht: completed migration of /symlinks.txt from subvolume video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.100931] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6
>> [2018-12-11 11:59:58.107389] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6
>> [2018-12-11 11:59:58.132138] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6 took 0.02 secs
>> [2018-12-11 11:59:58.330393] I [MSGID: 109081] [dht-common.c:4198:dht_setxattr] 0-video-backup-dht: fixing the layout of /A6/2017
>> [2018-12-11 11:59:58.337601] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-video-backup-dht: migrate data called on /A6/2017
>> [2018-12-11 11:59:58.493906] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908101048: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.706068] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/57c81ed09f31cd6c1c8990ae20160908120734132317: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.783952] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124091841: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.843315] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161124135453: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:58.951637] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/584a8bcdaca0515f595dff8820161122111252: attempting to move from video-backup-client-2 to video-backup-client-4
>> [2018-12-11 11:59:59.005324] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-video-backup-dht: Migration operation on dir /A6/2017 took 0.67 secs
>> [2018-12-11 11:59:59.005362] I [dht-rebalance.c:1230:dht_migrate_file] 0-video-backup-dht: /A6/2017/58906aaaaca0515f5994104d20170213154555: attempting to move from video-backup-client-2 to video-backup-client-4
>>
>> etc...
>>
>> Can I stop/cancel it without data loss? How can I make gluster remove the correct brick?
>>
>> Thanks
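For reference, a minimal sketch of the CLI sequence being discussed, using the volume and brick names from this thread (behaviour as in the glusterfs 3.8 CLI; exact output will differ). An uncommitted remove-brick can be stopped safely: files that have already been migrated simply stay on their new bricks, and the brick is only dropped from the volume at commit time.

    # progress of the in-flight remove-brick task
    gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick status

    # stop it before committing; already-migrated files remain where they are
    gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick stop

    # once the brick has been drained, finalise the removal
    gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick commit

    # after recreating the RAID array and filesystem, re-add the (now empty)
    # brick and rebalance the layout
    gluster volume add-brick video-backup 10.0.0.43:/export/md1/brick
    gluster volume rebalance video-backup start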
Nithya Balachandran
2018-Dec-18 15:37 UTC
[Gluster-users] distribute remove-brick has started migrating the wrong brick (glusterfs 3.8.13)
On Tue, 18 Dec 2018 at 14:56, Stephen Remde <stephen.remde at gaist.co.uk> wrote:

> Nithya,
>
> I've realised I will not have enough space on the other bricks in my
> cluster to migrate data off the server so I can remove the single brick -
> is there a workaround?
>
> As you can see below, the new brick was created with the wrong RAID
> configuration, so I want to remove it, recreate the RAID, and re-add it.
>
> xxxxxx  Filesystem  Size  Used  Avail  Use%  Mounted on
> dc4-01  /dev/md0     95T   87T   8.0T   92%  /export/md0
> dc4-01  /dev/md1     95T   87T   8.4T   92%  /export/md1
> dc4-01  /dev/md2     95T   86T   9.3T   91%  /export/md2
> dc4-01  /dev/md3     95T   86T   8.9T   91%  /export/md3
> dc4-02  /dev/md0     95T   89T   6.5T   94%  /export/md0
> dc4-02  /dev/md1     95T   87T   8.4T   92%  /export/md1
> dc4-02  /dev/md2     95T   87T   8.6T   91%  /export/md2
> dc4-02  /dev/md3     95T   86T   8.8T   91%  /export/md3
> dc4-03  /dev/md0     95T   74T    21T   78%  /export/md0
> dc4-03  /dev/md1    102T  519G   102T    1%  /export/md1
>

I believe this is the brick being removed - the one that has about 519G of
data? If I have understood the scenario properly, there seems to be plenty
of free space on the other bricks (most seem to have terabytes free). Is
there something I am missing?

Regards,
Nithya
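A quick way to sanity-check which brick the rebalance process is actually draining, sketched with the paths from this thread (the rebalance log path below is the usual default, /var/log/glusterfs/<volname>-rebalance.log, and may differ on a given install). The "decommissioning subvolume video-backup-client-9" line quoted earlier maps to remote-subvolume /export/md1/brick in the volfile, which matches the brick named in the remove-brick command and holds only ~519G.

    # on 10.0.0.43: which subvolume is the rebalance process decommissioning?
    grep -i decommission /var/log/glusterfs/video-backup-rebalance.log

    # how much data still has to be drained from the brick being removed
    df -h /export/md1/brick

    # per-node scan/migration progress for the remove-brick task
    gluster volume remove-brick video-backup 10.0.0.43:/export/md1/brick status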