Nithya Balachandran
2019-Jun-27 06:47 UTC
[Gluster-users] Removing subvolume from dist/rep volume
Hi,

On Tue, 25 Jun 2019 at 15:26, Dave Sherohman <dave at sherohman.org> wrote:

> I have a 9-brick, replica 2+A cluster and plan to (permanently) remove
> one of the three subvolumes.  I think I've worked out how to do it, but
> want to verify first that I've got it right, since downtime or data loss
> would be Bad Things.
>
> The current configuration has six data bricks across six hosts (B
> through G), and all three arbiter bricks on the same host (A), such as
> one might create with
>
> # gluster volume create myvol replica 3 arbiter 1 B:/data C:/data A:/arb1
>   D:/data E:/data A:/arb2 F:/data G:/data A:/arb3
>
> My objective is to remove nodes B and C entirely.
>
> First up is to pull their bricks from the volume:
>
> # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> (wait for data to be migrated)
> # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit

There are some edge cases that may prevent a file from being migrated
during a remove-brick. Please do the following after this:

1. Check the remove-brick status for any failures. If there are any,
   check the rebalance log file for errors.
2. Even if there are no failures, check the removed bricks to see if any
   files have not been migrated. If there are any, please check that they
   are valid files on the brick and copy them from the brick back into
   the volume through the mount point.

(A rough sketch of these checks is included at the end of this mail.)

The rest of the steps look good.

Regards,
Nithya

> And then remove the nodes with:
>
> # gluster peer detach B
> # gluster peer detach C
>
> Is this correct, or did I forget any steps and/or mangle the syntax on
> any commands?
>
> Also, for the remove-brick command, is there any way to throttle the
> amount of bandwidth which will be used for the data migration?
> Unfortunately, I was not able to provision a dedicated VLAN for the
> gluster servers to communicate among themselves, so I don't want it
> hogging all available capacity if that can be avoided.
>
> If it makes a difference, my gluster version is 3.12.15-1, running on
> Debian and installed from the debs at
>
> deb
> https://download.gluster.org/pub/gluster/glusterfs/3.12/LATEST/Debian/9/amd64/apt
> stretch main
>
> --
> Dave Sherohman
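For example (a rough sketch only -- the volume and brick names are the
ones from the example above, and the rebalance log path can differ
between installs and versions), the check in step 1 could look like:

# gluster volume remove-brick myvol B:/data C:/data A:/arb1 status

and, if any failures are reported, a search of the rebalance log on the
node(s) that reported them:

# grep -iE "error|failed" /var/log/glusterfs/myvol-rebalance.log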
Nithya Balachandran
2019-Jun-27 06:49 UTC
[Gluster-users] Removing subvolume from dist/rep volume
On Thu, 27 Jun 2019 at 12:17, Nithya Balachandran <nbalacha at redhat.com> wrote:

> Hi,
>
> On Tue, 25 Jun 2019 at 15:26, Dave Sherohman <dave at sherohman.org> wrote:
>
>> [...]
>>
>> First up is to pull their bricks from the volume:
>>
>> # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
>> (wait for data to be migrated)
>> # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
>
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
> 1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
> 2. Even if there are no failures, check the removed bricks to see if
>    any files have not been migrated. If there are any, please check that
>    they are valid files on the brick and that they match on both bricks
>    (the files are not in split-brain), and copy them from the brick back
>    into the volume through the mount point.

You can run the following at the root of the brick to find any files that
have not been migrated (one way of handling the files it reports is
sketched at the end of this mail):

find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000

> The rest of the steps look good.
>
> Regards,
> Nithya
>
>> [...]
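One way of handling any files that the find reports (a sketch only -- the
mount path and file name below are placeholders, and the brick root is
/data on node B as in the example): verify the file on the brick, then
copy it back in through a regular client mount of the volume, not
directly onto another brick, so that it is recreated on the remaining
subvolumes:

# mkdir -p /mnt/myvol
# mount -t glusterfs A:/myvol /mnt/myvol
# cp /data/path/to/leftover-file /mnt/myvol/path/to/leftover-file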
Dave Sherohman
2019-Jun-28 09:03 UTC
[Gluster-users] Removing subvolume from dist/rep volume
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> On Tue, 25 Jun 2019 at 15:26, Dave Sherohman <dave at sherohman.org> wrote:
> > My objective is to remove nodes B and C entirely.
> >
> > First up is to pull their bricks from the volume:
> >
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > (wait for data to be migrated)
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
>
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
> 1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
> 2. Even if there are no failures, check the removed bricks to see if any
>    files have not been migrated. If there are any, please check that they
>    are valid files on the brick and copy them from the brick back into
>    the volume through the mount point.
>
> The rest of the steps look good.

Apparently, they weren't quite right.  I tried it and it just gives me
the usage notes in return.  Transcript of the commands and output is
below.  Any insight on how I got the syntax wrong?

--- cut here ---
root at merlin:/# gluster volume status
Status of volume: palantir
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick saruman:/var/local/brick0/data        49153     0          Y       17995
Brick gandalf:/var/local/brick0/data        49153     0          Y       9415
Brick merlin:/var/local/arbiter1/data       49170     0          Y       35034
Brick azathoth:/var/local/brick0/data       49153     0          Y       25312
Brick yog-sothoth:/var/local/brick0/data    49152     0          Y       10671
Brick merlin:/var/local/arbiter2/data       49171     0          Y       35043
Brick cthulhu:/var/local/brick0/data        49153     0          Y       21925
Brick mordiggian:/var/local/brick0/data     49152     0          Y       12368
Brick merlin:/var/local/arbiter3/data       49172     0          Y       35050
Self-heal Daemon on localhost               N/A       N/A        Y       1209
Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       23253
Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       9542
Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       11016
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       8126
Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       30998
Self-heal Daemon on azathoth.lub.lu.se      N/A       N/A        Y       34399

Task Status of Volume palantir
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : e58bc091-5809-4364-af83-2b89bc5c7106
Status               : completed

root at merlin:/# gluster volume remove-brick palantir saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

root at merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

root at merlin:/# gluster volume remove-brick palantir replica 3 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
--- cut here ---

--
Dave Sherohman
Dave Sherohman
2019-Jun-28 14:24 UTC
[Gluster-users] Removing subvolume from dist/rep volume
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
> 1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
> 2. Even if there are no failures, check the removed bricks to see if any
>    files have not been migrated. If there are any, please check that they
>    are valid files on the brick and copy them from the brick back into
>    the volume through the mount point.

Well, looks like I hit one of those edge cases.  Probably because of
some issues around a reboot last September which left a handful of files
in a state where self-heal identified them as needing to be healed, but
incapable of actually healing them.  (Check the list archives for
"Kicking a stuck heal", posted on Sept 4, if you want more details.)

So I'm getting 9 failures on the arbiter (merlin), 8 on one data brick
(gandalf), and 3 on the other (saruman).  Looking in
/var/log/gluster/palantir-rebalance.log, I see those numbers of

  migrate file failed: /.shard/291e9749-2d1b-47af-ad53-3a09ad4e64c6.229:
  failed to lock file on palantir-replicate-1 [Stale file handle]

errors.

Also, merlin has four errors, and gandalf has one, of the form:

  Gfid mismatch detected for
  <gfid:be318638-e8a0-4c6d-977d-7a937aa84806>/0f500288-ff62-4f0b-9574-53f510b4159f.2898>,
  9f00c0fe-58c3-457e-a2e6-f6a006d1cfc6 on palantir-client-7 and
  08bb7cdc-172b-4c21-916a-2a244c095a3e on palantir-client-1.

There are no gfid mismatches recorded on saruman.  All of the gfid
mismatches are for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806> and (on
saruman) appear to correspond to 0-byte files (e.g.,
.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898, in the case of the
gfid mismatch quoted above).

For both types of errors, all affected files are in .shard/ and have
UUID-style names, so I have no idea which actual files they belong to.
File sizes are generally either 0 bytes or 4M (exactly), although one of
them has a size slightly larger than 3M.  So I'm assuming they're chunks
of larger files (which would be almost all the files on the volume -
it's primarily holding disk image files for kvm servers).

Web searches generally seem to consider gfid mismatches to be a form of
split-brain, but `gluster volume heal palantir info split-brain` shows
"Number of entries in split-brain: 0" for all bricks, including those
bricks which are reporting gfid mismatches.

Given all that, how do I proceed with cleaning up the stale handle
issues?  I would guess that this will involve somehow converting the
shard filename to a "real" filename, then shutting down the
corresponding VM and maybe doing some additional cleanup.

And then there's the gfid mismatches.  Since they're for 0-byte files,
is it safe to just ignore them on the assumption that they only hold
metadata?  Or do I need to do some kind of split-brain resolution on
them (even though gluster says no files are in split-brain)?
Finally, a listing of /var/local/brick0/data/.shard on saruman, in case
any of the information it contains (like file sizes/permissions) might
provide clues to resolving the errors:

--- cut here ---
root at saruman:/var/local/brick0/data/.shard# ls -l
total 63996
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2864
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2868
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2879
-rw-rw---- 2 root libvirt-qemu       0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2898
-rw------- 2 root libvirt-qemu 4194304 May 17 14:42 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229
-rw------- 2 root libvirt-qemu 4194304 Jun 24 09:10 291e9749-2d1b-47af-ad53-3a09ad4e64c6.925
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 12:54 2df12cb0-6cf4-44ae-8b0a-4a554791187e.266
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 16:30 2df12cb0-6cf4-44ae-8b0a-4a554791187e.820
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 17 20:22 323186b1-6296-4cbe-8275-b940cc9d65cf.27466
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 27 05:01 323186b1-6296-4cbe-8275-b940cc9d65cf.32575
-rw-r--r-- 2 root libvirt-qemu 3145728 Jun 11 13:23 323186b1-6296-4cbe-8275-b940cc9d65cf.3448
---------T 2 root libvirt-qemu       0 Jun 28 14:26 4cd094f4-0344-4660-98b0-83249d5bd659.22998
-rw------- 2 root libvirt-qemu 4194304 Mar 13  2018 6cdd2e5c-f49e-492b-8039-239e71577836.1302
---------T 2 root libvirt-qemu       0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.47131
---------T 2 root libvirt-qemu       0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.52615
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 08:56 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.100
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 11:29 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.106
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 28 02:35 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.137
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  4  2018 9544617c-901c-4613-a94b-ccfad4e38af1.165
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  4  2018 9544617c-901c-4613-a94b-ccfad4e38af1.168
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  5  2018 9544617c-901c-4613-a94b-ccfad4e38af1.193
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  6  2018 9544617c-901c-4613-a94b-ccfad4e38af1.3800
---------T 2 root libvirt-qemu       0 Jun 28 15:02 b48a5934-5e5b-4918-8193-6ff36f685f70.46559
-rw-rw---- 2 root libvirt-qemu       0 Oct 12  2018 c5bde2f2-3361-4d1a-9c88-28751ef74ce6.3568
-rw-r--r-- 2 root libvirt-qemu 4194304 Apr 13  2018 c953c676-152d-4826-80ff-bd307fa7f6e5.10724
-rw-r--r-- 2 root libvirt-qemu 4194304 Apr 11  2018 c953c676-152d-4826-80ff-bd307fa7f6e5.3101
--- cut here ---

--
Dave Sherohman
Nithya Balachandran
2019-Jul-02 06:55 UTC
[Gluster-users] Removing subvolume from dist/rep volume
Hi Dave,

Yes, files in split-brain are not migrated, as we cannot figure out which
is the good copy. Adding Ravi to look at this and see what can be done.
Also adding Krutika, as this is a sharded volume.

The files with the "---------T" permissions are internal files and can be
ignored.

Ravi and Krutika, please take a look at the other files.

Regards,
Nithya
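In case it helps to work out which images the affected shards belong to:
the base name of a shard (the part before the final dot) is the gfid of
its parent file, and on the brick that holds the parent file,
.glusterfs/<first two hex chars>/<next two>/<full gfid> is a hardlink to
it. So something along these lines (a rough sketch only, using the shard
from the rebalance error above and saruman's brick path) should print the
parent file's path; if that .glusterfs entry does not exist on the brick,
the parent lives on one of the other data bricks, so repeat it there:

# find /var/local/brick0/data \
    -samefile /var/local/brick0/data/.glusterfs/29/1e/291e9749-2d1b-47af-ad53-3a09ad4e64c6 \
    -not -path "*/.glusterfs/*"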
On Fri, 28 Jun 2019 at 19:56, Dave Sherohman <dave at sherohman.org> wrote:

> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> > There are some edge cases that may prevent a file from being migrated
> > during a remove-brick. [...]
>
> Well, looks like I hit one of those edge cases.  Probably because of
> some issues around a reboot last September which left a handful of files
> in a state where self-heal identified them as needing to be healed, but
> incapable of actually healing them.  (Check the list archives for
> "Kicking a stuck heal", posted on Sept 4, if you want more details.)
>
> [...]
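To confirm one of the reported gfid mismatches, the gfid xattr of the
affected shard can be compared directly on each brick that has a copy of
it, for example (a sketch, using the shard name and saruman's brick path
from the mails above; run the equivalent command on each brick holding
the shard and compare the values):

# getfattr -n trusted.gfid -e hex /var/local/brick0/data/.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898

Differing values between bricks correspond to the mismatch reported in
the rebalance log.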