Strahil Nikolov
2020-Feb-19 07:56 UTC
[Gluster-users] Advice for running out of space on a replicated 4-brick gluster
On February 19, 2020 1:59:12 AM GMT+02:00, Artem Russakovskii <archon810 at gmail.com> wrote:
>Hi Strahil,
>
>We have 4 main servers, and I wanted to run gluster on all of them, with
>everything working even if 3/4 are down, so I set up a replica 4 with quorum
>at 1. It's been working well for several years now, and I can lose 3 out of
>4 servers to outages and remain up.
>
>Amar, so to clarify, right now I set up the volume using "gluster v create
>$GLUSTER_VOL replica 4 server1:brick1 server2:brick2 server3:brick3
>server4:brick4".
>In order to turn it into a replica 4 that is also distributed across the 4
>old bricks and 4 new bricks (say server1:brick5 server2:brick6
>server3:brick7 server4:brick8), what exact commands do I need to issue?
>
>The docs are a bit confusing for this case IMO:
>
>> volume add-brick <VOLNAME> [<stripe|replica> <COUNT> [arbiter <COUNT>]]
>> <NEW-BRICK> ... [force] - add brick to volume <VOLNAME>
>
>Do I need to specify a stripe? Do I need to repeat the replica param and
>keep it at 4? I.e.:
>
>> gluster v add-brick $GLUSTER_VOL replica 4 server1:brick5
>> gluster v add-brick $GLUSTER_VOL replica 4 server2:brick6
>> gluster v add-brick $GLUSTER_VOL replica 4 server3:brick7
>> gluster v add-brick $GLUSTER_VOL replica 4 server4:brick8
>> gluster v rebalance $GLUSTER_VOL fix-layout start
>
>My reservations about going with this new approach also include the fact
>that right now I can back up and restore just the brick data itself, as each
>brick contains a full copy of the data, and it's a lot faster to access the
>brick data directly during backups (probably an order of magnitude, due to
>unresolved directory-listing performance issues). If I go distributed
>replicated, my current backup strategy will need to shift to backing up the
>gluster volume itself (not sure what kind of additional load that would put
>on the servers), or maybe backing up one brick from each replica set would
>work too, though it's unclear if I'd be able to restore by just copying the
>data from such backups back into one restore location to recreate the full
>set of data (would that work?).
>
>Thanks again for your answers.
>
>Sincerely,
>Artem
>
>--
>Founder, Android Police <http://www.androidpolice.com>, APK Mirror
><http://www.apkmirror.com/>, Illogical Robot LLC
>beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>
>On Mon, Feb 17, 2020 at 3:29 PM Strahil Nikolov <hunter86_bg at yahoo.com>
>wrote:
>
>> On February 18, 2020 1:16:19 AM GMT+02:00, Artem Russakovskii <
>> archon810 at gmail.com> wrote:
>> >Hi all,
>> >
>> >We currently have an 8TB 4-brick replicated volume on our 4 servers, and
>> >are at 80% capacity. The max disk size on our host is 10TB. I'm starting
>> >to think about what happens closer to 100% and see 2 options.
>> >
>> >Either we go with another new 4-brick replicated volume and start dealing
>> >with symlinks in our webapp to make sure it knows which volume the data
>> >is on, which is a bit of a pain (but not too much) on the sysops side of
>> >things. Right now the whole volume mount is symlinked to a single
>> >location in the webapps (an uploads/ directory) and life is good. After
>> >such a split, I'd have to split uploads into yeardir symlinks, make sure
>> >future yeardir symlinks are created ahead of time and point to the right
>> >volume, etc.
>> >
>> >The other direction would be converting the replicated volume to a
>> >distributed replicated one
>> >(https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes),
>> >but I'm a bit scared to do it with production data (even after testing,
>> >of course), having never dealt with a distributed replicated volume.
>> >
>> >1. Is it possible to convert our existing volume on the fly by adding 4
>> >   bricks but keeping the replica count at 4?
>> >2. What happens if bricks 5-8, which would form replica set #2, go down
>> >   for whatever reason or can't meet their quorum, but replica set #1 is
>> >   still up? Does the whole combined volume become unavailable, or only
>> >   the portion of it whose data resides on replica set #2?
>> >3. Any other gotchas?
>> >
>> >Thank you very much in advance.
>> >
>> >Sincerely,
>> >Artem
>> >
>> >--
>> >Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>>
>> Distributed replicated sounds more reasonable.
>>
>> Out of curiosity, why did you decide to have an even number of bricks in
>> the replica set? It can still suffer from split-brain.
>>
>> 1. It should be OK, but I have never done it. Test on some VMs before
>> proceeding. Rebalance might take some time, so keep that in mind.
>>
>> 2. All files on replica set 5-8 will be unavailable until you recover
>> that set of bricks.
>>
>> Best Regards,
>> Strahil Nikolov

Hi Artem,

That's interesting...

In order to extend the volume you will need to:

gluster peer probe node5
gluster peer probe node6
gluster peer probe node7
gluster peer probe node8

gluster volume add-brick <VOLNAME> replica 4 node{5..8}:/brick/path

Note: Assuming that the brick paths are the same on all nodes.

For the backup, you can still back up via the gluster bricks, but you need to
pick 1 node per replica set, as your data will be laid out like this:
Node1..4 -> A
Node5..8 -> B

So you need to back up from 2 hosts instead of one.

Best Regards,
Strahil Nikolov
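For reference, the full sequence implied above might look like the sketch
below; $GLUSTER_VOL and /brick/path are placeholders, and the peer probes are
only needed when the new bricks live on hosts that are not already in the
trusted pool. One gotcha worth spelling out: the four new bricks have to be
added in a single add-brick call, because the number of bricks added must
stay a multiple of the replica count when the count remains 4; adding them
one per command, as in the question quoted above, would be rejected.

# only needed if the new bricks sit on hosts not yet in the trusted pool
gluster peer probe node5
gluster peer probe node6
gluster peer probe node7
gluster peer probe node8

# add one brick per node in a single command so they form the second replica set
gluster volume add-brick $GLUSTER_VOL replica 4 \
    node5:/brick/path node6:/brick/path node7:/brick/path node8:/brick/path

# fix-layout only recalculates the directory layout so new files land on both
# replica sets; "rebalance $GLUSTER_VOL start" would also migrate existing files
gluster volume rebalance $GLUSTER_VOL fix-layout start
gluster volume rebalance $GLUSTER_VOL status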
Artem Russakovskii
2020-Feb-21 21:27 UTC
[Gluster-users] Advice for running out of space on a replicated 4-brick gluster
Hi Strahil,

Actually, I'd be attaching the additional block storage to the same hosts,
just under different mounts; that's why in my replies I referenced servers 1
through 4, and there would be no servers 5-8:

server1:brick1 server2:brick2 server3:brick3 server4:brick4
server1:brick5 server2:brick6 server3:brick7 server4:brick8

So no peer probe is needed, and the backup can still go to the same machine,
just with the new brick selected.

Good to know. I'll test all of this, but it looks trivial in retrospect. Hope
it doesn't bite us in the butt!

Thanks, all.

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>
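For that same-host layout the extension reduces to a single add-brick against
the existing pool, roughly as sketched below; the /data/brickN paths are
hypothetical stand-ins for wherever the new block storage gets mounted:

gluster volume add-brick $GLUSTER_VOL replica 4 \
    server1:/data/brick5 server2:/data/brick6 server3:/data/brick7 server4:/data/brick8

# afterwards the volume should report Type: Distributed-Replicate
# with Number of Bricks: 2 x 4 = 8
gluster volume info $GLUSTER_VOL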
On Tue, Feb 18, 2020 at 11:56 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hi Artem,
>
> That's interesting...
>
> In order to extend the volume you will need to:
>
> gluster peer probe node5
> gluster peer probe node6
> gluster peer probe node7
> gluster peer probe node8
>
> gluster volume add-brick <VOLNAME> replica 4 node{5..8}:/brick/path
>
> Note: Assuming that the brick paths are the same on all nodes.
>
> For the backup, you can still back up via the gluster bricks, but you need
> to pick 1 node per replica set, as your data will be laid out like this:
> Node1..4 -> A
> Node5..8 -> B
>
> So you need to back up from 2 hosts instead of one.
>
> Best Regards,
> Strahil Nikolov
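If the backups keep reading straight from the bricks, one brick per replica
set is enough, and with the same-host layout both can even sit on one server.
A minimal sketch, assuming hypothetical /data/brickN mount points and a
hypothetical backuphost; the .glusterfs directory at the brick root is
excluded because it only holds gluster's internal metadata and gfid hard
links:

# one brick from replica set 1 (bricks 1-4)
rsync -aHAX --exclude=.glusterfs /data/brick1/ backuphost:/backup/set1/
# one brick from replica set 2 (bricks 5-8)
rsync -aHAX --exclude=.glusterfs /data/brick5/ backuphost:/backup/set2/

Restoring by copying both sets back into a single location should recreate
the complete file tree, since each file lives on exactly one replica set and
directories exist on every brick.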