Hi all,
I would like to ask if, and with how much success, you are using GlusterFS for virtual machine storage.

My plan: I want to set up a 2-node cluster, where the VMs run on the nodes themselves and can be live-migrated on demand.

I have some questions:
- do you use GlusterFS for a similar setup?
- if so, how do you feel about it?
- if a node crashes/reboots, how does the system re-sync? Will the VM files be fully resynchronized, or does the live node keep some sort of write bitmap to resynchronize only the changed/written chunks? (note: I know about sharding, but I would like to avoid it);
- finally, how stable is the system?

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
lemonnierk at ulrar.net
2017-Aug-23 16:10 UTC
[Gluster-users] GlusterFS as virtual machine storage
On Mon, Aug 21, 2017 at 10:09:20PM +0200, Gionatan Danti wrote:
> Hi all,
> I would like to ask if, and with how much success, you are using
> GlusterFS for virtual machine storage.

Hi, we have similar clusters.

> My plan: I want to set up a 2-node cluster, where the VMs run on the
> nodes themselves and can be live-migrated on demand.

Use 3 nodes for gluster, or at the very least 2 + 1 arbiter. You really, really don't want a 2-node setup; you will get split-brains.

> I have some questions:
> - do you use GlusterFS for a similar setup?

Yes, but always 3 nodes.

> - if so, how do you feel about it?

Works pretty well now. We have had problems, but 3.7.15 works great. We are currently testing 3.8 and it seems to work fine too; I imagine the newer releases do as well.

> - if a node crashes/reboots, how does the system re-sync? Will the VM
> files be fully resynchronized, or does the live node keep some sort of
> write bitmap to resynchronize only the changed/written chunks? (note:
> I know about sharding, but I would like to avoid it);

You really should use sharding. I am not sure exactly what happens without it, but I know it basically made the VMs unusable during heals (they froze). Sharding solved that: with it, everything works well even during a heal, because only the out-of-sync shards are retransferred, so the amount of data to re-sync is pretty low.

> - finally, how stable is the system?

Haven't had any problems with gluster itself since we updated past 3.7.11. It just works, even on a pretty bad network. Performance isn't amazing, but I imagine you know that; replication has a cost.

> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
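For anyone finding this thread later: the "2 + 1 arbiter" layout recommended above is created along these lines. The hostnames, brick paths, volume name "vmstore", and shard size are placeholders (not from this thread), and the commands assume a reasonably recent gluster CLI:

```shell
# Replica 3 with one arbiter brick per replica set ("2 + 1 arbiter"):
gluster volume create vmstore replica 3 arbiter 1 \
    node1:/bricks/vmstore node2:/bricks/vmstore arbiter1:/bricks/vmstore

# Apply the VM-oriented option group, then enable sharding so heals
# only retransfer the out-of-sync shards (shard size is an example):
gluster volume set vmstore group virt
gluster volume set vmstore features.shard on
gluster volume set vmstore features.shard-block-size 64MB
gluster volume start vmstore
```

Note that enabling sharding on a volume only affects files created afterwards; existing VM images are not retroactively sharded.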
Hi, after many VM crashes during upgrades of Gluster, losing network connectivity on one node, etc., I would advise running replica 2 with arbiter. I once even managed to break this setup (with arbiter) due to network partitioning: one data node never healed and I had to restore from backups (it was easier, and the cluster was kind of non-production). Be extremely careful and plan for failure.

-ps

On Mon, Aug 21, 2017 at 10:09 PM, Gionatan Danti <g.danti at assyoma.it> wrote:
> Hi all,
> I would like to ask if, and with how much success, you are using
> GlusterFS for virtual machine storage.
> [...]
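A side note on the "one data node never healed" failure mode: after any node outage it is worth checking heal state explicitly rather than assuming it converges. These are standard gluster CLI commands; the volume name "vmstore" is a placeholder:

```shell
# Entries still pending heal, listed per brick:
gluster volume heal vmstore info

# Files the cluster cannot resolve on its own (manual intervention needed):
gluster volume heal vmstore info split-brain

# A quick count, handy for polling from a monitoring script:
gluster volume heal vmstore statistics heal-count
```

A heal-count that stays flat (or grows) over repeated polls is the early warning that a node is not actually healing.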
On 23-08-2017 18:14, Pavel Szalbot wrote:
> Hi, after many VM crashes during upgrades of Gluster, losing network
> connectivity on one node etc. I would advise running replica 2 with
> arbiter.

Hi Pavel,
this is bad news :( So, in your case at least, Gluster was not stable? Could something as simple as an update make it crash?

> I once even managed to break this setup (with arbiter) due to network
> partitioning - one data node never healed and I had to restore from
> backups (it was easier and kind of non-production). Be extremely
> careful and plan for failure.

I would use VM locking via sanlock or virtlock, so a split brain should not cause simultaneous changes on both replicas. I am more concerned about volume heal time: what will happen if the standby node crashes/reboots? Will *all* data be re-synced from the master, or will only the changed bits be re-synced? As stated above, I would like to avoid using sharding...

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
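On the VM-locking idea: with libvirt this is typically done through the lockd plugin (virtlockd). A minimal sketch, assuming libvirt with virtlockd available; the lockspace path is illustrative and the config file locations may differ per distribution, so check your packaging:

```
# /etc/libvirt/qemu.conf -- make the QEMU driver use virtlockd
lock_manager = "lockd"

# /etc/libvirt/qemu-lockd.conf -- keep leases on shared storage so a
# second host cannot start a guest against the same disk image
file_lockspace_dir = "/gluster/locks"
```

After changing these, virtlockd and libvirtd need a restart on each host, and the lockspace directory must be on storage visible to all nodes or the lock protects nothing across hosts.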
On 8/21/2017 1:09 PM, Gionatan Danti wrote:
> Hi all,
> I would like to ask if, and with how much success, you are using
> GlusterFS for virtual machine storage.
>
> My plan: I want to set up a 2-node cluster, where the VMs run on the
> nodes themselves and can be live-migrated on demand.
>
> I have some questions:
> - do you use GlusterFS for a similar setup?

2 node plus arbiter. You NEED the arbiter or a third node. Do NOT try 2 node with a VM. We also use sharding to speed up heals. 3 node would be even better, but 2 node + arbiter is faster.

Note the arbiter doesn't have to be great kit. It mostly needs memory and a small amount of hard disk space, and even on older systems you can throw in a cheap low-capacity SSD drive; you probably have a bunch of those lying around. You could probably get away with using containers for the arbiter and 'share' an arbiter "host" among clusters. We haven't had a chance to try that yet, but with net=host and a unique IP per container, I don't see why it would be an issue.

> - if so, how do you feel about it?

Very happy. Reasonable and reliable performance (compared to other distributed storage). Gluster does not have the performance of a direct-attached SSD drive, but none of the distributed storage options can do that, unless they cheat with heavy buffering and async writes, which is problematic for VM files if something bad happens.

> - if a node crashes/reboots, how does the system re-sync? Will the VM
> files be fully resynchronized, or does the live node keep some sort of
> write bitmap to resynchronize only the changed/written chunks? (note:
> I know about sharding, but I would like to avoid it);

Without sharding, any heal after an outage (planned or otherwise) will take a LOT longer, because you have to sync the entire VM file, which in our case is 20GB to 150GB per affected VM. That can take quite a while even with a fast network.

With sharding, in many cases the heal after maintenance amounts to a 'pause' and is almost a non-event, because it only has to heal the few shards that are out of sync.

The cool thing about gluster in old-school replication mode is that the VM files are all there on each node; there is no master index that can get corrupted with your bits spread out among the various nodes. Of course with sharding you would have to re-assemble the file, but that has been discussed on this list and we have tested it several times, even on large VMs, by removing a brick and having a tech re-assemble the shards and check the md5sum to make sure we have a working VM file.

> - finally, how stable is the system?

We were on 3.4 for years on some old clusters and never had a serious problem, but we had to be really careful during upgrades/reboots because they were 2-node systems, and if you didn't do things precisely you ended up with a split-brain. On the rare crash event, we would often pick a good image from one of the nodes and designate it as the source. We are on 3.10 now, and the arbiter + sharding go a long way toward solving that issue.

-wk
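Since re-assembly comes up whenever sharding is discussed, here is a minimal sketch of the procedure wk describes, assuming the usual shard layout: the first shard-size bytes stay at the file's normal path on the brick, and the remaining pieces live under .shard/ named <gfid>.1, <gfid>.2, and so on. The paths and GFID arguments below are placeholders. One caveat: sparse images can have gaps in the shard numbering, which this naive loop would mistake for end-of-file, so always verify the checksum as wk's techs do.

```shell
# Re-assemble a sharded file directly from one brick's contents.
reassemble() {
    brick=$1    # brick root, e.g. /bricks/vmstore (placeholder)
    base=$2     # file path relative to the brick, e.g. images/vm1.img
    gfid=$3     # trusted.gfid of the base file, hex string
    out=$4      # where to write the re-assembled image

    # Shard 0 is the file at its normal path on the brick.
    cp "$brick/$base" "$out"

    # Append shards 1..N in numeric order until one is missing.
    i=1
    while [ -e "$brick/.shard/$gfid.$i" ]; do
        cat "$brick/.shard/$gfid.$i" >> "$out"
        i=$((i + 1))
    done

    # Print the checksum so it can be compared against a known-good copy.
    md5sum "$out"
}
```

The numeric while-loop matters: a shell glob like `.shard/$gfid.*` sorts lexically (1, 10, 11, ..., 2), which would silently scramble the image.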