Hi,

I have an environment consisting of 4 nodes (with large disks). I need to create a volume to hold virtual machine images.

In the documentation I read:

*Hosting virtual machine images requires the consistency of three-way replication, which is provided by three-way replicated volumes, three-way distributed replicated volumes, arbitrated replicated volumes, and distributed arbitrated replicated volumes.*

So I'm confused about how to configure this volume. I have 4 nodes and I don't want to lose space by dedicating one of them to the arbiter function.

Would it be reasonable to configure the volume as in either of these two examples?

# gluster volume create test1 replica 3 \
    server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1 \
    server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/brick2 \
    server3:/bricks/brick3 server4:/bricks/brick3 server1:/bricks/brick3 \
    server4:/bricks/brick4 server1:/bricks/brick4 server2:/bricks/brick4

# gluster volume create test1 replica 3 arbiter 1 \
    server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/arbiter_brick1 \
    server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/arbiter_brick2 \
    server3:/bricks/brick3 server4:/bricks/brick3 server1:/bricks/arbiter_brick3 \
    server4:/bricks/brick4 server1:/bricks/brick4 server2:/bricks/arbiter_brick4

Thanks,

--
*Cristian Del Carlo*
Hi Cristian,

Both approaches are valid, but they differ in usable capacity and in how many node failures they tolerate.

The first is a full replica 3: usable capacity is your total raw capacity divided by 3 (because every byte is stored three times), it tolerates the simultaneous failure of two nodes, and it is very good for split-brain avoidance.

The second is a replica 2 with arbiter, which is also very good for split-brain avoidance (that is the purpose of the arbiter bricks). In this case you get your total capacity divided by 2, minus the small amount of space used by the arbiter bricks (often less than 1% of the size of the normal data bricks). It tolerates one node failure at a time.

For VM usage, remember to enable sharding, with a shard size of at least 256MB, before putting the volume into use.

If the ratio of usable to total capacity is a concern and you think you can tolerate only one node failure at a time, may I suggest a distributed dispersed volume with disperse 3, redundancy 1? You would get 2/3 of your total capacity (total divided by 3, times 2), and this configuration still tolerates one node failure at a time.

Hope this helps.

*Ramon Selga*
934 76 69 10
670 25 37 05
DataLab SL <http://www.datalab.es/>
Aviso Legal <http://www.datalab.es/cont_cas/legal.html>

El 06/09/19 a les 17:11, Cristian Del Carlo ha escrit:
> Would it be reasonable to configure the volume as in these two examples?
> [...]
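On the CLI, the sharding and dispersed-volume suggestions above would look roughly like this. This is a sketch, not a tested recipe: the volume name `vmstore` is a placeholder, the brick paths follow the pattern from the original question, and you should check the exact option names and shard-size syntax against your Gluster version before running anything.

```shell
# A distributed dispersed volume: four (disperse 3, redundancy 1)
# subvolumes rotated across the 4 nodes, so each node contributes
# three bricks. Usable space is 2/3 of raw capacity.
gluster volume create vmstore disperse 3 redundancy 1 \
    server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1 \
    server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/brick2 \
    server3:/bricks/brick3 server4:/bricks/brick3 server1:/bricks/brick3 \
    server4:/bricks/brick4 server1:/bricks/brick4 server2:/bricks/brick4

# Enable sharding (with the 256MB shard size suggested for VM images)
# BEFORE writing any data to the volume; existing files are not
# re-sharded retroactively.
gluster volume set vmstore features.shard on
gluster volume set vmstore features.shard-block-size 256MB

gluster volume start vmstore
```

Note that each disperse subvolume above keeps its three bricks on three different servers, so losing any single node takes out at most one brick per subvolume, which redundancy 1 can absorb.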
On Fri, Sep 06, 2019 at 05:11:57PM +0200, Cristian Del Carlo wrote:
> I have 4 nodes and I don't want to lose space by dedicating one to the
> function of arbiter.

You're chasing a false efficiency here. To avoid split-brain, you *absolutely must* have more than two bricks aware of each file. If you have just two replicas and no arbiter, and one node can no longer see the other, how can it know whether it holds the "working" brick or the "malfunctioning" brick? To resolve this, you need a third reference point, so that each node can say either "there are two of us, we're working" or "I've lost both of the others; I must be the one that failed".

That third reference point can be either another replica or an arbiter. Another replica contains a full copy of all your data, so it will be very large: for a 10T subvolume, each replica needs a full 10T of storage dedicated to it. An arbiter contains only the file metadata (directory entries, etc.), which is much, much smaller, especially for something like VM images, where you have a small number of very large files, because the metadata size depends solely on how many files you have, not on how big they are.

The gluster volume where I store my VM images has a total capacity of 23T (6T used); its arbiter bricks hold a total of 1.8G, approximately 0.3% of the space that would be needed for an additional replica of the data.

As you can see, an arbiter gives you (nearly) as much data security as an additional replica while consuming a tiny fraction of the space that would be "lost" to an additional full replica. If you're trying to maximize usable capacity in your volume, an arbiter configuration is absolutely the way to go.

--
Dave Sherohman
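To make the capacity trade-offs discussed in this thread concrete, here is a back-of-the-envelope comparison, sketched in Python. The 10 TB per-node figure is an assumed example, not a number from the thread, and the arbiter's metadata overhead is treated as negligible (as Dave's 0.3% measurement suggests it is in practice).

```python
# Rough usable-capacity comparison for a 4-node Gluster cluster,
# assuming every node contributes the same raw capacity.

raw_per_node = 10.0          # TB per node (hypothetical example)
nodes = 4
raw_total = raw_per_node * nodes   # 40 TB raw

# Replica 3: every byte stored three times.
replica3_usable = raw_total / 3

# Replica 2 + arbiter: data stored twice; arbiter bricks hold only
# metadata, so we ignore their (sub-1%) overhead here.
arbiter_usable = raw_total / 2

# Disperse 3, redundancy 1: in each 3-brick subvolume, the equivalent
# of 2 bricks holds data and 1 holds erasure-coded redundancy.
disperse_usable = raw_total * 2 / 3

print(f"replica 3:           {replica3_usable:.1f} TB usable")
print(f"replica 2 + arbiter: {arbiter_usable:.1f} TB usable")
print(f"disperse 3, red 1:   {disperse_usable:.1f} TB usable")
```

The calculation mirrors the thread's conclusions: the dispersed layout yields the most usable space while still surviving one node failure, while replica 3 trades capacity for tolerance of two simultaneous failures.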