Xavier Hernandez
2014-Nov-25 15:19 UTC
[Gluster-users] Need some clarifications about the disperse feature
Hi Ayelet,

On 11/25/2014 02:41 PM, Ayelet Shemesh wrote:
> Hello Gluster experts,
>
> I have been using gluster for a small cluster for a few years now and I
> have a question regarding the new disperse feature, which is for me a
> much anticipated addition.
>
> *Suppose* I create a volume with a disperse set of 3, redundancy 1
> (let's call them A1, A2, A3) and then I add 3 more bricks to that volume
> (we'll call them B1, B2, B3).
>
> *First question* - which of the bricks will be the one carrying the
> redundancy data?

In the current implementation, there's no difference between data and redundancy. All bricks behave exactly the same and none is more important than another. In a configuration with 3 bricks and redundancy 1, you can lose any one brick and everything will continue working normally.

> *Second question* - If I have machines with faster disk - should I
> assign them to the data or the redundancy bricks? What should I expect
> the load to be on the redundancy machine in heavy read scenarios and in
> heavy write scenarios?

As I said, there isn't a dedicated redundancy brick, so there's no benefit in assigning the fast disk to a specific brick.

Read requests only need to be processed on N - R bricks (N = total number of bricks, R = redundancy). This means that in your configuration, each read will be sent to 2 bricks. If all bricks are alive and healthy, the disperse translator balances these reads among all nodes, giving 2/3 of the load to each brick.

Write requests are processed by all bricks, so the load is the same on all of them.

> *Third question* - _does this require reading the entire data_ of A1, A2
> and A3 by initiating a heal or another operation?

Healing operations work on a per-file basis. If only some files of A3 have been damaged, it will only read the corresponding data from A1 and A2, but not the entire contents of A1 and A2. To heal a file, the entire contents of that file are read.

> *4th question* (and most important for me) - I saw in the list that it
> is now a Distributed-Dispersed volume. I understand I can now lose, for
> example bricks A1 and B1 and still have my entire data intact.

Correct.

> Is this also correct for bricks from the same set, for example A1 and A2?

No, each disperse set is independent and has the same redundancy. It's equivalent to a distributed replicated volume: if you lose both bricks of the same replica set, you will lose access to the data stored in that replica set.

> Or to put it in a more generic way - does this create the exact same
> dispersed volume as if I created it originally with A1, A2 A3 B1 B2 B3
> and a redundancy of 2?

No. These are two different configurations. Both have the same effective capacity, but the probability of failure in the second case is several times lower than in the first one (you can lose *any* two bricks without losing access to the data). However, it's more expensive to grow the volume because you will need to add 6 new bricks at the same time, while in the first case you only need to add 3.

Xavi
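
The figures in the reply above (2/3 read load per brick, equal effective capacity, and the difference in tolerated failures between the two layouts) can be checked with a small sketch. This is only an illustrative model, not Gluster code: the function names are made up for this example, and it assumes reads are balanced evenly across healthy bricks and that a disperse set stays accessible as long as it has lost no more than R bricks.

```python
from fractions import Fraction
from itertools import combinations


def read_load_per_brick(n, r):
    """Share of read requests each brick serves when all n bricks are healthy.

    Assumes every read touches n - r bricks and reads are balanced evenly.
    """
    return Fraction(n - r, n)


def usable_bricks(disperse_sets, redundancy):
    """Effective capacity, counted in bricks' worth of data."""
    return sum(len(s) - redundancy for s in disperse_sets)


def survives(failed, disperse_sets, redundancy):
    """True if no disperse set has lost more than `redundancy` bricks."""
    return all(len(failed & s) <= redundancy for s in disperse_sets)


def count_survivable(disperse_sets, redundancy, k):
    """Count the k-brick failure combinations that keep all data accessible."""
    bricks = sorted(set().union(*disperse_sets))
    return sum(
        survives(set(combo), disperse_sets, redundancy)
        for combo in combinations(bricks, k)
    )


# Layout 1: two independent disperse sets of 3 bricks, redundancy 1 each.
two_sets = [frozenset({"A1", "A2", "A3"}), frozenset({"B1", "B2", "B3"})]
# Layout 2: a single disperse set of 6 bricks, redundancy 2.
one_set = [frozenset({"A1", "A2", "A3", "B1", "B2", "B3"})]

print(read_load_per_brick(3, 1))          # 2/3
print(usable_bricks(two_sets, 1))         # 4
print(usable_bricks(one_set, 2))          # 4
print(count_survivable(two_sets, 1, 2))   # 9 of the 15 two-brick failures
print(count_survivable(one_set, 2, 2))    # 15 of the 15 two-brick failures
```

Under these assumptions, 6 of the 15 possible two-brick failures (both bricks in the same set) make data inaccessible in the first layout, while none do in the second. Neither layout survives three simultaneous brick failures.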
Ayelet Shemesh
2014-Nov-26 08:35 UTC
[Gluster-users] Need some clarifications about the disperse feature
Thank you Xavi, this is very helpful (thanks also to Atin).

Do you have any benchmarks showing how much of a performance penalty I should expect for intensive reads using this feature? Naturally I will test in my specific environment; I just want to know if there are any benchmarks I can look at for now.

Ayelet