I am prototyping GlusterFS with ~50-60TB of raw disk space across non-raided disks in ~30 compute nodes. I initially separated the nodes into groups of two, and did a replicate across each set of single drives in a pair of servers. Next I did a stripe across the 33 resulting AFR groups, first with a block size of 1MB and later with the default block size. With these configurations I am only seeing throughput of about 15-25 MB/s, despite a full Gig-E network.

What is generally the recommended configuration in a large striped environment? I am wondering if the number of nodes in the stripe is causing too much overhead, or if the bottleneck is likely somewhere else. In addition, I saw a thread on the list that indicates it is better to replicate across stripes rather than stripe across replicates. Does anyone have any comments or opinions regarding this?

Thanks,
Jordan
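P.S. For reference, a trimmed sketch of the layout I described above (stripe over replicated pairs). Host and brick names are placeholders, only two of the 33 AFR groups are shown, and exact option spellings may vary between GlusterFS releases:

# Trimmed sketch of the stripe-over-replicate client volfile described
# above; host/brick names are placeholders, only 2 of the 33 AFR groups
# are shown, and option spellings may differ between releases.

volume node01-brick
  type protocol/client
  option transport-type tcp        # may be spelled tcp/client on older releases
  option remote-host node01        # placeholder hostname
  option remote-subvolume brick    # placeholder exported brick name
end-volume

# node02-brick, node03-brick and node04-brick are defined the same way.

volume afr-group-1
  type cluster/replicate           # mirror one drive on node01 and node02
  subvolumes node01-brick node02-brick
end-volume

volume afr-group-2
  type cluster/replicate
  subvolumes node03-brick node04-brick
end-volume

volume stripe-all
  type cluster/stripe              # stripe across all AFR groups
  option block-size 1MB            # also tried with the default block size
  subvolumes afr-group-1 afr-group-2   # ... through afr-group-33
end-volume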
At 11:02 PM 2/20/2009, Jordan Mendler wrote:
>I am prototyping GlusterFS with ~50-60TB of raw disk space across
>non-raided disks in ~30 compute nodes. I initially separated the
>nodes into groups of two, and did a replicate across each set of
>single drives in a pair of servers. Next I did a stripe across the
>33 resulting AFR groups, with a block size of 1MB and later with the
>default block size. With these configurations I am only seeing
>throughput of about 15-25 MB/s, despite a full Gig-E network.
>
>What is generally the recommended configuration in a large striped
>environment? I am wondering if the number of nodes in the stripe is
>causing too much overhead, or if the bottleneck is likely somewhere
>else. In addition, I saw a thread on the list that indicates it is
>better to replicate across stripes rather than stripe across
>replicates. Does anyone have any comments or opinion regarding this?

I think that's all guesswork; I'm not sure anyone has done a thorough test with Gluster 2.0 on those choices. Personally, from a data management perspective, I'd rather replicate and then stripe, so that I know each node in a replica pair holds exactly the same data. With striping and then replicating, I imagine it is possible for data that sits on one node in one stripe set to end up on two nodes in another stripe set, which becomes a problem if you have to take the volume apart or deal with it later.

However, if you have the time, it would be great to see results of your testing with a 15-node stripe and a 10-node stripe to see how those numbers compare against the 30-node stripe you have now. Then flip the replication and run the same tests again.

Keith
Hi Jordan,

Replies inline.

At 11:02 PM 2/20/2009, Jordan Mendler wrote:

>> I am prototyping GlusterFS with ~50-60TB of raw disk space across
>> non-raided disks in ~30 compute nodes. I initially separated the nodes into
>> groups of two, and did a replicate across each set of single drives in a
>> pair of servers. Next I did a stripe across the 33 resulting AFR groups,
>> with a block size of 1MB and later with the default block size. With these
>> configurations I am only seeing throughput of about 15-25 MB/s, despite a
>> full Gig-E network.

Generally, we recommend a stripe set of 4 nodes, and if you have more nodes, we recommend an aggregate (distribute) of multiple stripe volumes. This also helps with scaling if you decide to add more nodes later, because the stripe translator by nature cannot take on more subvolumes, whereas distribute can (and each new subvolume can itself be a new stripe of 4 subvolumes). We also recommend a stripe block-size of 128KB, combined with a write-behind block-size of 128KB * (number of stripe subvolumes), which helps send each write call to all of the stripe nodes in parallel (sketched below).

>> What is generally the recommended configuration in a large striped
>> environment? I am wondering if the number of nodes in the stripe is causing
>> too much overhead, or if the bottleneck is likely somewhere else.

Yes, if the number of striped subvolumes is high, there is a bit more CPU consumption at the client, and the parallelism may not be utilized properly. Again, a setup as described above should help.

>> In addition, I saw a thread on the list that indicates it is better to
>> replicate across stripes rather than stripe across replicates. Does anyone
>> have any comments or opinion regarding this?

After the rc2 release, both should work fine. Before that, there was a known bug where replicate did not handle the 'holes' created by stripe during self-heal; that issue has now been addressed.

Regards,

--
Amar Tumballi
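A rough sketch of the layout described above, distribute over two 4-way stripes with 128KB blocks and write-behind sized at 128KB * 4 = 512KB. The volume names are placeholders, the AFR pairs and their protocol/client bricks are assumed to be defined elsewhere in the volfile, and exact option spellings may vary between GlusterFS releases:

# Hypothetical client-side layout following the advice above:
# 4-way stripes with 128KB blocks, aggregated by distribute, with
# write-behind sized at 128KB * 4 stripe subvolumes = 512KB.
# pair-1 .. pair-8 stand in for cluster/replicate pairs defined earlier
# in the volfile; option spellings may differ between releases.

volume stripe-1
  type cluster/stripe
  option block-size 128KB
  subvolumes pair-1 pair-2 pair-3 pair-4     # four AFR pairs per stripe
end-volume

volume stripe-2
  type cluster/stripe
  option block-size 128KB
  subvolumes pair-5 pair-6 pair-7 pair-8
end-volume

volume dist
  type cluster/distribute          # aggregate the stripe sets; new 4-way
  subvolumes stripe-1 stripe-2     # stripes can be added here later
end-volume

volume wb
  type performance/write-behind    # option name may differ by release
  option block-size 512KB          # 128KB stripe block * 4 stripe subvolumes
  subvolumes dist
end-volume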