Mark True
2008-Jun-14 18:22 UTC
[Lustre-discuss] Rule of thumb for setting up lustre resources...
Hello!

I am new to the list, but I have been researching Lustre for quite some time and finally have an occasion to use it. I am trying to do some capacity planning and I am wondering if there are some general rules of thumb for configuring a Lustre environment.

Specifically:

A> If increasing the number of OSTs increases throughput, is there a relationship that can be used to determine how many OSTs we're likely to need at the outset to establish a baseline minimum throughput? For example, if I want to get 3GB sustained throughput, how many OSTs will facilitate this?

B> Do the MGS and MDS have to be separate for best performance, or can they be consolidated into one server without causing too much hardship?

C> Right now I am looking at a model where I am connecting all the OSTs and the MDS/MGS together using infiniband, and connecting the storage via fibrechannel. Is this the ideal solution, or am I going in the wrong direction?

D> Just wondering what clustering software people typically use on the front end with Lustre. If they are going to be using this as a filesystem for some kind of HPC environment, what is the most popular clustering technology for this?

E> Does Heartbeat install next to whatever HPC clustering technology you have?

Thanks, and I hope that I can soon be someone who contributes rather than just asking questions :)

--Mark T.
Brian J. Murrell
2008-Jun-16 14:03 UTC
[Lustre-discuss] Rule of thumb for setting up lustre resources...
On Sat, 2008-06-14 at 14:22 -0400, Mark True wrote:
>
> Hello!

Hi.

> A> If increasing the number of OSTs increases throughput, is there a
> relationship that can be used to determine how many OSTs we're likely
> to need at the outset to establish a baseline minimum throughput?

Of course.

> For example, if I want to get 3GB sustained throughput, how many OSTs
> will facilitate this?

That is _completely_ dependent on your hardware configuration. If you are adding an OST "identical" to an existing one, you can simply use the speed of the existing OST to determine how much more the new OST will add. But be very careful of ceilings. You can of course only add so many OSTs before you start to hit other resource limitations, such as bus bandwidth in the OSS and network bandwidth of the OSS's interconnect. In short, you need to understand the performance capability of all of your components to come up with an overall design that meets your performance goals and scales to future goals.

> B> Do the MGS and MDS have to be separate for best performance, or
> can they be consolidated into one server without causing too much
> hardship?

I'd tend to say that most people put them into the same server. For anything but "toy" installations, however, we strongly suggest you put the MGS and MDT on separate devices.

> C> Right now I am looking at a model where I am connecting all the
> OSTs and the MDS/MGS together using infiniband,

Just to keep the nomenclature straight, an OST is a device (i.e. a disk) in/attached to an OSS. An OSS is the server that serves OSTs.

> and connecting the storage via fibrechannel. Is this the ideal
> solution, or am I going in the wrong direction?

That sounds suitable.

b.
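The sizing reasoning in this reply can be sketched in a few lines of Python. This is only an illustration, not something posted to the list: the function name `estimate_osts`, the per-OST rate, and the per-OSS ceiling are all hypothetical, and the "3GB" target from the question is read here as roughly 3000 MB/s. It assumes identical OSTs and treats the OSS ceiling (bus or interconnect, whichever is lower) as a single figure you have measured or taken from spec sheets.

```python
import math


def estimate_osts(target_mb_s, per_ost_mb_s, oss_ceiling_mb_s):
    """Return (num_osts, num_osses) needed to reach target_mb_s.

    Assumes identical OSTs and one fixed throughput ceiling per OSS.
    All inputs are hypothetical planning figures in MB/s.
    """
    if min(target_mb_s, per_ost_mb_s, oss_ceiling_mb_s) <= 0:
        raise ValueError("all rates must be positive")

    # How many OSTs one OSS can drive before it hits its own ceiling.
    osts_per_oss = max(1, int(oss_ceiling_mb_s // per_ost_mb_s))

    # What one fully populated OSS can actually deliver.
    per_oss_mb_s = min(osts_per_oss * per_ost_mb_s, oss_ceiling_mb_s)

    num_osses = math.ceil(target_mb_s / per_oss_mb_s)
    # Every OSS needs at least one OST, even if the OSS is the bottleneck.
    num_osts = max(num_osses, math.ceil(target_mb_s / per_ost_mb_s))
    return num_osts, num_osses


# Made-up example: 250 MB/s per OST, 1000 MB/s ceiling per OSS.
print(estimate_osts(3000, 250, 1000))  # (12, 3)
```

With these invented figures the target needs 12 OSTs spread over 3 OSSes; adding a fifth OST to any one OSS would not help, because that OSS is already at its ceiling. Real inputs have to come from benchmarking or vendor specifications for your own hardware.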
Klaus Steden
2008-Jun-16 19:30 UTC
[Lustre-discuss] Rule of thumb for setting up lustre resources...
Hi Mark,

See my comments inline below.

cheers,
Klaus

On 6/14/08 11:22 AM, "Mark True" <darfoo at gmail.com> did etch on stone tablets:

> C> Right now I am looking at a model where I am connecting all the OSTs and
> the MDS/MGS together using infiniband, and connecting the storage via
> fibrechannel. Is this the ideal solution, or am I going in the wrong
> direction?

This is a good solution and will give you good performance overall, although you can mix different storage technologies and network technologies within the same storage environment, and it should remain relatively transparent. I've got a cluster that handles both FC storage and iSCSI storage, but I know there are people out there using DRBD, and I'm dying to try Infiniband-based storage as well. Anything that presents a block device to an OSS should be suitable for use with Lustre, but some will perform better than others. Bottom line, I think, is to pick the best technology for your price range and performance needs. Infiniband + FC is pretty much the top of the mountain, though.

> D> Just wondering what clustering software people typically use on the front
> end with Lustre. If they are going to be using this as a filesystem for some
> kind of HPC environment, what is the most popular clustering technology for
> this?

Our CFS clusters are all organized as part of ROCKS clusters. I know a number of people on this list are on the ROCKS list, so there's good cross-pollination between technologies. It's a mature cluster architecture designed for HPC, and it bundles a number of useful solutions and tools (MPI, SGE, Torque, distributed compilers, visualization, etc.). It's also relatively easy to integrate with Lustre, as you can simply drop the pre-built Lustre RPMs into the cluster installer and be ready to go in a few minutes.

> E> Does Heartbeat install next to whatever HPC clustering technology you have?

I'm using Linux-HA. It wasn't built into my cluster software distro, but it was easy enough to drop into the mix, and as of late last year it had native disk support for Lustre file systems.
Mark True
2008-Jun-17 13:40 UTC
[Lustre-discuss] Rule of thumb for setting up lustre resources...
Hey Brian,

Thanks so much for the prompt response. I do have a couple of questions for clarification:

Does the hardware makeup of the OSS affect the speed of the OSTs? If so, what is likely to be the bottleneck in an OSS?

Say we have an OSS with 3 OSTs attached; is that different from having three OSSs with 1 OST apiece?

Also, does the OSS have as much of a performance impact as the speed of the OST?

What is the recommended max number of OSTs per OSS?

If I am able to determine the max capabilities of an OST/OSS, is it safe to assume that the increase in performance scales linearly as I increase the number of OSS/OSTs?

Thanks,

--Mark T.
Brian J. Murrell
2008-Jun-17 14:02 UTC
[Lustre-discuss] Rule of thumb for setting up lustre resources...
On Tue, 2008-06-17 at 09:40 -0400, Mark True wrote:
> Thanks so much for the prompt response. I do have a couple of
> questions for clarification:
>
> Does the hardware makeup of the OSS affect the speed of the OSTs?

Of course. An OST is only going to go as fast as the hardware that it's made up of. If you put a slow disk in an OSS, the OST is going to be slow.

> If so, what is likely to be the bottleneck in an OSS?

There is no one right answer to that. You have to get together with your hardware vendor, explain your use scenario (i.e. for an OSS), and have them spec out some hardware that meets the use-case. If you will be spec'ing your own hardware, then you need to grab the technical specifications for all of the hardware you are proposing to use and understand their performance aspects. If you don't feel confident in being able to do the latter, then I would suggest you do the former.

In general, an OSS is I/O bound. You need to provide enough I/O capacity between the disk and the network through which the data will travel.

> Say we have an OSS with 3 OSTs attached; is that different from
> having three OSSs with 1 OST apiece?

That depends on whether that OSS with the 3 OSTs attached has the I/O capacity to do full-out I/O to all three disks. As I've said before, this is basically an exercise in understanding the capacity of your entire I/O path from OST to client and sizing to meet that capacity.

> Also, does the OSS have as much of a performance impact as the speed
> of the OST?

The OSS hosts the OST, so you can't really compare the performance impact of one vs. the other.

> What is the recommended max number of OSTs per OSS?

As you have probably gathered, that is *completely* dependent on the hardware you are using for OSSes and OSTs, and there is no one answer that fits all hardware. You just want to make sure you don't create bottlenecks, again by understanding the capacity of the various paths from OST (disk) to client.

> If I am able to determine the max capabilities of an OST/OSS, is it
> safe to assume that the increase in performance scales linearly as I
> increase the number of OSS/OSTs?

Yes, raw bandwidth will grow pretty linearly. It is up to your applications and use-cases to take advantage of that, though.

b.
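One way to picture the "entire I/O path" argument is to model each OSS as the minimum of its stages and then sum across OSSes. The sketch below is an assumption-laden illustration rather than a Lustre tool: the names `oss_throughput` and `cluster_throughput` and every rate in it are invented for the example, and client-side and fabric-wide limits are ignored.

```python
def oss_throughput(ost_rates, bus_limit, net_limit):
    """Throughput one OSS can deliver: the slowest stage on its path wins."""
    if not ost_rates:
        return 0
    return min(sum(ost_rates), bus_limit, net_limit)


def cluster_throughput(osses):
    """Aggregate raw bandwidth, assuming OSSes do not contend with each other."""
    return sum(oss_throughput(osts, bus, net) for osts, bus, net in osses)


# Hypothetical rates in MB/s: 300 per OST, 800 bus and 700 network per OSS.
one_oss_three_osts = [([300, 300, 300], 800, 700)]
three_osses_one_ost = [([300], 800, 700)] * 3

print(cluster_throughput(one_oss_three_osts))   # 700, network-bound
print(cluster_throughput(three_osses_one_ost))  # 900, disk-bound
```

In this made-up case the single OSS cannot drive all three disks flat out, so the two layouts differ; with a faster bus and network they would come out the same. Under this model aggregate bandwidth grows linearly as identical OSSes are added, which is the "raw bandwidth" sense of linear scaling in the reply.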