Hi All

We're a research lab new to Lustre and are purchasing a small HPC cluster. We wish to seek your comments and help on sizing the hardware. So far we plan to have the following:
- 1 Head node, 2 x 6 core intel X5670, 32GB RAM, 2 x 300 GB SAS 10krpm HDDs
- 1 MDS+MDT node, 2 x 6 core intel X5670, 24GB RAM, 2 x 500GB SATA 7.2krpm HDDs
- 2 Storage nodes (OSS collocated with OST), each with: 2 x 4 core intel E5620, 24GB RAM, 14 x 600GB SAS 15krpm HDDs (raw 8.4TB)
- 12 compute nodes, each with 2 x 6 core intel X5670.
- Infiniband QDR connectivity

- Average file size: 20MB
- Usage pattern: running parallel (MPI) models like WRF (weather model), POM (hydrodynamics model) etc

Questions:
- Is the above hardware config all right?
- What is the impact of putting the 2nd (failover) MDS+MDT on the Head Node?
- Is Lustre easy to maintain?
- Is Lustre reasonably stable and problem free?

Thanks for any tips or suggestions.

regards
-chk
Physical Oceanography Research Lab
Ashley Pittman
2011-May-27 09:20 UTC
[Lustre-discuss] How's this config for our Lustre setup?
On 27 May 2011, at 08:02, Kek wrote:

> Hi All
>
> We're a research lab new to Lustre and are purchasing a small HPC cluster. We wish to seek your comments and help on sizing the hardware.
> So far we plan to have the following:
> - 1 Head node, 2 x 6 core intel X5670, 32GB RAM, 2 x 300 GB SAS 10krpm HDDs
> - 1 MDS+MDT node, 2 x 6 core intel X5670, 24GB RAM, 2 x 500GB SATA 7.2krpm HDDs
> - 2 Storage nodes (OSS collocated with OST), each with: 2 x 4 core intel E5620, 24GB RAM, 14 x 600GB SAS 15krpm HDDs (raw 8.4TB)
> - 12 compute nodes, each with 2 x 6 core intel X5670.
> - Infiniband QDR connectivity
>
> - Average file size: 20MB
> - Usage pattern: running parallel (MPI) models like WRF (weather model), POM (hydrodynamics model) etc
>
> Questions:
> - Is the above hardware config all right?

It doesn't sound unreasonable for an entry-level system, but a few things stick out at me:

The "T" in MDT/OST stands for Target and refers to a single storage device; the "S" in MDS/OSS stands for Server and refers to a physical machine. It's therefore correct to say that the MDS hosts the MDT and the OSSs host the OSTs. Talking about an OSS collocated with an OST does not make much sense.

The OSS will benefit from having lots of RAM; 24GB is good, but the MDS less so. That said, RAM is cheap, so there wouldn't be much of a saving.

Why do you have 2 x HDD for the MDT? It'll use a single device; having two only makes sense if you are using RAID 1, which you should be. See below.

SAS is probably overkill for the OST drives at this scale; SATA will be cheaper and higher capacity.

Now for the main point from the above config. Data will be striped over all OSTs, so if you get a disk failure the data stored on that OST will be lost forever. As files are likely to be striped, this means you will likely lose a considerable percentage of all data for each and every disk failure (think 80% plus - you'll only be able to recover a subset of small files and parts of larger files). If you assume that the MTBF for a hard drive is four years (48 months) and you have 30 drives, then you can expect one to fail at least every two months.

Lustre itself doesn't protect against this; it simply works at the device level. To provide some resilience against disk failure you should use RAID at some level. At this scale software RAID 1 or similar would be acceptable. As above, if you use SATA disks rather than SAS you may find this is both cheaper and more resilient, and it will still give you more capacity.

> - What is the impact of putting the 2nd (failover) MDS+MDT on the Head Node?

I'm not sure I understand what you mean here; unless the data is replicated between a backup device on the head node and the MDT device this wouldn't be possible. You could do this using DRBD, I believe, although the standard is to use external RAID controllers multiply connected to do this.

> - Is Lustre easy to maintain?

Generally yes, although the learning curve can be steep at the beginning.

> - Is Lustre reasonably stable and problem free?

Yes, it's a lot better than it used to be.

Ashley.
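A back-of-the-envelope sketch of the failure arithmetic in the message above. The MTBF, drive count and stripe counts are assumptions taken from this thread, not measured values, so treat the numbers as illustrative only:

# Rough failure arithmetic for the proposed cluster (illustrative only).
MTBF_MONTHS = 48      # assumed per-drive MTBF (four years)
TOTAL_DRIVES = 30     # 2 OSS nodes x 14 OST drives + 2 MDT drives
NUM_OSTS = 2          # one OST per storage node in this proposal

# Expected interval between single-drive failures somewhere in the system.
interval = MTBF_MONTHS / TOTAL_DRIVES
print(f"roughly one drive failure every {interval:.1f} months")

# If a file is striped over stripe_count of the NUM_OSTS targets, the chance
# it touches a particular (failed) OST is stripe_count / NUM_OSTS.
def fraction_of_files_hit(stripe_count, num_osts=NUM_OSTS):
    return min(stripe_count, num_osts) / num_osts

for stripes in (1, 2):
    print(f"stripe count {stripes}: ~{fraction_of_files_hit(stripes):.0%} "
          "of files touch a failed OST")

With 30 drives and a 48-month MTBF this works out to roughly one failure every 1.6 months, and with only two OSTs any file striped across both of them is damaged by a single OST loss, which is why RAID underneath the targets matters.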
Peter Kjellström
2011-May-27 15:18 UTC
[Lustre-discuss] How's this config for our Lustre setup?
See my comments inline below.

On Friday, May 27, 2011 09:02:30 AM Kek wrote:

> Hi All
>
> We're a research lab new to Lustre and are purchasing a small HPC cluster.
> We wish to seek your comments and help on sizing the hardware. So far we
> plan to have the following:
> - 1 Head node, 2 x 6 core intel X5670, 32GB RAM, 2 x 300 GB SAS 10krpm HDDs
> - 1 MDS+MDT node, 2 x 6 core intel X5670, 24GB RAM, 2 x 500GB SATA 7.2krpm HDDs
> - 2 Storage nodes (OSS collocated with OST), each with: 2 x 4 core intel E5620, 24GB RAM, 14 x 600GB SAS 15krpm HDDs (raw 8.4TB)

I'd switch the SAS drives on the OSS nodes for SATA and buy SAS on the MDT, where IOPS matter more. I'd also consider a few more drives on the MDS, say 4-6 SAS in a RAID 10. Make sure your OSS nodes use RAID 6.

> - 12 compute nodes, each with 2 x 6 core intel X5670.
> - Infiniband QDR connectivity
>
> - Average file size: 20MB
> - Usage pattern: running parallel (MPI) models like WRF (weather model), POM (hydrodynamics model) etc
>
> Questions:
> - Is the above hardware config all right?

Except for a general feeling of few and expensive components, yes. ;-)

> - What is the impact of putting the 2nd (failover) MDS+MDT on the Head Node?

Your filesystem will essentially be unavailable if any one of the MDS + 2 x OSS is down/dead, so it doesn't make much HA sense to configure only the MDS with a fail-over (and it will require either DRBD or shared external storage). For a small config like this I'd say it's a no-brainer to keep it simple and stay away from fail-over stuff (we run a Lustre with no failover and almost 100 servers, and even that has reasonable availability).

> - Is Lustre easy to maintain?
> - Is Lustre reasonably stable and problem free?

Compared to similar software I'd say so, yes. There is a quite nice admin guide that I recommend you read at:

http://wiki.lustre.org/index.php/Lustre_Documentation

/Peter

> Thanks for any tips or suggestions.
>
> regards
> -chk
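A rough illustration of the availability point above, treating the MDS and the two OSS nodes as servers in series (the filesystem is down if any one of them is down). The per-server availability figure is an assumption for illustration, not a measurement:

# Rough availability model: MDS + 2 OSS in series, optionally with an
# active/passive MDS pair. 0.995 per-server uptime is an assumed figure.
PER_SERVER_AVAILABILITY = 0.995

def fs_availability(num_oss, mds_has_failover):
    a = PER_SERVER_AVAILABILITY
    oss_part = a ** num_oss                 # every OSS must be up
    if mds_has_failover:
        mds_part = 1 - (1 - a) ** 2         # MDS pair: both must fail at once
    else:
        mds_part = a                        # single MDS must be up
    return oss_part * mds_part

print(f"no failover at all:        {fs_availability(2, False):.3%}")
print(f"MDS failover, OSS single:  {fs_availability(2, True):.3%}")

Under these assumptions the MDS standby buys back only a fraction of a percent, because the two OSS nodes remain single points of failure - which is why keeping it simple at this scale is reasonable advice.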