Henry_Xu at Dell.com
2010-Jul-24 02:25 UTC
[Lustre-discuss] How to achieve 20GB/s file system throughput?
Hello,

One of my customers wants to set up an HPC system with thousands of compute nodes. The parallel file system should have 20GB/s throughput. I am not sure whether Lustre can achieve it. How many IO nodes are needed to reach this target?

My assumption is that 100 or more IO nodes (rack servers) are needed.

Thanks in advance!

Henry Xu,
System Consultant
Joe Landman
2010-Jul-24 02:50 UTC
[Lustre-discuss] How to achieve 20GB/s file system throughput?
On 07/23/2010 10:25 PM, Henry_Xu at Dell.com wrote:

> One of my customers wants to set up an HPC system with thousands of
> compute nodes. The parallel file system should have 20GB/s throughput.
> I am not sure whether Lustre can achieve it. How many IO nodes are
> needed to reach this target?

I hate to say "it depends" but it does in fact depend upon many things. What type of IO is the customer doing: large block sequential spread out over many nodes (parallel IO), small block random, or a mixture?

It is possible to achieve 20GB/s, and quite a bit more, using Lustre. As to whether or not that 20GB/s is meaningful to their code(s), that's a different question. It would be 20GB/s in aggregate, over possibly many compute nodes doing IO.

> My assumption is that 100 or more IO nodes (rack servers) are needed.

Hmmm ... if you can achieve 500+ MB/s per OST, then you would need about 40 OSTs. You can have each OSS handle several OSTs. There are efficiency losses you should be aware of, but 20GB/s, by some mechanism of measuring it, should be possible with a realistic number of units. Don't forget to count efficiency losses in the design.

100 IO nodes ... I presume you mean OSSes? If your units are slower, then yes, you will need more of them to achieve this performance.

You would also need to make sure you have a well designed and correctly functioning Infiniband infrastructure, in addition to the other issues. We've found that Lustre is ... very sensitive ... to the Infiniband implementation.

Regards,

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
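To make Joe's sizing arithmetic concrete, here is a minimal back-of-envelope sketch in Python. The 3-OSTs-per-OSS layout and the 80% efficiency derate are illustrative assumptions, not figures from the thread; only the 20 GB/s target and 500 MB/s per-OST rate come from Joe's numbers.

    import math

    def size_lustre(target_gbps, per_ost_mbps, osts_per_oss, efficiency):
        """Back-of-envelope OST/OSS counts for a target aggregate throughput.

        'efficiency' derates the raw per-OST rate to account for the
        losses Joe mentions (protocol overhead, striping imbalance, ...).
        """
        effective_mbps = per_ost_mbps * efficiency
        osts = math.ceil(target_gbps * 1000 / effective_mbps)
        osses = math.ceil(osts / osts_per_oss)
        return osts, osses

    # Joe's raw numbers: 20 GB/s at 500 MB/s per OST, no derate -> 40 OSTs
    print(size_lustre(20, 500, 3, 1.0))   # (40, 14)
    # With an assumed 80% efficiency factor the counts grow
    print(size_lustre(20, 500, 3, 0.8))   # (50, 17)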
Bernd Schubert
2010-Jul-24 08:52 UTC
[Lustre-discuss] How to achieve 20GB/s file system throughput?
On Saturday, July 24, 2010, Henry_Xu at dell.com wrote:

> My assumption is that 100 or more IO nodes (rack servers) are needed.

I'm a bit prejudiced, of course, but with DDN storage that would be quite simple. With the older DDN S2A 9990 you can get 5GB/s per controller pair; with the newer SFA10000 you can get 6.5 to 7GB/s per controller pair (we are still tuning it). Each controller pair ("couplet" in DDN terms) usually has 4 servers connected and fits into a single rack in a 300-drive configuration. So you can get 20GB/s with 3 or 4 racks and 12 or 16 OSS servers, which is far below your 100 IO nodes ;)

Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks
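Bernd's rack math reduces to a one-line calculation; a minimal sketch, using the per-couplet rates he quotes and his stated 4-servers-per-couplet, one-rack-per-couplet layout:

    import math

    def couplet_sizing(target_gbps, per_couplet_gbps, servers_per_couplet=4):
        """Couplets (one rack each) and attached OSS servers for a target."""
        couplets = math.ceil(target_gbps / per_couplet_gbps)
        return couplets, couplets * servers_per_couplet

    print(couplet_sizing(20, 5.0))   # S2A 9990: (4, 16) -> 4 racks, 16 OSSes
    print(couplet_sizing(20, 7.0))   # SFA10000: (3, 12) -> 3 racks, 12 OSSes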
hung-sheng tsao
2010-Jul-24 11:39 UTC
[Lustre-discuss] How to achieve 20GB/s file system throughput?
Maybe check out http://www.terascala.com, which provides a Lustre appliance. They claim the RTS1000 offers 20TB per enclosure and 2GB/s throughput.

Regards,

On Fri, Jul 23, 2010 at 10:25 PM, <Henry_Xu at dell.com> wrote:

> One of my customers wants to set up an HPC system with thousands of
> compute nodes. The parallel file system should have 20GB/s throughput.
> I am not sure whether Lustre can achieve it.

--
Hung-Sheng Tsao, Ph.D. <laotsao at gmail.com>
http://laotsao.wordpress.com
9734950840
Joe Landman
2010-Jul-24 13:08 UTC
[Lustre-discuss] How to achieve 20GB/s file system throughput?
Hate to reply to myself ... this is not an advertisement.

On 07/23/2010 10:50 PM, Joe Landman wrote:

[...]

> It is possible to achieve 20GB/s, and quite a bit more, using Lustre.
> As to whether or not that 20GB/s is meaningful to their code(s), that's
> a different question. It would be 20GB/s in aggregate, over possibly
> many compute nodes doing IO.

I should point out that we have customers with 20GB/s maximum theoretical configs (best case scenarios) with our siCluster (http://scalableinformatics.com/sicluster), with 8 IO units. Their write patterns and Infiniband configurations don't seem to allow achieving this in practice. Simple benchmark tests (mixtures of LLNL mpi-io, io-bm, iozone, ...) show sustained results north of 12 GB/s for them.

Again, to set expectations: most users' codes never utilize storage systems very effectively, so you might design a 20GB/s storage system and find the IO being done doesn't hit much above 500 MB/s for single threads.

>> My assumption is that 100 or more IO nodes (rack servers) are needed.
>
> Hmmm ... if you can achieve 500+ MB/s per OST, then you would need
> about 40 OSTs. You can have each OSS handle several OSTs. There are
> efficiency losses you should be aware of, but 20GB/s should be
> possible with a realistic number of units. Don't forget to count
> efficiency losses in the design.

We do this in 8 machines (theoretical max performance), and could put this in a single rack. We prefer to break it out among more IO nodes, say 16-24 smaller nodes, with 2-3 OSTs per OSS (i.e. per IO node).

My comments are to make sure your customer understands the efficiency issues, and that simple Fortran writes from a single thread aren't going to be done at 20GB/s. That is, not unlike a compute cluster, a storage cluster has an aggregate bandwidth that a single node or reader/writer cannot achieve on its own.

Regards,

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
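A minimal sketch of the aggregate-versus-single-stream point above: assuming, optimistically, linear scaling with no contention, and taking the ~500 MB/s single-thread figure from Joe's post as the per-stream rate, the number of concurrent writers needed to see the aggregate target works out as follows.

    import math

    def streams_needed(target_gbps, per_stream_mbps):
        """Minimum concurrent writer streams to reach an aggregate target,
        assuming (optimistically) linear scaling and no contention."""
        return math.ceil(target_gbps * 1000 / per_stream_mbps)

    # A single ~500 MB/s writer can never see 20 GB/s on its own;
    # on the order of 40+ streams spread across clients are needed.
    print(streams_needed(20, 500))   # -> 40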