Hello list members,

We are planning a Lustre setup for our lab. I was searching for cheap but reliable DAS and/or JBOD solutions and came across this chassis:

http://www.supermicro.com/products/chassis/4U/846/SC846E26-R1200.cfm

I would like to know if anyone has experience setting up this or a similar kind of system.

I am planning to add 5 similar 4U chassis and create a 200TB Lustre pool. These will be the OSSes, and I can use a smaller box for the MDS.

By the way, what would be the ideal memory size if we use such a system as an OSS, keeping in mind that I need to make 6 OSTs (i.e. 8TB of file-system size per OST)?

Thanks in advance,

Richard.
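[For context, each 8TB OST would eventually be formatted and mounted on its OSS roughly as below. This is only a sketch of the usual mkfs.lustre/mount sequence; the fsname, MGS nid, OST index and device name are placeholders, not anything specified in this thread.]

    # format one block device (e.g. a RAID array exported by the controller) as an OST
    mkfs.lustre --ost --fsname=labfs --index=0 --mgsnode=mds01@tcp0 /dev/sdb

    # mount it so the OSS starts serving this OST
    mkdir -p /mnt/lustre/ost0
    mount -t lustre /dev/sdb /mnt/lustre/ost0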
On 2010-08-10, at 02:04, Richard Chang wrote:
> By the way, what would be the ideal memory size if we use such a system as an OSS, keeping in mind that I need to make 6 OSTs (i.e. 8TB of file-system size per OST)?

I can't comment on the system, but _minimum_ RAM sizes are documented in the manual. You normally want more than the minimum (RAM is cheap), but definitely not less.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
This setup is similar to the Sun Lustre reference configuration: the MDS is a dual-Nehalem server with a 12-HDD JBOD of 15krpm SAS drives, and the OSSes are dual Intel Nehalem servers with 24-HDD JBODs of 7200rpm SATA drives. IIRC the standard memory on all of them is 24 GB.

http://wiki.lustre.org/images/4/4f/JoeyJablonski.pdf

On 8/10/2010 2:04 AM, Richard Chang wrote:
> I am planning to add 5 similar 4U chassis and create a 200TB Lustre pool.
>
> By the way, what would be the ideal memory size if we use such a system as an OSS, keeping in mind that I need to make 6 OSTs (i.e. 8TB of file-system size per OST)?
> Hello list members,
> We are planning a Lustre setup for our lab. I was searching for cheap but
> reliable DAS and/or JBOD solutions and came across this chassis:
>
> http://www.supermicro.com/products/chassis/4U/846/SC846E26-R1200.cfm
>
> I would like to know if anyone has experience setting up this or a
> similar kind of system.

I've only just started playing with 2 OSSes based on that chassis. They have:

  Supermicro X8DTH (7 PCI-E 8x slots) motherboard
  Dual quad-core Intel Xeon processors
  Four 8-port 3ware 9650 RAID controllers (4 OSTs/OSS)
  1 Mellanox MT26428 QDR InfiniBand HCA
  24 WD2003FYYS hard drives

The disks are carved into four 4+1+spare RAID 5 arrays per chassis for 8 OSTs in total.

I hate to admit it, but I skimped on the memory initially and only got 4GB. We'll never have a read cache hit regardless, so I'm not entirely convinced it's starved at 4GB, but the manual suggests that's low for 4 OSTs and 2 processors (though empirically I'm not convinced). If you can throw enough memory at it to get disk cache hits, by all means do. I'll throw more memory at it at the first hint it's beneficial.

I'm currently client-bound doing total Lustre throughput tests. For reasons I don't fully understand (and have been pondering posting about), I simply can't get more than about 700MB/s of reads on a single client. The clients seem to be bound re-assembling the replies, which I expected; I just assumed the peak would be higher.

With 3 clients I get 2.1GB/s of reads across the 8 OSTs with IOzone: 8 threads in each of 3 discrete IOzone runs, doing 1MB I/Os, so each OST sees 3 concurrent reads, which more or less mimics our software. Eventually I expect it to peak at around 3GB/s aggregate (1.5GB/s per OSS).

Our data-reduction software and cluster size lend themselves to this type of config, where we have many times the number of OSTs of multi-GB files mapped to individual cluster nodes for processing. So no striping; each file lives on a single OST. The disks are relatively reliable, and I don't plan to scale beyond 6-8 OSSes, so reliability is still manageable.

We use the same chassis/disks but a different RAID controller/network link for bulk storage. We have 22 in all (~500 disks).

James Robnett
NRAO/AOC
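[For anyone wanting to reproduce that kind of test, a per-client IOzone throughput run with 8 threads and 1MB records looks roughly like the sketch below. The file size, target filenames and test selection are illustrative, not James's exact command.]

    # throughput mode: 8 threads, 1MB records, 4GB file per thread,
    # sequential write (-i 0) followed by sequential read (-i 1) on the Lustre mount;
    # bash brace expansion supplies the 8 per-thread files
    iozone -i 0 -i 1 -t 8 -r 1m -s 4g -F /mnt/lustre/iozone.{0..7}

[Run one such IOzone per client (3 clients in this case) and sum the reported child throughput to get the aggregate figure.]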
For Nehalem CPU and memory DIMM population, please check out this link, which explains the various memory-bandwidth and DIMM-population trade-offs. Of course, those memory-bandwidth figures may not apply to Lustre :-(

http://blogs.sun.com/jnerl/entry/configuring_and_optimizing_intel_xeon

On 8/10/2010 7:49 AM, James Robnett wrote:
> I've only just started playing with 2 OSSes based on that chassis. They have:
>
>   Supermicro X8DTH (7 PCI-E 8x slots) motherboard
>   Dual quad-core Intel Xeon processors
>   Four 8-port 3ware 9650 RAID controllers (4 OSTs/OSS)
>   1 Mellanox MT26428 QDR InfiniBand HCA
>   24 WD2003FYYS hard drives
>
> I hate to admit it, but I skimped on the memory initially and only got 4GB.
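[If it helps, you can check how the DIMM slots on a running box are actually populated, and at what speed, with dmidecode; the grep pattern is just a convenience and may need adjusting to your BIOS's field names.]

    # list installed DIMMs with their sizes, speeds and slot locators
    dmidecode --type memory | grep -E 'Size|Speed|Locator'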
On 8/10/2010 5:19 PM, James Robnett wrote:
> I've only just started playing with 2 OSSes based on that chassis. They have:
>
>   Supermicro X8DTH (7 PCI-E 8x slots) motherboard
>   Dual quad-core Intel Xeon processors
>   Four 8-port 3ware 9650 RAID controllers (4 OSTs/OSS)
>   1 Mellanox MT26428 QDR InfiniBand HCA
>   24 WD2003FYYS hard drives
>
> The disks are carved into four 4+1+spare RAID 5 arrays per chassis for 8 OSTs in total.

Hello James,

Thanks for sharing your config details. I have heard that rebuilding a 2TB volume takes much longer with RAID 5, and if there is a second failure during the rebuild, the array is lost. I was wondering if RAID 6 would be better, i.e. four 4+2 RAID 6 arrays per chassis for a total of 4 OSTs per OSS, provided the hardware allows creation of RAID 6 arrays. On that note, does the RAID controller support RAID 6?

Richard.
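[For what it's worth, the 3ware 9650SE family does list RAID 6 among its supported levels. Assuming that holds for the exact model in use, creating one 4+2 unit from tw_cli would look roughly like the sketch below; the controller ID, port numbers and stripe size are placeholders, not anything confirmed in this thread.]

    # create a 6-drive RAID 6 unit (4 data + 2 parity) on controller /c0,
    # using ports p2-p7; adjust ports and stripe size to the actual cabling
    tw_cli /c0 add type=raid6 disk=2-7 stripe=64

    # check unit status and rebuild progress afterwards
    tw_cli /c0 show

[With 2TB drives, a 4+2 unit comes out to roughly 8TB usable, which matches the 8TB-per-OST figure from the original plan.]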