Hi, we would like to set up a small Lustre instance. For the OSTs we are planning to use standard Dell PE1950 servers (2x quad-core + 16 GB RAM), with the disks in a JBOD (MD1000) driven by the PE1950's internal RAID controller (RAID-6). Any experience (good or bad) with such a config? Thanks, Martin
After I got the kinks worked out, this setup FLIES, especially over InfiniBand. It's actually what I'm running for a 50 TB Lustre setup. I would strongly recommend either two 7-disk RAID 5s with a hot spare, or one RAID 6, per MD1000; I don't know what the performance difference between the two is like. I also believe ldiskfs will now allow you to format a single partition larger than 8 TB. If that is not the case, go with the two smaller RAID 5s. If you use LVM and split a single physical partition into two virtual partitions, your performance will suffer. Also, don't bother partitioning the RAIDs; use the raw block devices (i.e. /dev/sdX).

I've also seen significantly better performance out of RHEL5 than RHEL4, but that was with the PERC 5, so I can't speak for the PERC 6. The key (at least with the PERC 5s) can be found in this article: http://thias.marmotte.net/archives/2008/01/05/Dell-PERC5E-and-MD1000-performance-tweaks.html. It makes a WORLD of difference.

Good luck!

-Aaron

PS: I'd also avoid putting more than 3 MD1000s per 1950, for bandwidth reasons.

----- Original Message -----
From: "Martin Gasthuber" <martin.gasthuber at desy.de>
To: "Lustre" <lustre-discuss at clusterfs.com>
Sent: Wednesday, March 26, 2008 7:53:31 AM GMT -05:00 US/Canada Eastern
Subject: [Lustre-discuss] HW experience

> Hi, we would like to establish a small Lustre instance and for the OST
> planning to use standard Dell PE1950 servers (2x QuadCore + 16 GB Ram) and
> for the disk a JBOD (MD1000) steered by the PE1950 internal Raid controller
> (Raid-6). Any experience (good or bad) with such a config?
>
> thanxs, Martin
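For reference, the read-ahead tweak referred to above (Kilian mentions it again further down) is a plain block-device setting, and Aaron's "raw block device" advice just means running mkfs.lustre on the whole, unpartitioned device. A rough sketch, where the device name, read-ahead value, scheduler choice, file system name and MGS NID are illustrative placeholders rather than anyone's actual configuration:

  # larger read-ahead on the RAID volume (value is in 512-byte sectors)
  blockdev --setra 8192 /dev/sdb

  # the deadline elevator tends to behave better than the default cfq for streaming OST I/O
  echo deadline > /sys/block/sdb/queue/scheduler

  # allow more outstanding requests so the controller stays busy
  echo 512 > /sys/block/sdb/queue/nr_requests

  # format the unpartitioned device as an OST, as suggested above
  mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.1.1@tcp0 /dev/sdb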
On Wed, 26 Mar 2008, Martin Gasthuber wrote:
Hi,
I have a Lustre setup with two OSSes and 8 MD1000s connected to two PERC 5/E controllers, with ~90 TB of raw storage. The MDS's 6 drives are on a separate PERC 5/i controller. It uses bonded gigabit Ethernet, and performance is not bad. I have another Lustre setup with a single OSS and a 3ware 9650; its performance is better than the first setup's. If the new PERC 6 is based on the LSI 8888 (I could be wrong on this assumption, but it does support RAID 6), then the performance of the PERC 5 and PERC 6 is not directly comparable: one is an Intel IOP based controller, while the 8888 is a PowerPC based one. Some benchmarks on the internet list the 8888 as a very good controller with some of the best performance available. So if the PERC 6 is based on the 8888, it could well be better than the PERC 5.
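The bonded gigabit Ethernet mentioned above is standard Linux bonding; on RHEL 4/5 of that era it would look roughly like the following, where the bonding mode, interface names and addresses are illustrative assumptions, not Balagopal's actual settings:

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=balance-alb miimon=100
  # tell LNET to use the bonded interface for the tcp network
  options lnet networks=tcp0(bond0)

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.1.10
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (eth1 is analogous)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none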
If there is only one MD1000, then you should look at the performance of split mode, with one LUN per RAID 5 (or better, RAID 6 for peace of mind) and one hot spare (in the case of RAID 6). PERC 6 was my first choice over PERC 5, just to get RAID 6, but it was not out yet when the equipment was purchased.

The MD1000s that I have here have Seagate Barracuda ES drives, which are supposed to be enterprise hard drives, but SATA drives do have a high failure rate: I have had 6 fail in the last 4-5 months (out of 120 in total), which is not that high, especially when the arrays are almost full. PERC 5 does have some weird characteristics. For example, a month ago a drive in one MD1000 enclosure started showing "unexpected sense" errors. Unfortunately it started happening twice or thrice per second, and the monitoring daemon sent an email for every instance, as I had configured the alert level too high. Finally, when the drive failed, one more drive dropped out of another volume for no apparent reason; Dell is also unable to explain that. I was fortunate that it didn't drop out of the same RAID 5 volume as the actually failing drive. The dropped drive is still a good drive and is now in the global hot-spare pool. In hindsight it was good that I didn't go for a single big RAID 5 volume with 13 or 14 drives, and instead went for two RAID 5 volumes per enclosure (7 + 6, plus two hot spares), precisely because of this scenario: it reduces the risk of multiple simultaneous drive failures in the same volume. But as I mentioned before, if the PERC 6 is based on the LSI 8888, then it is a totally new product, not directly comparable to the PERC 5.
Also watch out for some critical Lustre bugs which are showstoppers, like this one: https://bugzilla.lustre.org/show_bug.cgi?id=13438. Until that patch came out, the two OSSes crashed on a daily basis for two months. There is also a bug affecting NFS exports that requires a reboot of the Lustre client doing the NFS export; I have worked around it for now by using a virtual machine for the NFS export of the Lustre volumes, so that a reboot won't affect running compute jobs (sketched below). There is also the problem, explained in an email on the list a few days ago, of clients getting evicted. So although I was concentrating much more on the MD1000s in the beginning, in the end I was more than happy when I got a working, stable Lustre configuration, and I'm now not too keen to extract the last ounce of performance out of the MD1000 anymore :-)
Regards
Balagopal
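A rough sketch of the VM-based NFS re-export workaround Balagopal describes, where the file system name, MGS NID, mount point and export options are placeholders rather than his actual configuration:

  # on the NFS-gateway VM: mount Lustre as an ordinary client
  mount -t lustre 192.168.1.1@tcp0:/testfs /mnt/lustre

  # /etc/exports - an explicit fsid is needed because there is no local block device
  /mnt/lustre  *(rw,async,no_root_squash,fsid=1)

  # publish the export and start the NFS server
  exportfs -ra
  service nfs start

Compute nodes then mount the share from the VM, so rebooting the gateway to clear the NFS-export bug does not affect running compute jobs, as described above.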
> Hi,
>
> we would like to establish a small Lustre instance and for the OST
> planning to use standard Dell PE1950 servers (2x QuadCore + 16 GB Ram) and
> for the disk a JBOD (MD1000) steered by the PE1950 internal Raid controller
> (Raid-6). Any experience (good or bad) with such a config ?
>
> thanxs,
> Martin
>
Hi Martin,

On Wednesday 26 March 2008 04:53:31 Martin Gasthuber wrote:
> we would like to establish a small Lustre instance and for the OST
> planning to use standard Dell PE1950 servers (2x QuadCore + 16 GB Ram)
> and for the disk a JBOD (MD1000) steered by the PE1950 internal Raid
> controller (Raid-6). Any experience (good or bad) with such a config?

I also have a 50 TB Lustre setup based on this hardware: 8 PE1950 OSSes, each connected to two MD1000 OSTs. The MDS uses an MD3000 as the MDT for high availability (the redundancy is not currently in use, though; I never managed to get it working reliably).

I can't say much about the PERC6 controller, since I'm using its older brother, the PERC5, but memory-wise you should be good with 16 GB. We planned 4 GB per OSS (2 OSTs each) at the beginning, but we had to double that to avoid memory exhaustion [1]. It will depend on the load induced by the clients, though.

The MD1000s' performance is great as long as you set the read-ahead settings as Aaron mentioned.

/scratch $ iozone -c -c -R -b ~/iozone.xls -C -r 64k -s 24m -i 0 -i 1 -i 2 -i8 -t50

Throughput report (record size = 64 KB, output in KB/sec):

  Initial write     1317906.72
  Rewrite           2423618.81
  Read              3484409.47
  Re-read           4023550.60
  Random read       3361937.08
  Mixed workload    2994666.57
  Random write      1777569.04

[1] http://lists.lustre.org/pipermail/lustre-discuss/2008-February/004874.html

Cheers,
--
Kilian