EKC
2006-May-19 07:36 UTC
[Lustre-discuss] Federated Switch Configuration For Large-Scale Lustre Clusters?
Hello,

I have a 22-node Lustre cluster that I've cobbled together for evaluation purposes. All of the Lustre nodes currently run off a single 24-port Gigabit Ethernet switch. My intention is to scale this cluster up to several hundred Gigabit Ethernet Lustre nodes over the next several months. Each node is a single-board VIA Mini-ITX computer (dual processor) with on-board Gigabit Ethernet and 2 GB of RAM. I'm using Warewulf and PXE to automatically configure each of the nodes. Each OSS uses a local IDE disk. For OST failover I'm planning to pair OSSs using DRBD. Although I have not yet set up DRBD to accomplish this, the DRBD kernel patch seems to be compatible with Lustre.

The only scalability issue I have at the moment is the Gigabit switch. What sort of Ethernet switch configurations have people used to deploy Lustre clusters with hundreds (thousands?) of nodes? I've been reading about the "federated switch" configuration used by the Blue Gene supercomputer. Apparently, they have been using a dozen Cisco 6509 switches for their Lustre cluster (http://www.cisco.com/en/US/products/hw/switches/ps708/index.html). eBay lists Cisco 6509 switches at $25K! That's $100+ per port for 10/100 Ethernet.

What is a cost-effective and practical approach to setting up Gigabit switches for a 1000-node Lustre cluster? What sort of Gigabit Ethernet switches should I be using? Currently, I'm using a 24-port Netgear Gigabit Ethernet switch (http://www.newegg.com/Product/Product.asp?Item=N82E16833122058). What sort of performance hit would I experience if I arranged a few dozen of these switches in a tree structure? I've put my rough math on that in the P.S. below.

Thanks in advance
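P.S. To put some rough numbers behind the tree-of-switches question, here is the back-of-envelope arithmetic I've been using. It assumes a two-level tree of 24-port Gigabit switches with 4 ports per leaf switch reserved as uplinks; those counts are just guesses for illustration, not anything I've measured.

```python
# Rough oversubscription estimate for a two-level tree of 24-port
# gigabit switches. All port/uplink counts below are assumptions.

PORTS_PER_SWITCH = 24   # e.g. the 24-port Netgear switch I'm using now
UPLINKS_PER_LEAF = 4    # leaf ports reserved as uplinks to a core switch (assumed)
TARGET_NODES = 1000

hosts_per_leaf = PORTS_PER_SWITCH - UPLINKS_PER_LEAF   # 20 nodes per leaf switch
leaf_switches = -(-TARGET_NODES // hosts_per_leaf)     # ceiling division -> 50 leaves

# Each leaf has 20 Gbit/s of host-facing bandwidth but only 4 Gbit/s of
# uplink, so traffic leaving the switch is oversubscribed 5:1.
oversubscription = float(hosts_per_leaf) / UPLINKS_PER_LEAF

# Worst-case per-node bandwidth when every node sends across the tree at once.
per_node_gbit = float(UPLINKS_PER_LEAF) / hosts_per_leaf

print("leaf switches needed:       %d" % leaf_switches)
print("oversubscription ratio:     %.1f : 1" % oversubscription)
print("worst-case per-node Gbit/s: %.2f" % per_node_gbit)
```

If that 5:1 figure is roughly right, nodes on different leaf switches would only see a fraction of a gigabit each under heavy cross-switch traffic, which is the kind of performance hit I'm asking about.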