EKC
2006-May-19 07:36 UTC
[Lustre-discuss] Federated Switch Configuration For Large-Scale Lustre Clusters?
Hello,

I have a 22-node Lustre cluster that I've cobbled together for evaluation purposes. All of the Lustre nodes currently run off a single 24-port Gigabit Ethernet switch. My intention is to scale this cluster up to several hundred Gigabit Ethernet Lustre nodes over the next several months. Each node is a single-board VIA Mini-ITX computer (dual processor) with on-board Gigabit Ethernet and 2 GB of RAM. I'm using Warewulf and PXE to automatically configure each of the nodes. Each OSS uses a local IDE disk. For OST failover I'm planning to pair OSSs using DRBD. Although I have not yet set up DRBD to accomplish this, the DRBD kernel patch seems to be compatible with Lustre.

The only scalability issue I have at the moment is the Gigabit switch. What sort of Ethernet switch configurations have people used to deploy Lustre clusters with hundreds (thousands?) of nodes? I've been reading about the "federated switch" configuration used by the Blue Gene supercomputer. Apparently, they have been using a dozen Cisco 6509 switches for their Lustre cluster (http://www.cisco.com/en/US/products/hw/switches/ps708/index.html). eBay lists Cisco 6509 switches at $25K! That's $100+ per port for 10/100 Ethernet.

What is a cost-effective and practical approach to setting up Gigabit switches for a 1000-node Lustre cluster? What sort of Gigabit Ethernet switches should I be using? Currently, I'm using a 24-port Netgear Gigabit Ethernet switch (http://www.newegg.com/Product/Product.asp?Item=N82E16833122058). What sort of performance hit would I experience if I arranged a few dozen of these switches in a tree structure? I've put my rough math on that in the P.S. below.

Thanks in advance
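P.S. To put some rough numbers behind the tree-of-switches question, here is the back-of-envelope arithmetic I've been using. It assumes a two-level tree of 24-port Gigabit switches with 4 ports per leaf switch reserved as uplinks; those counts are just guesses for illustration, not anything I've measured.

```python
# Rough oversubscription estimate for a two-level tree of 24-port
# gigabit switches. All port/uplink counts below are assumptions.

PORTS_PER_SWITCH = 24   # e.g. the 24-port Netgear switch I'm using now
UPLINKS_PER_LEAF = 4    # leaf ports reserved as uplinks to a core switch (assumed)
TARGET_NODES = 1000

hosts_per_leaf = PORTS_PER_SWITCH - UPLINKS_PER_LEAF   # 20 nodes per leaf switch
leaf_switches = -(-TARGET_NODES // hosts_per_leaf)     # ceiling division -> 50 leaves

# Each leaf has 20 Gbit/s of host-facing bandwidth but only 4 Gbit/s of
# uplink, so traffic leaving the switch is oversubscribed 5:1.
oversubscription = float(hosts_per_leaf) / UPLINKS_PER_LEAF

# Worst-case per-node bandwidth when every node sends across the tree at once.
per_node_gbit = float(UPLINKS_PER_LEAF) / hosts_per_leaf

print("leaf switches needed:       %d" % leaf_switches)
print("oversubscription ratio:     %.1f : 1" % oversubscription)
print("worst-case per-node Gbit/s: %.2f" % per_node_gbit)
```

If that 5:1 figure is roughly right, nodes on different leaf switches would only see a fraction of a gigabit each under heavy cross-switch traffic, which is the kind of performance hit I'm asking about.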