Hello List,

I wanted to outline our proposed configuration and hope for some feedback from the list on whether the LNET config is sound.

We're planning a site-wide (HPC clusters) Lustre filesystem at NYU to be installed in the next few weeks, Lustre version 1.8.1. The MDS, OSS, and routers will be running Red Hat 5.3 with Lustre kernels. Our cluster compute/login nodes will be running Rocks/Red Hat 5.1 with Lustre kernels. We've installed a small test cluster (1 MDS/MGS, 2 OSS, and 4 compute clients) with the same versions and it works well.

Note: Sun is going to be onsite to install the MDS failover pair as part of their "Lustre Building Blocks" (Hi Hung-Sheng).

Here goes.

Core Lustre Network:

We have two Mellanox MTS3600 36-port QDR switches. I thought we'd configure one as o2ib0 and the other as o2ib1. There's also the possibility of combining them and connecting each dual-port HCA to both switches (one port per switch).

Servers on the "Core Lustre Network" will be known as $SERVER_TYPE-$SERVER_NAME-$IB_RANK, so we have:

2 MDS/MGS servers configured as a failover pair. Each has a dual-port QDR IB HCA, so the MDS servers would be on both core networks, o2ib0 and o2ib1.

mds-0-0 #o2ib0
mds-0-1 #o2ib1
mds-1-0 #o2ib0
mds-1-1 #o2ib1

4 OSS servers configured as 2 failover pairs. Each has a dual-port QDR IB HCA, so the OSS servers would also be on both core networks, o2ib0 and o2ib1.

oss-0-0 #o2ib0
oss-0-1 #o2ib1
...
oss-3-0 #o2ib0
oss-3-1 #o2ib1

Routers:

Each cluster (currently 3) will have 2 routers on the "Lustre Core" network(s) and its "Private Cluster IB" network. 2 of the clusters have DDR private IB networks; the other cluster has a QDR private IB network. I know the 2 QDR switches are overkill, but they were relatively cheap and should survive adding more clients and storage.

Each DDR cluster's routers will have 1 dual-port QDR HCA (core) and 1 single-port DDR HCA (private). The QDR cluster's routers will have 2 dual-port QDR HCAs: 1 core, 1 private.

Here's where I'm not 100% sure about the proper LNET config. Let's assume the cluster we're talking about is o2ib4.

1. Each cluster sees only one core network, o2ib0 OR o2ib1.

This roughly corresponds to the multi-rail config in the manual, but balancing by cluster (not perfect, I know). The routers would be configured with 1 address on the "Lustre Core Network" and 1 address on the "Private Cluster IB". Clients would specify mds-0-0:mds-1-0 (both on o2ib0) or mds-0-1:mds-1-1 (both on o2ib1) as the metadata failover pair when mounting, and would use either

options lnet networks="o2ib4(ib0)" routes="o2ib0 local.router.ip.[1-2]@o2ib4;"

or

options lnet networks="o2ib4(ib0)" routes="o2ib1 local.router.ip.[1-2]@o2ib4;"

2. Each cluster uses both core networks.

The routers would be configured with 1 address on o2ib0, 1 address on o2ib1, and 1 address on the "Private Cluster IB". Compute clients would specify mds-0-0,mds-0-1:mds-1-0,mds-1-1 as the metadata failover pair and would use:

options lnet networks="o2ib4(ib0)" routes="o2ib0 local.router.ip.[1-2]@o2ib4; o2ib1 local.router.ip.[1-2]@o2ib4;"

(I've sketched example router and client configs for this option at the end of this message.)

Will that work? If one switch fails, clients should fail over to the other MDS address pair, since both addresses on the failed core network become unreachable. If an MDS fails, clients should stay on the same o2ib network but fail over to the other MDS. Is this possible?

I would think that even in the second configuration we'd want to manually balance the traffic: some clients would specify o2ib0 first, while others specify o2ib1 first.

Erik
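
P.S. For concreteness, here's roughly what I'm picturing for the o2ib4 routers in option 2. This is only a sketch; the interface names (ib0/ib1/ib2) are placeholders for however the ports actually enumerate on the router hardware:

# /etc/modprobe.conf on an o2ib4 cluster router (option 2 sketch)
# ib0/ib1 = the two core QDR ports (o2ib0 and o2ib1), ib2 = the private cluster IB port
options lnet networks="o2ib0(ib0),o2ib1(ib1),o2ib4(ib2)" forwarding="enabled"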
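
And, if I have the mount syntax right, the matching option 2 client mount would list both NIDs of each MDS, with commas separating the NIDs of one node and a colon separating the failover partner. The filesystem name and mount point below are placeholders:

# on an o2ib4 compute client (option 2 sketch)
mount -t lustre mds-0-0@o2ib0,mds-0-1@o2ib1:mds-1-0@o2ib0,mds-1-1@o2ib1:/lustre /mnt/lustre

That should give the client a reachable MGS/MDS NID whichever core switch its routers are currently using, plus failover to the backup MDS if the primary goes down.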