Christoph Biardzki
2006-May-19 07:36 UTC
[Lustre-discuss] Using lctl for "bonding", recovery questions
Hello,

I'd like to ask for some insight on Lustre networking. We have the
following setup:

  MDS:       Ethernet only
  OSTs:      Myrinet (TCP) and Ethernet
  A-Clients: Ethernet only
  B-Clients: Myrinet (TCP) and Ethernet

I configured two addresses for the OSTs and B-clients. lconf, however,
configures Lustre in such a way that the "Myrinet" connection is not
used and everything goes over Ethernet. What we do is call lctl
after Lustre starts and issue a

  connect <myrinet-address-of-server> 988

for every OST. Then both connections are used in parallel.

Now my question is whether and how Lustre recovers the network when,
e.g., an OST is restarted. My impression is that something works
differently (that is, the connections are not restored), but as there
are many timeout values in the recovery processes, I would like to see
a short high-level overview of when Lustre will reconnect to a
particular server. ("--failover" is configured for the OSTs, as I
assume this is necessary to enable transparent recovery after an OST
reboot, right?)

Thanks!

- Christoph

-- 
Leibniz Rechenzentrum München (LRZ)       http://www.lrz.de
Abteilung Hochleistungssysteme
Barer Str. 21 - 80333 München - Germany
Tel. ++49-(0)89 / 289-28853, Raum S1527
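
The post-start step described above (one connect per OST) could be
scripted roughly as follows. This is only a sketch: the addresses are
hypothetical placeholders, the port 988 is taken from the command quoted
in the message, and the script merely prints the command lines. How they
are handed to lctl is not shown and would depend on the installed
version.

```python
# Hypothetical Myrinet (TCP) addresses of the OSTs; replace with real ones.
OST_MYRINET_ADDRESSES = ["10.1.0.11", "10.1.0.12"]

# Port taken from the connect command quoted in the message.
PORT = 988


def connect_commands(addresses, port=PORT):
    """Return one 'connect <address> <port>' line per OST address."""
    return [f"connect {addr} {port}" for addr in addresses]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being issued in lctl.
    for line in connect_commands(OST_MYRINET_ADDRESSES):
        print(line)
```

Because the connections are reported not to come back after an OST
restart, a list like this would presumably have to be reissued after
each restart; that is an assumption drawn from the question, not
confirmed behaviour.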