Hi all,

we have two servers, A and B, as a failover MGS/MDT pair, with IPs A=10.12.112.28 and B=10.12.115.120 over tcp.

When server B crashes, the MGS and MDT are mounted on A. Recovery times out with only one out of 445 clients recovered.
Afterwards, the MDT lists all its OSTs as UP, and in the logs of the OSTs I see:

  Lustre: MGC10.12.112.28@tcp: Connection restored to service MGS using nid 10.12.112.28@tcp.
  Lustre: lustre-OST008d: received MDS connection from 10.12.112.28@tcp

So far so good. However, no client will reconnect, nor will a client connect to server A when freshly mounted!

I do "mount -t lustre 10.12.112.28:10.12.115.120:/lustre /mp" and get:

  Lustre: Lustre Version: 1.8.4
  Lustre: Build Version: 1.8.4-19700101010000-PRISTINE-2.6.26-2-amd64
  Lustre: Added LNI 10.12.68.195@tcp [8/256/0/180]
  Lustre: Accept secure, port 988
  Lustre: Lustre Client File System; http://www.lustre.org/
  Lustre: MGC10.12.112.28@tcp: Reactivating import
  Lustre: 14530:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1347247522447397 sent from gsilust-MDT0000-mdc-ffff81033d489400 to NID 10.12.115.120@tcp 5s ago has timed out (5s prior to deadline).
    req@ffff8103312da400 x1347247522447397/t0 o38->gsilust-MDT0000_UUID@10.12.115.120@tcp:12/10 lens 368/584 e 0 to 1 dl 1284835365 ref 1 fl Rpc:N/0/0 rc 0/0

Obviously the clients stubbornly try to connect to the failed server, 10.12.115.120, even though the MGS on 10.12.112.28 is reachable.

I'm sure the failover has worked before: server A had its problems last January, when the MDT was moved to B, which has served the fs ever since. No apparent changes were introduced in the meantime, so now I am at a loss.

Yours,
Thomas