I know the subject line isn't the best, but I don't know what to say other than that one Lustre client is acting up while the others are fine. This client is our "file" server: it runs an NFS server and a Samba server on top of the Lustre mount.

/etc/fstab:

    92.168.5.104@tcp0:192.168.5.105@tcp0:/lustre /lustre lustre defaults,localflock,_netdev 0 0

Right now `lfs df -h` shows all the OSTs as "resource unavailable", yet `lctl dl` says they are up:

    lctl dl
      0 UP mgc MGC192.168.5.104@tcp adc80ed6-e9a1-6791-e3aa-9a699e11275d 5
      1 UP lov lustre-clilov-ffff81032f9a0400 db1e9918-482f-063d-1b42-c2c394a4c81b 4
      2 UP mdc lustre-MDT0000-mdc-ffff81032f9a0400 db1e9918-482f-063d-1b42-c2c394a4c81b 5
      3 UP osc lustre-OST0000-osc-ffff81032f9a0400 db1e9918-482f-063d-1b42-c2c394a4c81b 5
      4 UP osc lustre-OST0001-osc-ffff81032f9a0400 db1e9918-482f-063d-1b42-c2c394a4c81b 5
      5 UP osc lustre-OST0002-osc-ffff81032f9a0400 db1e9918-482f-063d-1b42-c2c394a4c81b 5
      6 UP osc lustre-OST0003-osc-ffff81032f9a0400 db1e9918-482f-063d-1b42-c2c394a4c81b 5

On the cluster, all nodes are connected just fine, so the problem seems to be limited to this client. This is what I'm seeing in dmesg.

A lot of these messages:

    LustreError: 4462:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 560441703

Then these messages when the "disconnect" happens:

    Lustre: 13877:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1353138276259258 sent from lustre-OST0003-osc-ffff81032f9a0400 to NID 192.168.5.101@tcp 7s ago has timed out (7s prior to deadline).
      req@ffff8101f18cd000 x1353138276259258/t0 o101->lustre-OST0003_UUID@192.168.5.101@tcp:28/4 lens 296/544 e 0 to 1 dl 1297448442 ref 1 fl Rpc:/0/0 rc 0/0
    Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection to service lustre-OST0003 via nid 192.168.5.101@tcp was lost; in progress operations using this service will wait for recovery to complete.
    Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection restored to service lustre-OST0003 using nid 192.168.5.101@tcp.
    Lustre: 24416:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1353138276259591 sent from lustre-OST0002-osc-ffff81032f9a0400 to NID 192.168.5.101@tcp 8s ago has timed out (7s prior to deadline).
      req@ffff810292c74c00 x1353138276259591/t0 o101->lustre-OST0002_UUID@192.168.5.101@tcp:28/4 lens 296/544 e 0 to 1 dl 1297448442 ref 1 fl Rpc:/0/0 rc 0/0
    Lustre: 24416:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1 previous similar message
    Lustre: lustre-OST0002-osc-ffff81032f9a0400: Connection to service lustre-OST0002 via nid 192.168.5.101@tcp was lost; in progress operations using this service will wait for recovery to complete.
    Lustre: Skipped 1 previous similar message
    Lustre: lustre-OST0002-osc-ffff81032f9a0400: Connection restored to service lustre-OST0002 using nid 192.168.5.101@tcp.
    Lustre: Skipped 1 previous similar message
    Lustre: 13877:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1353138276259258 sent from lustre-OST0003-osc-ffff81032f9a0400 to NID 192.168.5.101@tcp 7s ago has timed out (7s prior to deadline).
      req@ffff8101f18cd000 x1353138276259258/t0 o101->lustre-OST0003_UUID@192.168.5.101@tcp:28/4 lens 296/544 e 0 to 1 dl 1297448449 ref 1 fl Rpc:/2/0 rc 0/0
    Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection to service lustre-OST0003 via nid 192.168.5.101@tcp was lost; in progress operations using this service will wait for recovery to complete.
    Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection restored to service lustre-OST0003 using nid 192.168.5.101@tcp.
    Lustre: 13877:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1353138276318758 sent from lustre-OST0003-osc-ffff81032f9a0400 to NID 192.168.5.101@tcp 0s ago has failed due to network error (7s prior to deadline).
      req@ffff810321140800 x1353138276318758/t0 o101->lustre-OST0003_UUID@192.168.5.101@tcp:28/4 lens 296/544 e 0 to 1 dl 1297448467 ref 1 fl Rpc:/0/0 rc 0/0
    Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection to service lustre-OST0003 via nid 192.168.5.101@tcp was lost; in progress operations using this service will wait for recovery to complete.
    LustreError: 3897:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.5.100
    LustreError: 11b-b: Connection to 192.168.5.100@tcp at host 192.168.5.100 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.5.100@tcp one of its NIDs?
    Lustre: 3904:0:(import.c:517:import_select_connection()) lustre-OST0002-osc-ffff81032f9a0400: tried all connections, increasing latency to 2s
    Lustre: 3903:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1353138276318825 sent from lustre-OST0000-osc-ffff81032f9a0400 to NID 192.168.5.101@tcp 0s ago has failed due to network error (6s prior to deadline).
      req@ffff8102e8277000 x1353138276318825/t0 o8->lustre-OST0000_UUID@192.168.5.101@tcp:28/4 lens 368/584 e 0 to 1 dl 1297448473 ref 1 fl Rpc:N/0/0 rc 0/0
    Lustre: 3903:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
    LustreError: 3899:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.5.101
    LustreError: 3899:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Skipped 1 previous similar message
    LustreError: 11b-b: Connection to 192.168.5.101@tcp at host 192.168.5.101 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.5.101@tcp one of its NIDs?

This now just repeats. How can I get this client reconnected?

-- 
Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results.
-Ray, Ghostbusters
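The trace above mixes several failure modes (NFS lookup errors, RPC timeouts, OST connections dropping and coming back, and failed socklnd handshakes), so it can help to tally them per OST and per NID before deciding where to look. The following is a minimal, hypothetical sketch: the names `summarize` and `errno_name` are illustrative and not part of any Lustre tool, the regular expressions assume the exact message wording quoted in this post (other Lustre versions may phrase things differently), and return codes are decoded with hard-coded Linux errno values (2 = ENOENT, 104 = ECONNRESET, 110 = ETIMEDOUT) so the result does not depend on the host running the script.

```python
import re
from collections import Counter

# Linux errno values; Lustre logs report negative Linux errnos such as -2 or -104.
# Hard-coded (assumption: only these codes matter here) so decoding is platform independent.
LINUX_ERRNO = {2: "ENOENT", 104: "ECONNRESET", 110: "ETIMEDOUT"}

# Patterns written against the message text quoted above (assumed wording).
LOST_RE = re.compile(r"Connection to service (\S+) via nid (\S+) was lost")
RESTORED_RE = re.compile(r"Connection restored to service (\S+) using nid (\S+?)\.(?:\s|$)")
HELLO_RE = re.compile(r"Error (-\d+) reading HELLO from (\S+)")
NFS_RE = re.compile(r"search_inode_for_lustre\(\)\) failure (-\d+) inode (\d+)")


def errno_name(code: int) -> str:
    """Map a negative Lustre return code such as -104 to its Linux errno name."""
    return LINUX_ERRNO.get(-code, str(code))


def summarize(dmesg_text: str) -> dict:
    """Count connection drops, reconnects, HELLO failures and NFS lookup errors."""
    summary = {
        "lost": Counter(),        # (service, nid) -> times the connection was lost
        "restored": Counter(),    # (service, nid) -> times it was restored
        "hello": Counter(),       # (peer, errno name) -> failed socklnd handshakes
        "nfs_lookup": Counter(),  # errno name -> search_inode_for_lustre failures
    }
    for line in dmesg_text.splitlines():
        if m := LOST_RE.search(line):
            summary["lost"][(m.group(1), m.group(2))] += 1
        elif m := RESTORED_RE.search(line):
            summary["restored"][(m.group(1), m.group(2))] += 1
        elif m := HELLO_RE.search(line):
            summary["hello"][(m.group(2), errno_name(int(m.group(1))))] += 1
        elif m := NFS_RE.search(line):
            summary["nfs_lookup"][errno_name(int(m.group(1)))] += 1
    return summary


# A few lines copied from the dmesg output above.
sample = """\
LustreError: 4462:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 560441703
Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection to service lustre-OST0003 via nid 192.168.5.101@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-OST0003-osc-ffff81032f9a0400: Connection restored to service lustre-OST0003 using nid 192.168.5.101@tcp.
LustreError: 3897:0:(socklnd_cb.c:1714:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.5.100
"""

if __name__ == "__main__":
    for kind, counts in summarize(sample).items():
        print(kind, dict(counts))
```

Fed the full `dmesg` output instead of the sample, an OST whose "lost" count keeps exceeding its "restored" count is a reasonable first suspect. The repeated `ECONNRESET` (-104) on the HELLO exchange means the peer at that address reset the TCP connection on port 988 during the LNet handshake, which is why the log asks whether the server runs a compatible Lustre version and whether that address is one of its NIDs.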