I'm seeing two error messages in my logs that I'm not entirely sure how to interpret.

- clients
  OS: Red Hat v5.4 x86_64
  kernel: 2.6.18-164.11.1_lustre
  OFED: 1.5.0 (QLogic version)
  Lustre: 1.8.1.1

- servers
  OS: Red Hat v5.3 x86_64
  kernel: 2.6.18-128.7.1_lustre
  OFED: 1.4.2 (QLogic version)
  Lustre: 1.8.1.1

First error:

Lustre: 8305:0:(client.c:1383:ptlrpc_expire_one_request()) @@@ Request x1339802879282312 sent from fs1-ost0002 to nid 1.1.1.98@o2ib 7s ago has timed out (limit 7s)
  req@ffff8101c7fb6800 x1339802879282312/t0 o106->@NET_0x5000064843162_UUID:15/16 lens 296/424 e 0 to 1 dl 1279653812 ref 27185 fl Rpc:/2/0 rc 0/0

It's true that the OST and the client both have InfiniBand cards, but the client does not have a NID set up for IB, and the Lustre client is set to mount over TCP only, not o2ib, so I'm not sure why any RPCs are destined for IB. The OSS does have IB NIDs.

On the same OSS, I'm also seeing:

LustreError: 8429:0:(filter.c:2520:filter_grant_sanity_check()) filter_statfs: tot_granted 155983872 != fo_tot_granted 1959129600
LustreError: 8429:0:(filter.c:2523:filter_grant_sanity_check()) filter_statfs: tot_pending 0 != fo_tot_pending 3145728

I'm not sure whether they're related. Only one OSS server is doing this, and I have 10 in production.

Second error (different OSS):

LustreError: 8219:0:(ost_handler.c:1038:ost_brw_write()) client csum a5908b2, server csum c68fd7de
LustreError: 168-f: fs1-ost000b: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-1.1.1.25@o2ib inum 33358052/386095879 object 1779/0 extent [192495232-1942499327]

These two lines repeat a few times with different csum values, but always from the same host.

I'm also seeing processes deadlock (no time accumulation and not killable with -9) when reading files from the filesystem. I haven't narrowed it down, but it seems to happen more with striped files than with non-striped ones.

Any thoughts?

Thanks
- Michael