Hi,

Is there anybody who can read these messages and give me a hint where to look for the problem? I can trigger this LBUG rather easily; it is caused by either

(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed

or

(o2iblnd_cb.c:171:kiblnd_get_idle_tx()) ASSERTION(tx->tx_sending == 0) failed

I am using Lustre 1.6.1 as downloaded, on top of RHEL4U5, with o2ib, and I get this a few times per day while writing huge files with "dd".

Any hint on where to look into this further would be very welcome! Some more context around the error message is below.

Best regards,
Erich

Lustre: necd3-OST0000-osc-0000010080cd5800: Connection restored to service necd3-OST0000 using nid 192.168.0.27@o2ib.
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 00000100156ba000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 000001001f1f4000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 00000100a10b0000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 000001002e3fa000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 0000010066604000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 0000010070022000
LustreError: 6820:0:(events.c:55:request_out_callback()) @@@ type 4, status -5 req@0000010135463800 x1000806/t0 o400->MGS@MGC192.168.0.23@o2ib_0:26 lens 128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6820:0:(events.c:55:request_out_callback()) Skipped 6 previous similar messages
LustreError: 6820:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Lustre: 6819:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6819
kiblnd_sd_00 R running
task 0 6819 1 6824 6820 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
ffffff000006c5a0 0000000000000000 0000000000000005 ffffffffa0288894
0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84} <ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
<0>LustreError: 6820:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
kiblnd_sd_01 R running task 0 6820 1 6819 6821 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
ffffff000006c6d0 0000000000000000 0000000000000005 ffffffffa0288894
<ffffffff80133741>{__wake_up+54} <ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84} <ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
<ffffffff8013369a>{default_wake_function+0} <ffffffff80110de3>{child_rip+8}
<ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0} <ffffffff80133741>{__wake_up+54}<ffffffff80110ddb>{child_rip+0}
<3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) @@@ network error (sent at 1187792190, 0s ago) req@0000010135463800 x1000806/t0 o400->MGS@MGC192.168.0.23@o2ib_0:26 lens 128/128 ref 1 fl Rpc:N/0/0 rc 0/-22
<ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
<3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
LustreError: 166-1: MGC192.168.0.23@o2ib: Connection to service MGS via nid 192.168.0.23@o2ib was lost; in progress operations using this service will fail.
<ffffffff8013369a>{default_wake_function+0}
<1>LustreError: dumping log to /tmp/lustre-log.1187792190.6819
<ffffffff80110de3>{child_rip+8} <ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0} <ffffffff80110ddb>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1187792190.6820
LustreError: 2697:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 req@0000010080d9a800 x1000808/t0 o400->necd3-OST0000_UUID@192.168.0.27@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 2697:0:(events.c:55:request_out_callback()) Skipped 1 previous similar message
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with 192.168.0.23@o2ib
Lustre: necd3-OST0000-osc-0000010080cd5800: Connection to service necd3-OST0000 via nid 192.168.0.27@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@0000010037e16200 x1000867/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with 192.168.0.23@o2ib
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with 192.168.0.27@o2ib
LustreError: 6823:0:(events.c:55:request_out_callback()) @@@ type 4, status -103 req@00000100c7e68e00 x1000850/t0 o400->necd3-OST0003_UUID@192.168.0.27@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6823:0:(events.c:55:request_out_callback()) Skipped 1 previous similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@000001007e3be600 x1000871/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent at 1187792290, 100s ago) req@00000100c7e68a00 x1000856/t0 o250->MGS@MGC192.168.0.23@o2ib_0:26 lens 304/328 ref 2 fl Rpc:/0/0 rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 26 previous similar messages
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with 192.168.0.27@o2ib
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Skipped 1 previous similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@0000010008040c00 x1000886/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@0000010132b50200 x1000890/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@000001007d98ea00 x1000905/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@0000010135548e00 x1000913/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@0000010037ee6200 x1000924/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent at 1187792641, 100s ago) req@00000100c7eb1a00 x1000917/t0 o38->necd3-MDT0000_UUID@192.168.0.23@o2ib:12 lens 304/328 ref 2 fl Rpc:/0/0 rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 63 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@0000010075926200 x1000939/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous similar messages