Hi,
is there anybody who can read these messages and give me a hint about where to
look for the problem? I'm hitting this LBUG rather easily, triggered by
either
(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0)
failed
or
(o2iblnd_cb.c:171:kiblnd_get_idle_tx()) ASSERTION(tx->tx_sending == 0) failed
I'm using Lustre 1.6.1 as downloaded, on top of RHEL4U5, with o2ib, and I get
this a few times per day while writing huge files with "dd".
Any hint (about where to look into this further) would be very welcome! More
context around the error message is below.
Best regards,
Erich
Lustre: necd3-OST0000-osc-0000010080cd5800: Connection restored to service
necd3-OST0000 using nid 192.168.0.27@o2ib.
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status
-5, desc 00000100156ba000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status
-5, desc 000001001f1f4000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status
-5, desc 00000100a10b0000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status
-5, desc 000001002e3fa000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status
-5, desc 0000010066604000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status
-5, desc 0000010070022000
LustreError: 6820:0:(events.c:55:request_out_callback()) @@@ type 4, status -5
req@0000010135463800 x1000806/t0 o400->MGS@MGC192.168.0.23@o2ib_0:26 lens
128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6820:0:(events.c:55:request_out_callback()) Skipped 6 previous
similar messages
LustreError: 6820:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Lustre: 6819:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for
process 6819
kiblnd_sd_00 R running task 0 6819 1 6824 6820 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
ffffff000006c5a0 0000000000000000 0000000000000005 ffffffffa0288894
0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84}
<ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
<0>LustreError: 6820:0:(tracefile.c:433:libcfs_assertion_failed())
LBUG
kiblnd_sd_01 R running task 0 6820 1 6819 6821 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
ffffff000006c6d0 0000000000000000 0000000000000005 ffffffffa0288894
<ffffffff80133741>{__wake_up+54}
<ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84}
<ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
<ffffffff8013369a>{default_wake_function+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80133741>{__wake_up+54}<ffffffff80110ddb>{child_rip+0}
<3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) @@@
network error (sent at 1187792190, 0s ago) req@0000010135463800 x1000806/t0
o400->MGS@MGC192.168.0.23@o2ib_0:26 lens 128/128 ref 1 fl Rpc:N/0/0 rc 0/-22
<ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
<3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request())
Skipped 8 previous similar messages
LustreError: 166-1: MGC192.168.0.23@o2ib: Connection to service MGS via nid
192.168.0.23@o2ib was lost; in progress operations using this service will fail.
<ffffffff8013369a>{default_wake_function+0} <1>LustreError: dumping
log to /tmp/lustre-log.1187792190.6819
<ffffffff80110de3>{child_rip+8}
<ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80110ddb>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1187792190.6820
LustreError: 2697:0:(events.c:55:request_out_callback()) @@@ type 4, status -113
req@0000010080d9a800 x1000808/t0
o400->necd3-OST0000_UUID@192.168.0.27@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0
rc 0/-22
LustreError: 2697:0:(events.c:55:request_out_callback()) Skipped 1 previous
similar message
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with
192.168.0.23@o2ib
Lustre: necd3-OST0000-osc-0000010080cd5800: Connection to service necd3-OST0000
via nid 192.168.0.27@o2ib was lost; in progress operations using this service
will wait for recovery to complete.
Lustre: Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@0000010037e16200 x1000867/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with
192.168.0.23@o2ib
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with
192.168.0.27@o2ib
LustreError: 6823:0:(events.c:55:request_out_callback()) @@@ type 4, status -103
req@00000100c7e68e00 x1000850/t0
o400->necd3-OST0003_UUID@192.168.0.27@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0
rc 0/-22
LustreError: 6823:0:(events.c:55:request_out_callback()) Skipped 1 previous
similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@000001007e3be600 x1000871/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent
at 1187792290, 100s ago) req@00000100c7e68a00 x1000856/t0
o250->MGS@MGC192.168.0.23@o2ib_0:26 lens 304/328 ref 2 fl Rpc:/0/0 rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 26
previous similar messages
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with
192.168.0.27@o2ib
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Skipped 1 previous
similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@0000010008040c00 x1000886/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@0000010132b50200 x1000890/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@000001007d98ea00 x1000905/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@0000010135548e00 x1000913/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@0000010037ee6200 x1000924/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent
at 1187792641, 100s ago) req@00000100c7eb1a00 x1000917/t0
o38->necd3-MDT0000_UUID@192.168.0.23@o2ib:12 lens 304/328 ref 2 fl Rpc:/0/0
rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 63
previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@0000010075926200 x1000939/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens
232/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous
similar messages
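
For anyone reading the log above: the "status -N" values in the
client_bulk_callback() and request_out_callback() lines look like negated
Linux errno values. Below is a minimal sketch for decoding them, assuming that
interpretation is right. The helper name decode_status and the small lookup
table are only illustrative (they are not part of Lustre), and the table
covers just the codes that appear in this log.

```python
import re

# Linux errno numbers (asm-generic), hard-coded so the result does not
# depend on the platform this script runs on. Only the codes seen in the
# log above are listed; anything else is reported as "unknown".
LINUX_ERRNO = {
    5: "EIO",             # I/O error
    22: "EINVAL",         # invalid argument
    103: "ECONNABORTED",  # software caused connection abort
    113: "EHOSTUNREACH",  # no route to host
}

# The log wraps lines, so "status" and the number may be separated by a
# newline; \s+ matches that as well as a plain space.
STATUS_RE = re.compile(r"status\s+(-\d+)")


def decode_status(text):
    """Return (code, name) pairs for each distinct 'status -N' in text.

    Pairs are ordered from the code closest to zero to the most negative.
    """
    codes = {int(m.group(1)) for m in STATUS_RE.finditer(text)}
    return sorted(
        ((code, LINUX_ERRNO.get(-code, "unknown")) for code in codes),
        reverse=True,
    )


if __name__ == "__main__":
    import sys

    # Usage: python decode_status.py < messages
    for code, name in decode_status(sys.stdin.read()):
        print(code, name)
```

Under that assumption, the -5 from the bulk callbacks is EIO, and the -113 and
-103 from request_out_callback() are EHOSTUNREACH and ECONNABORTED.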