Updates:
I reformatted the file system, and this time I am using only one node
acting as the OSS, MGS, and MDT, with a different node mounting that file
system. Running iozone on the mounted Lustre file system with a single
thread works fine, and with two threads it is still fine, but as soon as
I run iozone with four threads (iozone -s12G -r2048k -t4) I see the
crash.
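For reference, the setup and test sequence above can be sketched roughly as follows. The device names, mount points, and the mkfs.lustre/mount commands are assumptions for illustration only; the iozone invocation is the one from this report:

```shell
#!/bin/sh
# Rough reproduction sketch. Device names (/dev/sda, /dev/sdb) and mount
# points are assumptions, not from the report; only the iozone command
# line is taken verbatim from it.
#
# Server node (single node acting as MGS, MDT, and OSS):
#   mkfs.lustre --fsname=lfs --mgs --mdt /dev/sda
#   mkfs.lustre --fsname=lfs --ost --mgsnode=192.2.1.2@o2ib /dev/sdb
#   mount -t lustre /dev/sda /mnt/mdt
#   mount -t lustre /dev/sdb /mnt/ost
#
# Client node:
#   mount -t lustre 192.2.1.2@o2ib:/lfs /mnt/lfs
#
# 1 and 2 threads complete cleanly; 4 threads triggers the LBUG:
for t in 1 2 4; do
  echo "iozone -s12G -r2048k -t${t}"
done
```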
The output from the OSS/MDT/MGS node is as follows:
Lustre: MDS lfs-MDT0000: lfs-OST0001_UUID now active, resetting orphans
LustreError: 1785:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to 192.2.1.1@o2ib: error 0(waiting)
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 0000010102eb0000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 0000010102eb0000
LustreError: 6662:0:(ost_handler.c:974:ost_brw_write()) @@@ network
error on bulk GET 0(1048576)
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 0000010102eaa000
Lustre: 6662:0:(ost_handler.c:1062:ost_brw_write()) lfs-OST0000:
ignoring bulk IO comm error with
5d80c3a8-13b8-12f8-e058-f1f865797726@NET_0x50000c0020101_UUID id
12345-192.2.1.1@o2ib - client will retry
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 0000010102eaa000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 0000010090996000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 0000010090996000
LustreError: 6554:0:(ost_handler.c:974:ost_brw_write()) @@@ network
error on bulk GET 0(1048576)
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 0000010068524000
LustreError: 6554:0:(ost_handler.c:974:ost_brw_write()) Skipped 3
previous similar messages
Lustre: 6554:0:(ost_handler.c:1062:ost_brw_write()) lfs-OST0001:
ignoring bulk IO comm error with
5d80c3a8-13b8-12f8-e058-f1f865797726@NET_0x50000c0020101_UUID id
12345-192.2.1.1@o2ib - client will retry
Lustre: 6554:0:(ost_handler.c:1062:ost_brw_write()) Skipped 1 previous
similar message
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 0000010068524000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 000001008a5b4000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 000001008a5b4000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 0000010090922000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 0000010090922000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 0000010102eb2000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 0000010102eb2000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 4,
status -103, desc 000001008a5f6000
LustreError: 5762:0:(events.c:299:server_bulk_callback()) event type 2,
status -103, desc 000001008a5f6000
Lustre: 1785:0:(o2iblnd_cb.c:2147:kiblnd_passive_connect()) Conn race
192.2.1.1@o2ib
Lustre: 6259:0:(ldlm_lib.c:492:target_handle_reconnect()) lfs-OST0000:
5d80c3a8-13b8-12f8-e058-f1f865797726 reconnecting
LustreError: 5762:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
RDMA with 192.2.1.1@o2ib
LustreError: 5762:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to 192.2.1.1@o2ib: error
-110(sending)(sending_rsrvd)(sending_nocred)
Lustre: MGS: haven't heard from client
d4eb170a-d9d7-1af6-fb8e-e360379507ba (at 192.2.1.1@o2ib) in 242 seconds.
I think it's dead, and I am evicting it.
Lustre: lfs-OST0000: haven't heard from client
5d80c3a8-13b8-12f8-e058-f1f865797726 (at 192.2.1.1@o2ib) in 235 seconds.
I think it's dead, and I am evicting it.
Lustre: lfs-MDT0000: haven't heard from client
5d80c3a8-13b8-12f8-e058-f1f865797726 (at 192.2.1.1@o2ib) in 253 seconds.
I think it's dead, and I am evicting it.
Lustre: lfs-OST0001: haven't heard from client
5d80c3a8-13b8-12f8-e058-f1f865797726 (at 192.2.1.1@o2ib) in 239 seconds.
I think it's dead, and I am evicting it.
The messages from the client are as follows:
Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.5.95
Build Version:
1.5.95-19691231160000-PRISTINE-.usr.src.linux-2.6.9-42.EL_lustre.1.5.95smp
Lustre: Added LNI 192.2.1.1@o2ib [8/64]
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: mount data:
Lustre: profile: lfs-client
Lustre: device: 192.2.1.2@o2ib:/lfs
Lustre: flags: 2
Lustre: 0 UP mgc MGC192.2.1.2@o2ib
d4eb170a-d9d7-1af6-fb8e-e360379507ba 5
Lustre: 1 UP lov lfs-clilov-000001007e238c00
5d80c3a8-13b8-12f8-e058-f1f865797726 3
Lustre: 2 UP mdc lfs-MDT0000-mdc-000001007e238c00
5d80c3a8-13b8-12f8-e058-f1f865797726 4
Lustre: 3 UP osc lfs-OST0000-osc-000001007e238c00
5d80c3a8-13b8-12f8-e058-f1f865797726 4
Lustre: 4 UP osc lfs-OST0001-osc-000001007e238c00
5d80c3a8-13b8-12f8-e058-f1f865797726 4
Lustre: mount 192.2.1.2@o2ib:/lfs complete
ib_mthca 0000:02:00.0: SQ dc0406 full (26642862 head, 26640803 tail,
2064 max, 5 nreq)
LustreError: 8700:0:(o2iblnd_cb.c:976:kiblnd_check_sends()) Error -12
posting transmit to 192.2.1.2@o2ib
LustreError: 8700:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to 192.2.1.2@o2ib: error -12(waiting)
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -12, desc 0000010077aa8000
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
192.2.1.2@o2ib failed: 5
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
192.2.1.2@o2ib failed: 5
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 42
previous similar messages
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 00000100af53c000
LustreError: 8700:0:(events.c:51:request_out_callback()) @@@ type 4,
status -5
LustreError: 8729:0:(client.c:950:ptlrpc_expire_one_request()) @@@
timeout (sent at 1163109128, 0s ago)
Lustre: 8729:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
8 up 8 8 8 8 -8 0
LustreError: lfs-OST0000-osc-000001007e238c00: Connection to service
lfs-OST0000 via nid 192.2.1.2@o2ib was lost; in progress operations
using this service will wait for recovery to complete.
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 000001003a6b0000
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
192.2.1.2@o2ib failed: 5
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 471
previous similar messages
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 000001011726a000
LustreError: 8700:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from
192.2.1.2@o2ib failed: 5
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 000001010ccac000
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 0000010062af0000
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
192.2.1.2@o2ib failed: 5
LustreError: 8700:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 587
previous similar messages
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 0000010041fbc000
LustreError: 8700:0:(events.c:127:client_bulk_callback()) event type 0,
status -5, desc 0000010064a70000
LustreError: 8700:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
LustreError: 8700:0:(linux-debug.c:130:lbug_with_loc()) LBUG
Lustre: 8700:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
stack for process 8700
kiblnd_sd_03 R running task 0 8700 1 8701 8699
(L-TLB)
00000100080097e0 0000000000000000 0000000000000001 0000000000000216
0000000000000012 0000000000000001 ffffffffa0425f28
ffffffff80111731
0000010146527dd8 000000000000019c
Call Trace:<ffffffff80111600>{show_trace+375}
<ffffffff8011173c>{show_stack+241}
<ffffffffa02ed9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa02f1d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
<ffffffffa0418848>{:ko2iblnd:kiblnd_tx_complete+80}
<ffffffffa041d0cc>{:ko2iblnd:kiblnd_scheduler+1210}
<ffffffff801331a9>{default_wake_function+0} <6>Lustre:
lfs-OST0000-osc-000001007e238c00: Connection restored to service
lfs-OST0000 using nid 192.2.1.2@o2ib.
Lustre: 8697:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from
12345-192.2.1.2@o2ib portal 4 match 246764 offset 0 length 320: 2
LustreError: 8697:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
LustreError: 8697:0:(linux-debug.c:130:lbug_with_loc()) LBUG
Lustre: 8697:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
stack for process 8697
kiblnd_sd_00 R running task 0 8697 1 8698 7435
(L-TLB)
00000100080097e0 0000000000000000 0000000000000001 0000000000000216
0000000000000012 0000000000000001 ffffffffa0425f28
ffffffff80111731
000001013b93fdd8 000000000000019c
Call Trace:<ffffffff80131f2f>{schedule_tail+55}
<ffffffff80110e23>{child_rip+8}
<ffffffffa041cc12>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80111600>{show_trace+375}
<ffffffff80110e1b>{child_rip
+0}
<ffffffff8011173c>{show_stack+241}
<ffffffffa02ed9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa02f1d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0} <0>LustreError:
8699:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
kiblnd_sd_02 R running task 0 8699 1 8700
8698 (L-TLB)
00000100080097e0 0000000000000000 0000000000000001 0000000000000216
0000000000000012 <ffffffffa0418848>{:ko2iblnd:kiblnd_tx_complete
+80}0000000000000001 ffffffffa0425f28 ffffffff80111731
0000010146525dd8 000000000000019c
Call Trace:
<ffffffffa041d0cc>{:ko2iblnd:kiblnd_scheduler+1210}
<ffffffff80111600>{show_trace+375}
<ffffffff8011173c>{show_stack
+241}
<ffffffffa02ed9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa02f1d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
<ffffffffa0418848>{:ko2iblnd:kiblnd_tx_complete+80}
<ffffffffa041d0cc>{:ko2iblnd:kiblnd_scheduler+1210}
<ffffffff801331a9>{default_wake_function+0}
<ffffffff801331a9>{default_wake_function+0}
<ffffffff80131f2f>{schedule_tail+55}
<ffffffff80110e23>{child_rip+8}
<ffffffffa041cc12>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80110e1b>{child_rip+0}
<ffffffff80131f2f>{schedule_tail+55}
<1>LustreError: dumping log to /tmp/lustre-log.1163109128.8697
<ffffffff80110e23>{child_rip+8}
<ffffffffa041cc12>{:ko2iblnd:kiblnd_scheduler+0}
<4>Lustre: 8698:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT
from 12345-192.2.1.2@o2ib portal 4 match 246762 offset 0 length 320: 2
Lustre: 8698:0:(lib-move.c:1624:lnet_parse_put()) Skipped 1 previous
similar message
LustreError: 8698:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
ASSERTION(tx->tx_sending > 0) failed
kiblnd_sd_01 R running task 0 8698 1 8699 8697
(L-TLB)
000001007e215e00 00000000c0020102 0000000100000001 0000000000000216
000001005576deb8 0000000100000001 0000000000000000
ffffffff80111731
000001005576ddd8 000000000000019c
Call Trace:<ffffffff80110e1b>{child_rip+0}
<ffffffff80111600>{show_trace+375}
<ffffffff8011173c>{show_stack+241}
<ffffffffa02ed9f6>{:libcfs:lbug_with_loc+72}
<ffffffffa02f1d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
<ffffffffa0418848>{:ko2iblnd:kiblnd_tx_complete+80}
<ffffffffa041d0cc>{:ko2iblnd:kiblnd_scheduler+1210}
<1>LustreError: dumping log to /tmp/lustre-log.1163109128.8700
LustreError: dumping log to /tmp/lustre-log.1163109128.8699
<ffffffff801331a9>{default_wake_function+0}
<ffffffff80131f2f>{schedule_tail+55}
<ffffffff80110e23>{child_rip+8}
<ffffffffa041cc12>{:ko2iblnd:kiblnd_scheduler+0}
<ffffffff80110e1b>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1163109128.8698
LustreError: can't open /tmp/lustre-log.1163109128.8698 file: err -17
LustreError: can't open /tmp/lustre-log.1163109128.8698 for dump: rc
-17
LustreError: can't open /tmp/lustre-log.1163109128.8698 file: err -17
LustreError: can't open /tmp/lustre-log.1163109128.8698 for dump: rc
-17
Lustre: 8699:0:(linux-debug.c:96:libcfs_run_upcall()) Invoked LNET
upcall /usr/lib/lustre/lnet_upcall
LBUG,/usr/src/redhat/BUILD/lustre-1.5.95/lnet/libcfs/tracefile.c,libcfs_assertion_failed,412
LustreError: 2476:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
Closing conn to 192.2.1.2@o2ib: error 0(waiting)
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 00000100a7604000
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 00000100a7604000
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) Skipped 1
previous similar message
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 00000100a7604000
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) Skipped 1
previous similar message
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 00000100a7604000
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) Skipped 1
previous similar message
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 00000100a7604000
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) Skipped 1
previous similar message
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
long timeout: desc 00000100a7604000
Lustre: 8729:0:(niobuf.c:302:ptlrpc_unregister_bulk()) Skipped 1
previous similar message
On Wed, 2006-11-08 at 09:29 -0800, Anand wrote:
> Hi,
>
> I was testing lustre 1.6 beta (5) with standard linux 2.6.9-42 kernel on
> x86_64 (CentOS 4) installation with OFED-1.1 (IB) as the interconnect.
> The setup is such that 8 nodes are sharing one OST (80G) and a separate
> node shares two OSTs and is also acting as the MGS/MDS.
>
> The 8 nodes also mount the exported file system. When I try to run
> iozone on the lustre file system from the head node (that is only a
> client) and visit that directory from any other node there are crashes.
> The logs are attached below.
>
> Could somebody tell me what might be wrong? I understand that this is
> not a recommended configuration but I just wanted to check the
> scalability.
>
> Thanks
>
> Anand
>
>
> ##########HEAD NODE (CLIENT ONLY)###################################
> Lustre: OBD class driver, info@clusterfs.com
> Lustre Version: 1.5.95
> Build Version:
> 1.5.95-19691231160000-PRISTINE-.usr.src.linux-2.6.9-42.EL_lustre.1.5.95smp
> Lustre: Added LNI 192.2.1.1@o2ib [8/64]
> Lustre: Lustre Client File System; info@clusterfs.com
> Lustre: mount data:
> Lustre: profile: lfs-client
> Lustre: device: 192.2.1.2@o2ib:/lfs
> Lustre: flags: 2
> Lustre: 0 UP mgc MGC192.2.1.2@o2ib
> 59df7475-a854-b1bc-abc4-1ad9b947113a 5
> Lustre: 1 UP lov lfs-clilov-000001007e2eb400
> 062c1341-76b7-b41d-90be-68497e189830 3
> Lustre: 2 UP mdc lfs-MDT0000-mdc-000001007e2eb400
> 062c1341-76b7-b41d-90be-68497e189830 4
> Lustre: 3 UP osc lfs-OST0000-osc-000001007e2eb400
> 062c1341-76b7-b41d-90be-68497e189830 4
> Lustre: 4 UP osc lfs-OST0001-osc-000001007e2eb400
> 062c1341-76b7-b41d-90be-68497e189830 4
> Lustre: mount 192.2.1.2@o2ib:/lfs complete
> Losing some ticks... checking if CPU frequency changed.
> LustreError: 7476:0:(lib-move.c:93:lnet_try_match_md()) Matching packet
> from 12345-192.2.1.2@o2ib, match 365275 length 808 too big: 560 left, 5
> Lustre: 7476:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from
> 12345-192.2.1.2@o2ib portal 10 match 365275 offset 0 length 808: 2
> LustreError: 19139:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162941625, 100s ago)
> Lustre: 19139:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 2 up 8 8 8 8 -8 0
> LustreError: lfs-MDT0000-mdc-000001007e2eb400: Connection to service
> lfs-MDT0000 via nid 192.2.1.2@o2ib was lost; in progress operations
> using
> Lustre: lfs-MDT0000-mdc-000001007e2eb400: Connection restored to service
> lfs-MDT0000 using nid 192.2.1.2@o2ib.
> ib_mthca 0000:02:00.0: SQ c9040a full (485434 head, 483377 tail, 2064
> max, 7 nreq)
> LustreError: 7477:0:(o2iblnd_cb.c:976:kiblnd_check_sends()) Error -12
> posting transmit to 192.2.1.250@o2ib
> LustreError: 7477:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.250@o2ib: error -12(waiting)
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -12, desc 000001002d302000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 0000010101dca000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 000001002e270000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 0000010085b7a000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 000001007c8ca000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 0000010106992000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 000001009039a000
> LustreError: 7477:0:(events.c:127:client_bulk_callback()) event type 0,
> status -103, desc 0000010143248000
> LustreError: 7507:0:(client.c:904:ptlrpc_check_set()) @@@ bulk transfer
> failed
> LustreError: 7507:0:(linux-debug.c:130:lbug_with_loc()) LBUG
> Lustre: 7507:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
> stack for process 7507
> ptlrpcd S 00000100784b9800 0 7507 1 7508 7480
> (L-TLB)
> 0000000300000000 00000100410739a8 0000000000000000 0000000000000001
> 0000000000000246 0000000000000003 000001012969c000
> 0000000000000004
> 0000000000000000 ffffffffa03b7630
> Call Trace:<ffffffff80148773>{__kernel_text_address+26}
> <ffffffff80111600>{show_trace+375}
> <ffffffff8011173c>{show_stack+241}
> <ffffffffa02ed9f6>{:libcfs:lbug_with_loc+72}
> <ffffffffa03b978d>{:ptlrpc:ptlrpc_check_set+1782}
> <ffffffffa03d8a47>{:ptlrpc:ptlrpcd_check+279}
> <ffffffffa03d8d36>{:ptlrpc:ptlrpcd+533}
> <ffffffff801331a9>{default_wake_function+0}
> <ffffffffa03b850b>{:ptlrpc:ptlrpc_expired_set+0}
> <ffffffffa03b850b>{:ptlrpc:ptlrpc_expired_set+0}
> <ffffffff801331a9>{default_wake_function+0}
> <ffffffff80131f2f>{schedule_tail+55}
> <ffffffff80110e23>{child_rip+8}
> <ffffffffa03d8b21>{:ptlrpc:ptlrpcd+0}
> <ffffffff80110e1b>{child_rip+0}
> LustreError: dumping log to /tmp/lustre-log.1162941923.7507
> Lustre: 7507:0:(linux-debug.c:96:libcfs_run_upcall()) Invoked LNET
> upcall /usr/lib/lustre/lnet_upcall
> LBUG,/usr/src/redhat/BUILD/lustre-1.5.95/
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-MDT0000_UUID INACTIVE by administrator request
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
> of page 000001007ee1e458 failed: -5
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
> of page 000001007e6fa8c0 failed: -5
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 70
> previous similar messages
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
> of page 000001007f34a370 failed: -5
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 273
> previous similar messages
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0001_UUID INACTIVE by administrator request
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 1
> previous similar message
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
> of page 000001007f8c3330 failed: -5
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 383
> previous similar messages
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0002_UUID INACTIVE by administrator request
> LustreError: 19691:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 1 previous
> similar message
> LustreError: 19351:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 3 previous
> similar messages
> LustreError: 19351:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 19351:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 3 previous
> similar messages
> LustreError: 19351:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 19693:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
> fails: rc = -5
> LustreError: 19717:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
> fails: rc = -5
> LustreError: 19690:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0002_UUID: rc = -110 waiting for callback (5 != 0)
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
> of page 000001007fae07f0 failed: -5
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped 294
> previous similar messages
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0003_UUID INACTIVE by administrator request
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) writepage
> of page 000001007f45b488 failed: -5
> LustreError: 19690:0:(file.c:710:ll_pgcache_remove_extent()) Skipped
> 1386 previous similar messages
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0005_UUID INACTIVE by administrator request
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 1
> previous similar message
> LustreError: 19814:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 19814:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 3 previous
> similar messages
> LustreError: 19814:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 19812:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 19812:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683700
> LustreError: 19690:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0005_UUID: rc = -110 waiting for callback (8 != 0)
> Lustre: 19690:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0006_UUID INACTIVE by administrator request
> LustreError: 19690:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0008_UUID: rc = -110 waiting for callback (5 != 0)
> LustreError: 20058:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 20058:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 21 previous
> similar messages
> LustreError: 20058:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 20058:0:(file.c:2215:ll_inode_revalidate_fini()) Skipped 21
> previous similar messages
> LustreError: 20059:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 20059:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 20060:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
> fails: rc = -5
> LustreError: 18111:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 18111:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 18111:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 18111:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-MDT0000_UUID INACTIVE by administrator request
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
> previous similar messages
> LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0002_UUID: rc = -110 waiting for callback (5 != 0)
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0003_UUID INACTIVE by administrator request
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 3
> previous similar messages
> LustreError: 7478:0:(o2iblnd_cb.c:1022:kiblnd_tx_complete()) tx ->
> 192.2.1.252@o2ib type d0 cookie 0xea03dsending 1 waiting 0: failed 12
> LustreError: 7478:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.252@o2ib: error -5(waiting)
> LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0005_UUID: rc = -110 waiting for callback (8 != 0)
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0006_UUID INACTIVE by administrator request
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
> previous similar messages
> LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0008_UUID: rc = -110 waiting for callback (5 != 0)
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-MDT0000_UUID INACTIVE by administrator request
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
> previous similar messages
> LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0002_UUID: rc = -110 waiting for callback (5 != 0)
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0003_UUID INACTIVE by administrator request
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 3
> previous similar messages
> LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0005_UUID: rc = -110 waiting for callback (8 != 0)
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) setting
> import lfs-OST0006_UUID INACTIVE by administrator request
> Lustre: 20065:0:(recover.c:230:ptlrpc_set_import_active()) Skipped 2
> previous similar messages
> LustreError: 20065:0:(import.c:203:ptlrpc_invalidate_import())
> lfs-OST0008_UUID: rc = -110 waiting for callback (5 != 0)
> LustreError: 20863:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 20863:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21239:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21239:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21240:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21240:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21344:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21344:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21345:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21345:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21346:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21347:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21347:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 1 previous
> similar message
> LustreError: 21418:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21420:0:(llite_lib.c:1376:ll_statfs_internal()) mdc_statfs
> fails: rc = -5
> LustreError: 21422:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue:
> -5
> LustreError: 21422:0:(mdc_locks.c:414:mdc_enqueue()) Skipped 1 previous
> similar message
> LustreError: 21422:0:(file.c:2215:ll_inode_revalidate_fini()) failure -5
> inode 2683697
> LustreError: 21422:0:(file.c:2215:ll_inode_revalidate_fini()) Skipped 1
> previous similar message
>
>
> ##########################OST Node (one of them)#######################
> LDISKFS FS on sdb, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> Lustre: OBD class driver, info@clusterfs.com
> Lustre Version: 1.5.95
> Build Version:
> 1.5.95-19691231160000-PRISTINE-.usr.src.linux-2.6.9-42.EL_lustre.1.5.95smp
> Lustre: Added LNI 192.2.1.254@o2ib [8/64]
> Lustre: Lustre Client File System; info@clusterfs.com
> Lustre: mount data:
> Lustre: device: /dev/sdb
> Lustre: flags: 0
> kjournald starting. Commit interval 5 seconds
> LDISKFS FS on sdb, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> Lustre: disk data:
> Lustre: server: lfs-OSTffff
> Lustre: uuid:
> Lustre: fs: lfs
> Lustre: index: ffff
> Lustre: config: 1
> Lustre: flags: 0x72
> Lustre: diskfs: ldiskfs
> Lustre: options: errors=remount-ro,extents,mballoc
> Lustre: params: mgsnode=192.2.1.2@o2ib
> Lustre: comment:
> kjournald starting. Commit interval 5 seconds
> LDISKFS FS on sdb, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> Lustre: disk data:
> Lustre: server: lfs-OST0002
> Lustre: uuid:
> Lustre: fs: lfs
> Lustre: index: 0002
> Lustre: config: 2
> Lustre: flags: 0x2
> Lustre: diskfs: ldiskfs
> Lustre: options: errors=remount-ro,extents,mballoc
> Lustre: params: mgsnode=192.2.1.2@o2ib
> Lustre: comment:
> Lustre: Filtering OBD driver; info@clusterfs.com
> Lustre: lfs-OST0002: new disk, initializing
> Lustre: OST lfs-OST0002 now serving dev
> (lfs-OST0002/d4578f7c-974c-4317-9537-a5d96bfe529c) with recovery enabled
> Lustre: 0 UP mgc MGC192.2.1.2@o2ib
> c6fccd10-45a7-90ea-8d4b-7411ba7bd65f 6
> Lustre: 1 UP ost OSS OSS_uuid 3
> Lustre: 2 UP obdfilter lfs-OST0002 lfs-OST0002_UUID 3
> Lustre: mount /dev/sdb complete
> Lustre: lfs-OST0002: received MDS connection from 192.2.1.2@o2ib
> Lustre: mount data:
> Lustre: profile: lfs-client
> Lustre: device: 192.2.1.2@o2ib:/lfs
> Lustre: flags: 2
> Lustre: 0 UP mgc MGC192.2.1.2@o2ib
> c6fccd10-45a7-90ea-8d4b-7411ba7bd65f 5
> Lustre: 1 UP ost OSS OSS_uuid 3
> Lustre: 2 UP obdfilter lfs-OST0002 lfs-OST0002_UUID 9
> Lustre: 3 UP lov lfs-clilov-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 3
> Lustre: 4 UP mdc lfs-MDT0000-mdc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 5 UP osc lfs-OST0000-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 6 UP osc lfs-OST0001-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 7 UP osc lfs-OST0002-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 8 UP osc lfs-OST0003-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 9 UP osc lfs-OST0004-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 10 UP osc lfs-OST0005-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 11 UP osc lfs-OST0006-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: 12 UP osc lfs-OST0007-osc-000001007103b400
> 45081e76-b115-3f24-80c8-ab1470284e1f 4
> Lustre: mount 192.2.1.2@o2ib:/lfs complete
> Lustre: lfs-OST0002: haven't heard from client
> 062c1341-76b7-b41d-90be-68497e189830 (at 192.2.1.1@o2ib) in 237 seconds.
> I think it's dead, and I am evicting it.
> ib_mthca 0000:02:00.0: SQ 00040a full (198882 head, 196825 tail, 2064
> max, 7 nreq)
> LustreError: 5937:0:(o2iblnd_cb.c:976:kiblnd_check_sends()) Error -12
> posting transmit to 192.2.1.250@o2ib
> LustreError: 5937:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.250@o2ib: error -12(waiting)
> LustreError: 5937:0:(events.c:127:client_bulk_callback()) event type 0,
> status -12, desc 000001005e632000
> LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
> 192.2.1.250@o2ib failed: 5
> LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
> 192.2.1.250@o2ib failed: 5
> LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 29
> previous similar messages
> LustreError: 5936:0:(events.c:127:client_bulk_callback()) event type 0,
> status -5, desc 000001003b480000
> LustreError: 5936:0:(events.c:127:client_bulk_callback()) event type 0,
> status -5, desc 0000010135818000
> LustreError: 5938:0:(events.c:127:client_bulk_callback()) event type 0,
> status -5, desc 000001000ee8a000
> LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) RDMA to
> 192.2.1.250@o2ib failed: 5
> LustreError: 5936:0:(o2iblnd_cb.c:1009:kiblnd_tx_complete()) Skipped 688
> previous similar messages
> LustreError: 5936:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
> ASSERTION(tx->tx_sending > 0) failed
> LustreError: 5938:0:(events.c:127:client_bulk_callback()) event type 0,
> status -5, desc 000001000bfba000
> LustreError: 5937:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
> ASSERTION(tx->tx_sending > 0) failed
> LustreError: 5937:0:(linux-debug.c:130:lbug_with_loc()) LBUG
> LustreError: 5938:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
> ASSERTION(tx->tx_sending > 0) failed
> Lustre: 5937:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
> stack for process 5937
> kiblnd_sd_02 R running task 0 5937 1 5938 5936
> (L-TLB)
> kiblnd_sd_03 00000100080017e0 0000000000000000 0000000000000001
> 0000000000000216
> 0000000000000012 0000000000000001 ffffffffa0396f28
> ffffffff80111731
> 000001007c049dd8 000000000000019c R running task 0 5938
> 1 5939
> Call Trace: 5937 (L-TLB)
> 00000100080017e0 0000000000000000 0000000000000001 0000000000000216
> 0000000000000012 0000000000000001 ffffffffa0396f28
> ffffffff80111731
> 000001014a6bddd8 000000000000019c
> Call Trace:<ffffffff80111600>{show_trace+375}
> <ffffffff80111600>{show_trace+375} <0>LustreError:
> 5936:0:(linux-debug.c:130:lbug_with_loc()) LBUG
> LustreError: 5936:0:(linux-debug.c:130:lbug_with_loc()) Skipped 1
> previous similar message
> Lustre: 5936:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing
> stack for process 5936
> Lustre: 5936:0:(linux-debug.c:155:libcfs_debug_dumpstack()) Skipped 1
> previous similar message
> kiblnd_sd_01 R running task 0 5936 1 5937 5935
> (L-TLB)
> 00000100080017e0 0000000000000001 0000000100000001 0000000000000003
> 0000000000000012 0000000000000001 ffffffffa0396f28
> ffffffff80111731
> 000001014a6bbdd8 000000000000019c
> Call Trace:<ffffffff8011173c>{show_stack+241}
> <ffffffff8011173c>{show_stack+241}
> <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
> <ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
> <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
> <ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT
> +0}<ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
>
> <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
> <ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
> <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
> <ffffffff80111600>{show_trace+375}
> <ffffffff8011173c>{show_stack
> +241}
> <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
> <ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
> <ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
> <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
> <ffffffff801331a9>{default_wake_function+0}
> <ffffffff801331a9>{default_wake_function+0}
> <ffffffff801331a9>{default_wake_function+0}
> <ffffffff80131f2f>{schedule_tail+55}
> <ffffffff80131f2f>{schedule_tail+55}
> <ffffffff80110e23>{child_rip+8}
> <ffffffff80110e23>{child_rip+8}
> <ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
> <ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
> <ffffffff80110e1b>{child_rip+0}
> <ffffffff80110e1b>{child_rip+0}
>
> <ffffffff80131f2f>{schedule_tail+55}
> <0>LustreError:
> 5935:0:(o2iblnd_cb.c:1001:kiblnd_tx_complete())
> ASSERTION(tx->tx_sending > 0) failed
> <ffffffff80110e23>{child_rip+8}kiblnd_sd_00 R running task 0
> 5935 1 5936 5420 (L-TLB)
> 000001014d126c00 00000000c00201f9 0000000100000001 0000000000000216
> 000001007c047eb8 000000019237a000 0000000088000001
> ffffffff80111731
> 000001007c047dd8 000000000000019c
> Call Trace:<ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
> <ffffffff80111600>{show_trace+375}
> <ffffffff80110e1b>{child_rip
> +0}
> <ffffffff8011173c>{show_stack+241}
> <ffffffffa025e9f6>{:libcfs:lbug_with_loc+72}
> <ffffffffa0262d0a>{:libcfs:LASSERT_TAGE_INVARIANT+0}
> <ffffffffa0389848>{:ko2iblnd:kiblnd_tx_complete+80}
> <ffffffffa038e0cc>{:ko2iblnd:kiblnd_scheduler+1210}
> <1>LustreError: dumping log to /tmp/lustre-log.1162942447.5936
> <ffffffff801331a9>{default_wake_function+0}
> <ffffffff80131f2f>{schedule_tail+55}
> <ffffffff80110e23>{child_rip+8}
> <ffffffffa038dc12>{:ko2iblnd:kiblnd_scheduler+0}
> <ffffffff80110e1b>{child_rip+0}
> LustreError: dumping log to /tmp/lustre-log.1162942447.5937
> LustreError: dumping log to /tmp/lustre-log.1162942447.5935
> LustreError: dumping log to /tmp/lustre-log.1162942447.5938
> Lustre: 5938:0:(linux-debug.c:96:libcfs_run_upcall()) Invoked LNET
> upcall /usr/lib/lustre/lnet_upcall
>
> LBUG,/usr/src/redhat/BUILD/lustre-1.5.95/lnet/libcfs/tracefile.c,libcfs_assertion_failed,412
> LustreError: can't open /tmp/lustre-log.1162942447.5938 file: err
> -17
> LustreError: can't open /tmp/lustre-log.1162942447.5938 for dump:
> rc -17
> LustreError: can't open /tmp/lustre-log.1162942447.5938 file: err
> -17
> LustreError: can't open /tmp/lustre-log.1162942447.5938 for dump:
> rc -17
> LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.252@o2ib: error 0(waiting)
> LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.247@o2ib: error 0(waiting)
> LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.249@o2ib: error 0(waiting)
> LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.251@o2ib: error
> 0(sending)(sending_rsrvd)(sending_nocred)(waiting)
> LustreError: 5939:0:(events.c:51:request_out_callback()) @@@ type 4,
> status -103
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942447, 55s ago)
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.251@o2ib
> 10 up 8 8 8 0 -1 3136
> LustreError: lfs-OST0004-osc-000001007103b400: Connection to service
> lfs-OST0004 via nid 192.2.1.251@o2ib was lost; in progress operations
> using this service will wait for recovery to complete.
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.2@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.2@o2ib: error
> -110(sending)(sending_rsrvd)(sending_nocred)(waiting)
> LustreError: 5939:0:(events.c:51:request_out_callback()) @@@ type 4,
> status -103
> LustreError: 5939:0:(events.c:51:request_out_callback()) Skipped 1
> previous similar message
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942447, 57s ago)
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 24 up 8 8 8 -13 -14 7368
> LustreError: lfs-OST0000-osc-000001007103b400: Connection to service
> lfs-OST0000 via nid 192.2.1.2@o2ib was lost; in progress operations
> using this service will wait for recovery to complete.
> Lustre: 1731:0:(o2iblnd_cb.c:2147:kiblnd_passive_connect()) Conn race
> 192.2.1.2@o2ib
> LustreError: 1731:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.248@o2ib: error
> 0(sending)(sending_rsrvd)(sending_nocred)(waiting)
> LustreError: 5939:0:(events.c:51:request_out_callback()) @@@ type 4,
> status -103
> LustreError: 5939:0:(events.c:51:request_out_callback()) Skipped 1
> previous similar message
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942447, 58s ago)
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.248@o2ib
> 10 up 8 8 8 0 -1 3136
> LustreError: lfs-OST0007-osc-000001007103b400: Connection to service
> lfs-OST0007 via nid 192.2.1.248@o2ib was lost; in progress operations
> using this service will wait for recovery to complete.
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.1@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.1@o2ib: error -110(waiting)
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942447, 100s ago)
> LustreError: 5969:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 5 up 8 8 8 5 1 760
> LustreError: lfs-OST0008-osc-000001007103b400: Connection to service
> lfs-OST0008 via nid 192.2.1.247@o2ib was lost; in progress operations
> using this service will wait for recovery to complete.
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 5 up 8 8 8 5 1 760
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 5 up 8 8 8 5 1 760
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 5 up 8 8 8 5 1 760
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 5 up 8 8 8 5 1 760
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 6 up 8 8 8 4 1 1104
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 6 up 8 8 8 4 1 1104
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 8 up 8 8 8 2 0 2032
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 8 up 8 8 8 2 0 2032
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 8 up 8 8 8 2 0 2032
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 9 up 8 8 8 1 0 2376
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 4 up 8 8 8 6 -1 336
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 4 up 8 8 8 6 -1 336
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 4 up 8 8 8 6 -1 336
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 4 up 8 8 8 6 -1 336
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 5 up 8 8 8 5 -1 680
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 5 up 8 8 8 5 -1 680
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 5 up 8 8 8 5 -1 680
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 5 up 8 8 8 5 -1 680
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 5 up 8 8 8 5 -1 680
> Lustre: 5969:0:(peer.c:238:lnet_debug_peer()) 192.2.1.249@o2ib
> 15 up 8 8 8 -2 -2 3728
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.250@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.250@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.248@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.248@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.2@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 1
> previous similar message
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.2@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Skipped 1 previous similar message
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.247@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.247@o2ib: error -110(waiting)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942502, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped
> 41 previous similar messages
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.251@o2ib
> 11 up 8 8 8 -1 -1 3480
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942504, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 27 up 8 8 8 -17 -17 8320
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942505, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.248@o2ib
> 11 up 8 8 8 -1 -1 3480
> LustreError: 7261:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942520, 100s ago)
> LustreError: 7261:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 7261:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 27 up 8 8 8 -17 -17 8320
> LustreError: lfs-MDT0000-mdc-000001007103b400: Connection to service
> lfs-MDT0000 via nid 192.2.1.2@o2ib was lost; in progress operations
> using this service will wait for recovery to complete.
> LustreError: Skipped 2 previous similar messages
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942547, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1
> previous similar message
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 6 up 8 8 8 4 1 1104
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 9 up 8 8 8 1 0 2376
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 5 up 8 8 8 5 -1 680
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.248@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 1
> previous similar message
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.248@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Skipped 1 previous similar message
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.250@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 2
> previous similar messages
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.250@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Skipped 2 previous similar messages
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.2@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.2@o2ib: error -110(waiting)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942620, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 5
> previous similar messages
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 29 up 8 8 8 -19 -19 9008
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 29 up 8 8 8 -19 -19 9008
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.251@o2ib
> 12 up 8 8 8 -2 -2 3824
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.248@o2ib
> 12 up 8 8 8 -2 -2 3824
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.247@o2ib
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942670, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 7
> previous similar messages
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 10 up 8 8 8 0 0 2720
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 6 up 8 8 8 4 -1 1024
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 7 up 8 8 8 3 1 1448
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942745, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 5
> previous similar messages
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 31 up 8 8 8 -21 -21 9696
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.2@o2ib
> 31 up 8 8 8 -21 -21 9696
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.251@o2ib
> 13 up 8 8 8 -3 -3 4168
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.248@o2ib
> 13 up 8 8 8 -3 -3 4168
> Lustre: 5969:0:(niobuf.c:302:ptlrpc_unregister_bulk()) @@@ Unexpectedly
> long timeout: desc 000001000ead4000
>
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.247@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 1
> previous similar message
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.247@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Skipped 2 previous similar messages
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.249@o2ib
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Skipped 3
> previous similar messages
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Closing conn to 192.2.1.249@o2ib: error -110(waiting)
> LustreError: 5939:0:(o2iblnd_cb.c:1767:kiblnd_close_conn_locked())
> Skipped 3 previous similar messages
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1162942795, 100s ago)
> LustreError: 5970:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 7
> previous similar messages
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.252@o2ib
> 11 up 8 8 8 -1 -1 3064
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.250@o2ib
> 7 up 8 8 8 3 -1 1368
> Lustre: 5970:0:(peer.c:238:lnet_debug_peer()) 192.2.1.247@o2ib
> 8 up 8 8 8 2 1 1792
> LustreError: 5939:0:(o2iblnd_cb.c:2691:kiblnd_check_conns()) Timed out
> RDMA with 192.2.1.2@o2ib
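
For what it's worth, the "status"/"error"/"err" numbers scattered through the log are negated Linux errno values, which makes the failure pattern easier to read. A quick sketch to decode the ones seen above (assuming standard Linux errno numbering):

```python
import errno
import os

# Negative status codes pulled from the log above; negate to get the
# kernel errno number and look up its symbolic name and description.
statuses = [-103, -12, -5, -110, -17]
decoded = {s: errno.errorcode.get(-s, "?") for s in statuses}
for s in statuses:
    print(f"{s}: {decoded[s]} ({os.strerror(-s)})")
```

So -103 is ECONNABORTED (the bulk callbacks after connections close), -12 is ENOMEM (the failed transmit post when the send queue fills), -5 is EIO (the failed RDMA completions), -110 is ETIMEDOUT (the kiblnd_check_conns timeouts), and -17 is EEXIST (the debug-log dump file already existed).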