Are the InfiniBand kernel modules that come with the Lustre 1.5.97
RHEL4 rpms supposed to work out of the box?
I'm trying the stock Lustre 1.5.97 rpms on x86_64 CentOS 4.4 with IB.
IB works fine for MPI communication, just not well with Lustre when
there's more than a few OSTs.
The problem can be seen easily by running 'bonnie++ -s 0 -n 2' (2048
zero-sized files) in a striped dir. bonnie completes as expected without
Lustre striping, but hangs indefinitely with piles of Lustre timeouts
when striping is set and there are, say, more than 5 OSTs (see the lovely
messages below).
The identical Lustre setup over tcp instead of o2ib works just fine.
If anyone has 1.5.97 + IB and the standard Lustre rpms working reliably
in a non-trivial setup, then I'll happily stop trying to wrangle newer
OFED into the kernels, and will instead perhaps blame our hardware.
setup:
on x19:
mkfs.lustre --fsname=testfs --mdt --mgs --reformat /dev/sdb3
mount -t lustre /dev/sdb3 /mnt/mdt
on 10 other nodes:
mkfs.lustre --fsname=testfs --ost --mgsnode=x19ib@o2ib --reformat /dev/sdb3
mount -t lustre /dev/sdb3 /mnt/ost1
on x18 (another node):
mount -t lustre x19ib@o2ib:/testfs /mnt/testfs
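For anyone trying to reproduce this, the per-node OST steps above can be generated for every OST node with a small loop. This is a minimal sketch, assuming passwordless ssh to each OST node; the function name print_ost_setup and the host names you pass it are hypothetical. It only prints the commands (mkfs.lustre --reformat is destructive), so they can be reviewed before being piped to sh.

```shell
# Hypothetical helper: print (do not execute) the format/mount commands
# used on each OST node in the setup above. Assumes passwordless ssh.
MGS_NID="x19ib@o2ib"   # NID of the combined MGS/MDT node (x19)
OST_DEV="/dev/sdb3"    # same block device on every OST node

print_ost_setup() {
  for host in "$@"; do
    echo "ssh $host mkfs.lustre --fsname=testfs --ost --mgsnode=$MGS_NID --reformat $OST_DEV"
    echo "ssh $host mount -t lustre $OST_DEV /mnt/ost1"
  done
}
```

For example, `print_ost_setup nodeA nodeB` (hypothetical hosts) prints four commands; appending `| sh` would run them. As in the original setup, every OST node mounts its target at /mnt/ost1.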
x18 % bonnie++ -s 0 -n 2
(takes about 20 seconds to complete happily)
x18 % lfs setstripe . 1048576 -1 -1
x18 % bonnie++ -s 0 -n 2
Create files in sequential order...done.
Stat files in sequential order...
(hangs indefinitely pumping out timeouts)
Everything can still ping everything over IPoIB at this stage.
Errors (see below) spool out indefinitely...
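Since the hang only shows up once files are striped over more than about five OSTs, one way to pin down the threshold is to rerun the same test with an increasing stripe count. A rough sketch, assuming the legacy positional syntax used above (`lfs setstripe <dir> <stripe size> <stripe start> <stripe count>`, where -1 for the start lets Lustre choose and -1 for the count means all OSTs) and a hypothetical scratch directory under /mnt/testfs; both function names are illustrative. Because the failure is a hang rather than an error, the last "trying N" line printed without a matching "ok N" marks the count that wedged.

```shell
# Run "$@ COUNT" for COUNT = 1..MAX, stopping at the first failure.
# A hang shows up as a "trying N" line with no matching "ok N".
sweep_stripe_counts() {
  max=$1; shift
  n=1
  while [ "$n" -le "$max" ]; do
    echo "trying $n"
    "$@" "$n" || { echo "failed $n"; return 1; }
    echo "ok $n"
    n=$((n + 1))
  done
}

# Hypothetical test step: stripe a fresh directory over COUNT OSTs
# (1 MiB stripes, as above) and run the same bonnie++ small-file test in it.
bonnie_with_stripe_count() {
  dir="/mnt/testfs/stripe-test-$1"   # assumed scratch location
  mkdir -p "$dir" &&
    lfs setstripe "$dir" 1048576 -1 "$1" &&
    (cd "$dir" && bonnie++ -s 0 -n 2)
}
```

With 10 OSTs, `sweep_stripe_counts 10 bonnie_with_stripe_count` would walk stripe counts 1 through 10. Passing the test step as a command keeps the sweep logic separate from anything Lustre-specific.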
cheers,
robin
Feb 7 22:19:16 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.2@o2ib
Feb 7 22:19:16 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.2@o2ib,down,1170847104
Feb 7 22:19:17 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.8@o2ib
Feb 7 22:19:17 x18 kernel: Lustre: 7:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.8@o2ib,down,1170847104
Feb 7 22:19:18 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.11@o2ib
Feb 7 22:19:18 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.11@o2ib,down,1170847104
Feb 7 22:20:03 x2 kernel: Lustre:
4437:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0009:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:20:03 x11 kernel: Lustre:
4454:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0001:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:20:04 x8 kernel: Lustre:
4441:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0000:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:20:04 x2 kernel: LustreError:
4174:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0xbea4f79dfc770036 from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:20:04 x11 kernel: LustreError:
4191:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0xabd5ae77b17d4376 from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:20:04 x18 kernel: LustreError:
3233:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847103, 100s ago) req@0000010150630e00 x130440/t0
o103->testfs-OST0000_UUID@192.168.1.8@o2ib:28 lens 232/128 ref 2 fl Rpc:N/0/0
rc 0/0
Feb 7 22:20:04 x8 kernel: LustreError:
4179:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0x35b3d1a8de15a7c5 from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:20:04 x18 kernel: LustreError: testfs-OST0009-osc-0000010154987800:
Connection to service testfs-OST0009 via nid 192.168.1.2@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:20:04 x18 kernel: LustreError:
3418:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847103, 100s ago) req@000001014f979a00 x130449/t0
o101->testfs-OST0009_UUID@192.168.1.2@o2ib:28 lens 232/288 ref 2 fl Rpc:/0/0
rc 0/0
Feb 7 22:20:04 x18 kernel: LustreError:
3418:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 2 previous similar
messages
Feb 7 22:20:04 x18 kernel: LustreError: testfs-OST0000-osc-0000010154987800:
Connection to service testfs-OST0000 via nid 192.168.1.8@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:20:04 x18 kernel: LustreError: Skipped 1 previous similar message
Feb 7 22:20:04 x18 kernel: Lustre: testfs-OST0001-osc-0000010154987800:
Connection restored to service testfs-OST0001 using nid 192.168.1.11@o2ib.
Feb 7 22:20:04 x18 kernel: LustreError:
3234:0:(file.c:763:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
Feb 7 22:20:04 x18 kernel: LustreError:
3233:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 2 previous similar
messages
Feb 7 22:20:04 x18 kernel: LustreError:
3233:0:(file.c:763:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
Feb 7 22:20:04 x18 kernel: LustreError:
3233:0:(file.c:763:ll_extent_lock_callback()) Skipped 1 previous similar message
Feb 7 22:20:58 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.10@o2ib
Feb 7 22:20:59 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.10@o2ib,down,1170847204
Feb 7 22:20:59 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.12@o2ib
Feb 7 22:20:59 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.14@o2ib
Feb 7 22:20:59 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.12@o2ib,down,1170847204
Feb 7 22:20:59 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Skipped 1 previous similar message
Feb 7 22:21:43 x5 kernel: Lustre:
4459:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0007:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:21:44 x12 kernel: Lustre:
4464:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0004:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:21:44 x14 kernel: Lustre:
4457:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0005:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:21:44 x6 kernel: Lustre:
4462:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0008:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:21:44 x14 kernel: LustreError:
4180:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0xb2c7e57b434dc289 from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:21:44 x12 kernel: LustreError:
4187:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0x98f9d4d6c722ba2c from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:21:44 x18 kernel: LustreError:
3232:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847203, 100s ago) req@0000010157126c00 x130696/t0
o103->testfs-OST0003_UUID@192.168.1.10@o2ib:28 lens 232/128 ref 2 fl
Rpc:N/0/0 rc 0/0
Feb 7 22:21:44 x18 kernel: LustreError:
3231:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847203, 100s ago) req@00000101504eb400 x130698/t0
o103->testfs-OST0005_UUID@192.168.1.14@o2ib:28 lens 232/128 ref 2 fl
Rpc:N/0/0 rc 0/0
Feb 7 22:21:44 x10 kernel: Lustre:
4463:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0003:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:21:44 x18 kernel: LustreError: testfs-OST0005-osc-0000010154987800:
Connection to service testfs-OST0005 via nid 192.168.1.14@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:21:44 x10 kernel: LustreError:
4177:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0x51849194aa870f51 from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:21:44 x18 kernel: Lustre: testfs-OST0005-osc-0000010154987800:
Connection restored to service testfs-OST0005 using nid 192.168.1.14@o2ib.
Feb 7 22:21:44 x18 kernel: Lustre: Skipped 2 previous similar messages
Feb 7 22:21:44 x18 kernel: LustreError:
3231:0:(file.c:763:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
Feb 7 22:21:44 x18 kernel: LustreError:
3232:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 3 previous similar
messages
Feb 7 22:21:44 x18 kernel: LustreError: testfs-OST0003-osc-0000010154987800:
Connection to service testfs-OST0003 via nid 192.168.1.10@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:21:44 x18 kernel: LustreError: Skipped 3 previous similar messages
Feb 7 22:21:44 x18 kernel: Lustre: testfs-OST0003-osc-0000010154987800:
Connection restored to service testfs-OST0003 using nid 192.168.1.10@o2ib.
Feb 7 22:21:44 x18 kernel: Lustre: Skipped 1 previous similar message
Feb 7 22:21:44 x18 kernel: LustreError:
3232:0:(file.c:763:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
Feb 7 22:21:44 x18 kernel: LustreError:
3232:0:(file.c:763:ll_extent_lock_callback()) Skipped 1 previous similar message
Feb 7 22:21:48 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.5@o2ib
Feb 7 22:21:48 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.5@o2ib,down,1170847204
Feb 7 22:21:48 x18 kernel: Lustre: 6:0:(linux-debug.c:98:libcfs_run_upcall())
Skipped 1 previous similar message
Feb 7 22:22:33 x18 kernel: LustreError:
3228:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847253, 100s ago) req@0000010151602400 x130711/t0
o400->testfs-OST0007_UUID@192.168.1.5@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0
rc 0/0
Feb 7 22:22:33 x18 kernel: LustreError:
3228:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847253, 100s ago) req@0000010151602600 x130712/t0
o400->testfs-OST0008_UUID@192.168.1.6@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0
rc 0/0
Feb 7 22:22:58 x18 kernel: LustreError:
3228:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847278, 100s ago) req@000001015197de00 x130723/t0
o400->testfs-OST0007_UUID@192.168.1.5@o2ib:28 lens 128/128 ref 2 fl Rpc:N/0/0
rc 0/0
Feb 7 22:23:23 x18 kernel: LustreError:
3229:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847303, 100s ago) req@000001014f8ade00 x130740/t0
o8->testfs-OST0007_UUID@192.168.1.5@o2ib:28 lens 304/328 ref 2 fl Rpc:/0/0 rc
0/0
Feb 7 22:23:23 x18 kernel: LustreError:
3229:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 3 previous similar
messages
Feb 7 22:23:24 x5 kernel: Lustre:
4464:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0007:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:23:24 x6 kernel: Lustre:
4467:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0008:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:23:24 x18 kernel: Lustre: testfs-OST0008-osc-0000010154987800:
Connection restored to service testfs-OST0008 using nid 192.168.1.6@o2ib.
Feb 7 22:24:20 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.10@o2ib
Feb 7 22:24:20 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Skipped 1 previous similar
message
Feb 7 22:24:21 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.16@o2ib
Feb 7 22:24:21 x18 kernel: Lustre: 7:0:(linux-debug.c:98:libcfs_run_upcall())
Invoked LNET upcall /usr/lib/lustre/lnet_upcall
ROUTER_NOTIFY,192.168.1.16@o2ib,down,1170847404
Feb 7 22:25:04 x18 kernel: LustreError:
3418:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847404, 100s ago) req@0000010150a5fa00 x132189/t0
o101->testfs-OST0003_UUID@192.168.1.10@o2ib:28 lens 232/288 ref 2 fl Rpc:/0/0
rc 0/0
Feb 7 22:25:04 x18 kernel: LustreError:
3418:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 1 previous similar
message
Feb 7 22:25:04 x10 kernel: Lustre:
4233:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0003:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:25:04 x16 kernel: Lustre:
4237:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0002:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:25:04 x18 kernel: LustreError: testfs-OST0003-osc-0000010154987800:
Connection to service testfs-OST0003 via nid 192.168.1.10@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:25:04 x18 kernel: LustreError: testfs-OST0002-osc-0000010154987800:
Connection to service testfs-OST0002 via nid 192.168.1.16@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:25:04 x18 kernel: Lustre: testfs-OST0003-osc-0000010154987800:
Connection restored to service testfs-OST0003 using nid 192.168.1.10@o2ib.
Feb 7 22:25:04 x18 kernel: Lustre: Skipped 1 previous similar message
Feb 7 22:26:02 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.14@o2ib
Feb 7 22:26:44 x18 kernel: LustreError:
3418:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847504, 100s ago) req@0000010151e79600 x133406/t0
o101->testfs-OST0005_UUID@192.168.1.14@o2ib:28 lens 232/288 ref 2 fl Rpc:/0/0
rc 0/0
Feb 7 22:26:44 x18 kernel: LustreError:
3418:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 1 previous similar
message
Feb 7 22:26:44 x14 kernel: Lustre:
4284:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0005:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:26:44 x18 kernel: LustreError: testfs-OST0005-osc-0000010154987800:
Connection to service testfs-OST0005 via nid 192.168.1.14@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:26:44 x18 kernel: Lustre: testfs-OST0005-osc-0000010154987800:
Connection restored to service testfs-OST0005 using nid 192.168.1.14@o2ib.
Feb 7 22:26:44 x18 kernel: Lustre: Skipped 1 previous similar message
Feb 7 22:27:41 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.2@o2ib
Feb 7 22:27:42 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.8@o2ib
Feb 7 22:28:24 x16 kernel: Lustre:
4337:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0002:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:28:24 x10 kernel: Lustre:
4333:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0003:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:28:24 x8 kernel: Lustre:
4326:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0000:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:28:24 x8 kernel: LustreError:
4179:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0x35b3d1a8de15abed from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:28:24 x18 kernel: LustreError:
3231:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847604, 100s ago) req@00000101505c7800 x134299/t0
o103->testfs-OST0009_UUID@192.168.1.2@o2ib:28 lens 232/128 ref 2 fl Rpc:N/0/0
rc 0/0
Feb 7 22:28:24 x18 kernel: LustreError: testfs-OST0003-osc-0000010154987800:
Connection to service testfs-OST0003 via nid 192.168.1.10@o2ib was lost; in
progress operations using this service will wait for recovery to complete.
Feb 7 22:28:24 x2 kernel: Lustre:
4322:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0009:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
Feb 7 22:28:24 x18 kernel: Lustre: testfs-OST0000-osc-0000010154987800:
Connection restored to service testfs-OST0000 using nid 192.168.1.8@o2ib.
Feb 7 22:28:24 x2 kernel: LustreError:
4174:0:(ldlm_lockd.c:1099:ldlm_handle_cancel()) received cancel for unknown lock
cookie 0xbea4f79dfc77045e from client 6054bd07-52e9-3343-8619-f2f5e72d4c7d id
12345-192.168.1.18@o2ib
Feb 7 22:28:24 x18 kernel: LustreError:
3234:0:(file.c:763:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
Feb 7 22:28:24 x18 kernel: LustreError:
3231:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 3 previous similar
messages
Feb 7 22:28:24 x18 kernel: Lustre: testfs-OST0009-osc-0000010154987800:
Connection restored to service testfs-OST0009 using nid 192.168.1.2@o2ib.
Feb 7 22:28:24 x18 kernel: LustreError:
3231:0:(file.c:763:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
Feb 7 22:28:33 x18 kernel: LustreError:
3210:0:(o2iblnd_cb.c:2793:kiblnd_check_conns()) Timed out RDMA with
192.168.1.10@o2ib
Feb 7 22:29:14 x18 kernel: LustreError:
3228:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847654, 100s ago) req@000001015125e200 x134307/t0
o400->testfs-OST0002_UUID@192.168.1.16@o2ib:28 lens 128/128 ref 2 fl
Rpc:N/0/0 rc 0/0
Feb 7 22:29:39 x18 kernel: LustreError:
3228:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847679, 100s ago) req@0000010150194200 x134319/t0
o400->testfs-OST0002_UUID@192.168.1.16@o2ib:28 lens 128/128 ref 2 fl
Rpc:N/0/0 rc 0/0
Feb 7 22:29:39 x18 kernel: LustreError:
3228:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 1 previous similar
message
Feb 7 22:30:04 x18 kernel: LustreError:
3229:0:(client.c:942:ptlrpc_expire_one_request()) @@@ timeout (sent at
1170847704, 100s ago) req@0000010150429600 x134339/t0
o8->testfs-OST0003_UUID@192.168.1.10@o2ib:28 lens 304/328 ref 2 fl Rpc:/0/0
rc 0/0
Feb 7 22:30:04 x18 kernel: LustreError:
3229:0:(client.c:942:ptlrpc_expire_one_request()) Skipped 1 previous similar
message
Feb 7 22:30:29 x18 kernel: Lustre: testfs-OST0002-osc-0000010154987800:
Connection restored to service testfs-OST0002 using nid 192.168.1.16@o2ib.
Feb 7 22:30:29 x16 kernel: Lustre:
4343:0:(ldlm_lib.c:497:target_handle_reconnect()) testfs-OST0002:
6054bd07-52e9-3343-8619-f2f5e72d4c7d reconnecting
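To check whether the RDMA timeouts in the messages above cluster on particular OSS nodes (which would point toward hardware) or are spread across all of them, the client's syslog can be tallied per peer NID. A small sketch, assuming the messages sit one per line in a syslog file such as /var/log/messages (they are wrapped above only by the mail client); the function name is hypothetical. Lustre rate-limits repeats ("Skipped N previous similar messages"), so the counts are lower bounds.

```shell
# Count "Timed out RDMA with <nid>" messages per peer NID, most frequent first.
# Reads the files given as arguments, or stdin if none are given.
rdma_timeouts_by_nid() {
  awk '/Timed out RDMA with/ { count[$NF]++ }
       END { for (nid in count) print count[nid], nid }' "$@" |
    sort -k1,1nr -k2,2
}
```

For example, `rdma_timeouts_by_nid /var/log/messages` on x18 would list each OSS NID with the number of logged RDMA timeouts against it.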