Hi,
We are testing Lustre 1.6b5 on an InfiniBand cluster running RHEL 4. We use
the binaries provided by CFS and OFED 1.1 as the IB stack.
Every now and then, some of the clients hang during the file system mount or
when an OST is added (see the attached log).
Error:
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
From OFED (include/rdma/ib_cm.h):
enum ib_cm_rej_reason {
        ...
        IB_CM_REJ_INVALID_SERVICE_ID = 8,
        ...
};
As soon as we start an IPoIB ping to the corresponding OST, the client
continues. After that, it is quite stable.
Any idea how this could be fixed?
Thanks,
Mirko
-------------- next part --------------
Lustre: mount data:
Lustre: profile: testfs-client
Lustre: device: 10.0.90.10@o2ib:/testfs
Lustre: flags: 2
Lustre: 0 UP mgc MGC10.0.90.10@o2ib 438411f9-d2cc-f576-9a5d-bc927badfa60 5
Lustre: 1 UP lov testfs-clilov-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 3
Lustre: 2 UP mdc testfs-MDT0000-mdc-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 4
Lustre: 3 UP osc testfs-OST0000-osc-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 4
Lustre: 4 UP osc testfs-OST0001-osc-0000010075688000 7255c262-21e0-f804-91dd-2e8008cc166a 4
Lustre: mount 10.0.90.10@o2ib:/testfs complete
Lustre: client 0000010075688000 umount complete
Lustre: mount data:
Lustre: profile: testfs-client
Lustre: device: 10.0.90.10@o2ib:/testfs
Lustre: flags: 2
Lustre: 0 UP mgc MGC10.0.90.10@o2ib caf868ce-f8dc-8c83-ecd4-caf4a75378f2 5
Lustre: 1 UP lov testfs-clilov-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 3
Lustre: 2 UP mdc testfs-MDT0000-mdc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 3 UP osc testfs-OST0000-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 4 UP osc testfs-OST0001-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 5 UP osc testfs-OST0002-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 6 UP osc testfs-OST0003-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 7 UP osc testfs-OST0004-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 8 UP osc testfs-OST0005-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 9 UP osc testfs-OST0006-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: 10 UP osc testfs-OST0007-osc-000001007eaba800 3119a81c-5954-8b92-edab-5e38c9f7743d 4
Lustre: mount 10.0.90.10@o2ib:/testfs complete
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 1 previous similar message
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521780, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521805, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 5 previous similar messages
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521830, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 1 previous similar message
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521855, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521880, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521905, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 17 previous similar messages
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521930, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 1 previous similar message
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521955, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166521980, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5171:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1776:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1776:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1776:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1776:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166522005, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Rx from 10.0.90.8@o2ib failed: 5
LustreError: 5170:0:(o2iblnd_cb.c:455:kiblnd_rx_complete()) Skipped 15 previous similar messages
LustreError: 1775:0:(o2iblnd_cb.c:2314:kiblnd_rejected()) 10.0.90.8@o2ib rejected: reason 8, size 148
LustreError: 1775:0:(o2iblnd_cb.c:1935:kiblnd_peer_connect_failed()) Deleting messages for 10.0.90.8@o2ib: connection failed
LustreError: 1775:0:(events.c:51:request_out_callback()) @@@ type 4, status -113
LustreError: 1775:0:(events.c:51:request_out_callback()) Skipped 3 previous similar messages
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1166522030, 0s ago)
LustreError: 5909:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0
Lustre: 5909:0:(peer.c:238:lnet_debug_peer()) 10.0.90.8@o2ib 2 up 8 8 8 8 6 0