Hi all,
I have a question about a few error messages that appear after a client mounts
the file system. The file system mounts fine and is usable, but the LustreError
messages do not look normal. The client has no IB connectivity to the MDSs/OSSs;
it reaches them over TCP networks ("tcpX"). The MDSs and OSSs are interconnected
with IB. The configurations are as follows:
MDS1:
- RHEL 5.3, Lustre 1.8.1.1, MGS and MDS on the same external storage drive;
  failover managed by heartbeat v1
- 10GbE tcp2(eth4) 10.103.30.201
- 10GbE tcp3(eth5) 10.103.30.101
- InfiniBand o2ib0(ib0) 10.103.34.201
- InfiniBand o2ib1(ib1) 10.103.34.101
- modprobe.conf: options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)
- MGS/MDS parameters: lov.stripesize=25M lov.stripecount=1
  failover.node=10.103.34.202@o2ib,10.103.34.102@o2ib1
  mdt.group_upcall=/usr/sbin/l_getgroups
MDS2:
- RHEL 5.3, Lustre 1.8.1.1, pointing to the same external storage drive as
  MDS1; failover managed by heartbeat v1
- 10GbE tcp2(eth4) 10.103.30.202
- 10GbE tcp3(eth5) 10.103.30.102
- InfiniBand o2ib0(ib0) 10.103.34.202
- InfiniBand o2ib1(ib1) 10.103.34.102
- options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp2(eth4),tcp3(eth5)
OSS1:
- RHEL 5.3, Lustre 1.8.1.1, pointing to 16 OSTs on SAN storage; mounts the 8
  odd-numbered OSTs, with failover to OSS2 managed by heartbeat v1
- 10GbE tcp4(eth4) 10.103.31.203
- 10GbE tcp5(eth5) 10.103.31.103
- InfiniBand o2ib0(ib0) 10.103.34.203
- InfiniBand o2ib1(ib1) 10.103.34.103
- options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)
- OST parameters: mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1
  failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1
OSS2:
- RHEL 5.3, Lustre 1.8.1.1, pointing to the same 16 OSTs on SAN storage as
  OSS1; mounts the 8 even-numbered OSTs, with failover to OSS1 managed by
  heartbeat v1
- 10GbE tcp4(eth4) 10.103.31.204
- 10GbE tcp5(eth5) 10.103.31.104
- InfiniBand o2ib0(ib0) 10.103.34.204
- InfiniBand o2ib1(ib1) 10.103.34.104
- options lnet networks=o2ib0(ib0),o2ib1(ib1),tcp4(eth4),tcp5(eth5)
- OST parameters: mgsnode=10.103.34.201@o2ib,10.103.34.101@o2ib1
  failover.node=10.103.34.204@o2ib,10.103.34.104@o2ib1
Client:
- RHEL 4.5 with Lustre 1.6.6, or RHEL 5.3 with Lustre 1.8.1.1
- GbE tcp4(eth2) 10.103.31.129 (OSS channel)
- GbE tcp2(eth3) 10.103.30.129 (MDS channel)
- options lnet networks=tcp2(eth3),tcp4(eth2)
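One way to compare the failover NIDs listed above with the client's LNET networks is a short script. This is a minimal sketch, assuming no LNET routers are configured (so a peer counts as reachable only over a network the client has locally) and that an omitted network number means 0, so `o2ib` and `o2ib0` name the same network. The helpers `norm_net`, `client_networks` and `reachable_nids` are hypothetical, not part of any Lustre tool.

```python
from typing import Dict, List, Set


def norm_net(net: str) -> str:
    """Append the default network number 0 when it is omitted ('o2ib' -> 'o2ib0')."""
    if not net:
        raise ValueError("empty LNET network name")
    return net if net[-1].isdigit() else net + "0"


def client_networks(option: str) -> Set[str]:
    """Network names from an lnet 'networks=' value, with interface names dropped."""
    value = option.split("=", 1)[-1]
    return {norm_net(entry.split("(", 1)[0].strip()) for entry in value.split(",")}


def reachable_nids(nids: List[str], local_nets: Set[str]) -> Dict[str, bool]:
    """Map each 'address@network' NID to whether the client has that network.

    Assumes no LNET routers: a peer counts as reachable only on a local network.
    """
    return {nid: norm_net(nid.partition("@")[2]) in local_nets for nid in nids}


client = client_networks("networks=tcp2(eth3),tcp4(eth2)")
failover_nids = [
    "10.103.34.202@o2ib", "10.103.34.102@o2ib1",  # MGS/MDS failover.node
    "10.103.34.204@o2ib", "10.103.34.104@o2ib1",  # OST failover.node
]
for nid, ok in reachable_nids(failover_nids, client).items():
    print(f"{nid:22} {'reachable' if ok else 'no common network'}")
```

With the values from this post, all four failover NIDs print `no common network`, while a NID such as MDS2's 10.103.30.202@tcp2 would be reported as reachable.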
The following messages are from RHEL 4.5, Lustre 1.6.6:
Dec 15 22:09:32 bg8mo29sz kernel: LustreError: 29975:0:(events.c:465:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib
Dec 15 22:09:32 bg8mo29sz kernel: LustreError: 29975:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!
Dec 15 22:09:32 bg8mo29sz kernel: Lustre: spfs-clilov-000001020463f400.lov: set parameter stripesize=25M
Dec 15 22:09:32 bg8mo29sz kernel: Lustre: Skipped 1 previous similar message
Dec 15 22:09:32 bg8mo29sz kernel: Lustre: Client spfs-client has started
The following messages are from RHEL 5.3, Lustre 1.8.1.1:
Lustre: MGC10.103.30.201@tcp2: Reactivating import
LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib
LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!
LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.204@o2ib
LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.204@o2ib!
Lustre: Client spfs-client has started
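To pull the unresolved NIDs out of a longer client log, a small regex-based sketch like the following could help. The pattern is an assumption based only on the `No NID found for` lines quoted above (printed by `ptlrpc_uuid_to_peer()`), and it also accepts the " at " spelling that the mailing-list archive substitutes for "@". The function name `unresolved_nids` is hypothetical.

```python
import re
from typing import List

# Matches "No NID found for <address>@<network>" as printed by
# ptlrpc_uuid_to_peer(); also tolerates the archive's " at " in place of "@".
NO_NID_RE = re.compile(r"No NID found for\s+(\S+?)(?:@|\s+at\s+)(\S+)")


def unresolved_nids(log_text: str) -> List[str]:
    """Return each NID reported as 'No NID found', in first-seen order."""
    seen: List[str] = []
    for addr, net in NO_NID_RE.findall(log_text):
        nid = f"{addr}@{net}"
        if nid not in seen:
            seen.append(nid)
    return seen


sample = """\
Lustre: MGC10.103.30.201@tcp2: Reactivating import
LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.202@o2ib
LustreError: 3200:0:(client.c:69:ptlrpc_uuid_to_connection()) cannot find peer 10.103.34.202@o2ib!
LustreError: 3200:0:(events.c:460:ptlrpc_uuid_to_peer()) No NID found for 10.103.34.204@o2ib
Lustre: Client spfs-client has started
"""
print(unresolved_nids(sample))
```

On this excerpt it returns `['10.103.34.202@o2ib', '10.103.34.204@o2ib']`; the `cannot find peer` lines are deliberately ignored so each NID is counted once.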
I couldn't figure out whether there is a configuration error in the MDS and
OSS failover setup or whether this is a harmless warning. I would think this
setup should work with the client outside the IB network. Does anyone have a
hint on this? As I mentioned, the file system works fine after mounting. Should
I just ignore these error messages?
Thanks in advance.
Steve
Stephen Chu
AT&T Labs CSO
C5-3C03
200 Laurel Ave
Middletown, NJ
stephenchu at att.com