Greetings,

I am still tracking down my Lustre timeout/reconnect issue. The Lustre kernel on the OSS and MGS is 2.6.18-53.1.13.el5_lustre.1.6.4.3smp. I am seeing IMP_INVALID messages in my log files. I think (but am not certain) that I have a bad IB cable or a bad port on a card, and I am trying to rule out other causes (such as a dnsmasq GUID).

I am trying to gather information with the "lctl" command. The only network on which we use Lustre is InfiniBand (ib0). The Gb TCP network was not part of the initial tuneconf file system creation. Am I not invoking lctl properly? (I am using the syntax in the man page.) The network is up and LNET is running. My "lctl ping xxx.yyy.zzz.aaa.bbb@ib0" commands return nicely.

When I ask for "interface_list", lctl tells me that I need to run the "network" command first. I do run the "network" command, yet I still cannot get peer_list or conn_list information, because lctl reports that "network" was not run. Am I missing something in lctl command usage?

Thanks,
megan

[root@mds1 ~]# lctl network ib0 up
Can't parse net ib0
[root@mds1 ~]# lctl interface_list
You must run the 'network' command before 'interface_list'.
[root@mds1 ~]# lctl network
usage: network <net>|up|down
[root@mds1 ~]# lctl network up
LNET configured
[root@mds1 ~]# lctl interface_list
You must run the 'network' command before 'interface_list'.
[root@mds1 ~]# lctl dl
  1 UP mgc MGC192.168.64.210@o2ib c7135d07-19c5-abe2-2ca3-976185b80dde 5
  2 UP mdt MDS MDS_uuid 3
  8 UP lov crew8-mdtlov crew8-mdtlov_UUID 4
  9 UP mds crew8-MDT0000 crew8-MDT0000_UUID 13
 10 UP osc crew8-OST0000-osc crew8-mdtlov_UUID 5
 11 UP osc crew8-OST0001-osc crew8-mdtlov_UUID 5
 12 UP osc crew8-OST0002-osc crew8-mdtlov_UUID 5
 13 UP osc crew8-OST0003-osc crew8-mdtlov_UUID 5
 14 UP osc crew8-OST0004-osc crew8-mdtlov_UUID 5
 15 UP osc crew8-OST0005-osc crew8-mdtlov_UUID 5
 16 UP osc crew8-OST0006-osc crew8-mdtlov_UUID 5
 17 UP osc crew8-OST0007-osc crew8-mdtlov_UUID 5
 18 UP osc crew8-OST0008-osc crew8-mdtlov_UUID 5
 19 UP osc crew8-OST0009-osc crew8-mdtlov_UUID 5
 20 UP osc crew8-OST000a-osc crew8-mdtlov_UUID 5
 21 UP osc crew8-OST000b-osc crew8-mdtlov_UUID 5
[root@mds1 ~]# lctl interface_list
You must run the 'network' command before 'interface_list'.
[root@mds1 ~]# lctl conn_list
You must run the 'network' command before 'conn_list'.
On Fri, Jan 29, 2010 at 12:30:51PM -0500, Ms. Megan Larko wrote:
> that "network" was not run. Am I missing something in lctl command
> usage?

# lctl
lctl > net up
LNET configured
lctl > list_nids
10.8.0.166@tcp
lctl > conn_list
You must run the 'network' command before 'conn_list'.
lctl > net tcp
lctl > conn_list
12345-10.8.0.167@tcp I[2]sata17->sata18:1014 16384/654368 nonagle
12345-10.8.0.167@tcp O[1]sata17->sata18:1015 66232/87380 nonagle
12345-10.8.0.167@tcp C[0]sata17->sata18:1016 16384/87380 nonagle
12345-10.8.0.199@tcp I[0]sata17->sfire10:1020 16384/87380 nonagle
12345-10.8.0.199@tcp O[3]sata17->sfire10:1022 58440/87380 nonagle
12345-10.8.0.199@tcp C[1]sata17->sfire10:1023 16384/4194304 nonagle
12345-10.8.0.200@tcp I[0]sata17->sfire11:1014 16384/87380 nonagle
12345-10.8.0.200@tcp O[3]sata17->sfire11:1015 16384/87380 nonagle
12345-10.8.0.200@tcp C[2]sata17->sfire11:1016 58440/3246672 nonagle

I don't have any ib cards on this node, but you can do the same with "o2ib" instead of "tcp".

HTH

Johann
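
For the InfiniBand case, the same two-step sequence (select a network type, then query it) could be scripted roughly as below. This is only a sketch under assumptions, not a tested recipe for this cluster: the helper names nid_net and lnet_query_cmds are hypothetical, the network type is taken from the part of the NID after "@" (as in the MGC name 192.168.64.210@o2ib shown by "lctl dl"), and it assumes lctl will read commands from standard input in the same way it reads them at its interactive prompt. Both commands are sent to a single lctl process because each lctl invocation from the shell is a separate process, so a network selected in one invocation is presumably not remembered by the next.

```shell
#!/bin/sh
# Hypothetical helper: print the LNET network type of a NID,
# i.e. everything after the last "@" ("192.168.64.210@o2ib" -> "o2ib").
# A string without "@" is printed unchanged.
nid_net() {
    printf '%s\n' "${1##*@}"
}

# Hypothetical helper: print (but do not run) the two lctl commands
# needed to query a network: first "network <type>", then the query.
#   $1 = a NID on the network of interest
#   $2 = the query command, e.g. conn_list, peer_list, interface_list
lnet_query_cmds() {
    net=$(nid_net "$1")
    printf 'network %s\n%s\n' "$net" "$2"
}

# Possible usage on a node with LNET loaded (assumption: lctl accepts
# commands on stdin, so both run inside one lctl session):
#   lnet_query_cmds 192.168.64.210@o2ib conn_list | lctl
```

Printing the commands instead of running them keeps the helper harmless on a machine without Lustre, and makes it easy to check what would be sent to lctl before piping it in.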
Thank you Johann!

The key line seems to be "net o2ib" and *not* "net ib0". Thanks for pointing out my error.

megan

On Fri, Jan 29, 2010 at 1:00 PM, Johann Lombardi <johann at sun.com> wrote:
> On Fri, Jan 29, 2010 at 12:30:51PM -0500, Ms. Megan Larko wrote:
>> that "network" was not run. Am I missing something in lctl command
>> usage?
>
> # lctl
> lctl > net up
> LNET configured
> lctl > list_nids
> 10.8.0.166@tcp
> lctl > conn_list
> You must run the 'network' command before 'conn_list'.
> lctl > net tcp
> lctl > conn_list
> 12345-10.8.0.167@tcp I[2]sata17->sata18:1014 16384/654368 nonagle
> 12345-10.8.0.167@tcp O[1]sata17->sata18:1015 66232/87380 nonagle
> 12345-10.8.0.167@tcp C[0]sata17->sata18:1016 16384/87380 nonagle
> 12345-10.8.0.199@tcp I[0]sata17->sfire10:1020 16384/87380 nonagle
> 12345-10.8.0.199@tcp O[3]sata17->sfire10:1022 58440/87380 nonagle
> 12345-10.8.0.199@tcp C[1]sata17->sfire10:1023 16384/4194304 nonagle
> 12345-10.8.0.200@tcp I[0]sata17->sfire11:1014 16384/87380 nonagle
> 12345-10.8.0.200@tcp O[3]sata17->sfire11:1015 16384/87380 nonagle
> 12345-10.8.0.200@tcp C[2]sata17->sfire11:1016 58440/3246672 nonagle
>
> I don't have any ib cards on this node, but you can do the same with
> "o2ib" instead of "tcp".
>
> HTH
>
> Johann