Jim Albin
2006-Nov-27 12:22 UTC
[Lustre-discuss] install problem; localhost test generates
I'm trying to install and test Lustre 1.4.7 on a new system and am failing to start the single system test. I've installed the kernel and the libcfs module. I copied the steps for creating a local.xml file using the local.sh script on this page:
https://mail.clusterfs.com/wikis/lustre/LustreHowto
This seems to work correctly and generates the local.xml file.

When I try to start it with "lconf -v --node localhost --reformat local.xml" I get this output to the screen, then it hangs until I ^C out of it. The messages below are logged in the system log file.

--- <snip> bottom of output from lconf command ----

+ /usr/sbin/lctl
  cfg_device MDC_head4_mds-test_MNT_localhost
  setup mds-test_UUID localhost_UUID
  quit
MTPT: MNT_localhost MNT_localhost_UUID /mnt/lustre mds-test_UUID lov-test_UUID
+ mkdir /mnt/lustre
+ mount -t lustre_lite -o osc=lov-test,mdc=MDC_head4_mds-test_MNT_localhost local /mnt/lustre
Traceback (most recent call last):
  File "/usr/sbin/lconf", line 2852, in ?
    main()
  File "/usr/sbin/lconf", line 2845, in main
    doHost(lustreDB, node_list)
  File "/usr/sbin/lconf", line 2288, in doHost
    for_each_profile(node_db, prof_list, doSetup)
  File "/usr/sbin/lconf", line 2068, in for_each_profile
    operation(services)
  File "/usr/sbin/lconf", line 2088, in doSetup
    n.prepare()
  File "/usr/sbin/lconf", line 1899, in prepare
    ret, val = run(cmd)
  File "/usr/sbin/lconf", line 530, in run
    return runcmd(cmd)
  File "/usr/sbin/lconf", line 520, in runcmd
    out = f.readlines()
KeyboardInterrupt

----- /var/log/messages contents ----

Nov 27 11:53:43 head4 kernel: kjournald starting. Commit interval 5 seconds
Nov 27 11:53:43 head4 kernel: LDISKFS FS on loop2, internal journal
Nov 27 11:53:43 head4 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov 27 11:53:43 head4 kernel: Lustre: Changing connection for OSC_head4_ost1-test_mds-test to localhost_UUID/127.0.0.1@tcp
Nov 27 11:53:43 head4 kernel: LustreError: Refusing connection from 127.0.0.1 for 127.0.0.1@tcp: No matching NI
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0: (socklnd_cb.c:1472:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
Nov 27 11:53:43 head4 kernel: LustreError: Connection to 127.0.0.1@tcp at host 127.0.0.1 on port 988 was reset: is it running a compatible version of Lustre and is 127.0.0.1@tcp one of its NIDs?
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0: (socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240 192.174.32.87@tcp->127.0.0.1@tcp
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0: (events.c:53:request_out_callback()) @@@ type 4, status -5 req@0000010003d31e00 x1/t0 o8->ost1-test_UUID@localhost_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
Nov 27 11:53:43 head4 kernel: LustreError: 18648:0: (client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1164653623, 0s ago) req@0000010003d31e00 x1/t0 o8->ost1-test_UUID@localhost_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0

And finally, the RPC timeout messages continue to log into the system log every 30 seconds or so until I reboot. Can someone see something I'm missing, or has anyone worked through this problem? I wonder if the installation manual & howto wiki is missing something or assuming something that is different in my setup. I ran nmap localhost and see that port 988/tcp is open. Thanks in advance for any help.

--
Jim Albin
Sr. Systems Administrator, HPC Systems
Scientific Computing Center
National Renewable Energy Laboratory

Attachment: local.sh (application/x-shellscript, 795 bytes)
http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20061127/54e591c1/local.bin
Felix, Evan J
2006-Nov-27 12:38 UTC
[Lustre-discuss] install problem; localhost test generates
Can you tell us if iptables or selinux is enabled and running on this machine?

Evan
Aaron Knister
2006-Nov-27 12:54 UTC
[Lustre-discuss] install problem; localhost test generates
Looking at the bolded line, Lustre doesn't have a configured NID for "127.0.0.1".

Can you run a "grep -B2 nid" on your xml file and post the output?

-Aaron

> Nov 27 11:53:43 head4 kernel: Lustre: Changing connection for
> OSC_head4_ost1-test_mds-test to localhost_UUID/127.0.0.1@tcp
> Nov 27 11:53:43 head4 kernel: LustreError: Refusing connection from
> *127.0.0.1 for 127.0.0.1@tcp: No matching NI*
> Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:

--
Aaron Knister
Center for Research on Environment and Water
4041 Powder Mill Road, Suite 302; Calverton MD 20705
Office: (240) 247-1456
Fax: (301) 595-9790
http://crew.iges.org
Jim Albin
2006-Nov-27 13:17 UTC
[Lustre-discuss] install problem; localhost test generates
Here are the nid strings from local.xml:

[root@lester4 lustre.localhost]# grep -B2 nid local.xml
    <profile_ref uuidref='PROFILE_localhost_UUID'/>
    <network uuid='NET_localhost_tcp_UUID' nettype='tcp' name='NET_localhost_tcp'>
      <nid>localhost</nid>

Thanks.
Jim Albin

--
Jim Albin
Sr. Systems Administrator, HPC Systems
Scientific Computing Center
National Renewable Energy Laboratory
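For readers who want the same information as the grep above in a structured form, here is a minimal Python sketch that lists each network's NID from an lconf-style XML file. Only the <profile_ref>, <network> and <nid> lines come from the thread; the <lustre> root and <node> wrapper in the sample, and the function name xml_nids, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Only the profile_ref/network/nid lines are from the thread.
# The <lustre> root and <node> wrapper are hypothetical.
SAMPLE_XML = """
<lustre>
  <node name='localhost'>
    <profile_ref uuidref='PROFILE_localhost_UUID'/>
    <network uuid='NET_localhost_tcp_UUID' nettype='tcp' name='NET_localhost_tcp'>
      <nid>localhost</nid>
    </network>
  </node>
</lustre>
"""


def xml_nids(xml_text):
    """Return (network name, nettype, nid) for every <nid> under a <network>."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for net in root.iter("network"):
        for nid in net.iter("nid"):
            found.append((net.get("name"), net.get("nettype"), (nid.text or "").strip()))
    return found


if __name__ == "__main__":
    for name, nettype, nid in xml_nids(SAMPLE_XML):
        print(name, nettype, nid)
```

To try it on a real file, read local.xml into a string and pass it to xml_nids; the real file may be laid out differently from the sample.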
Aaron Knister
2006-Nov-27 13:22 UTC
[Lustre-discuss] install problem; localhost test generates
Open up your local.xml file and change

    <nid>localhost</nid>

to

    <nid>127.0.0.1@tcp</nid>

then try it again.

-Aaron
Nathaniel Rutman
2006-Nov-27 13:37 UTC
[Lustre-discuss] install problem; localhost test generates
You can't use the loopback IP addr as a nid.
Use something that does not resolve to 127.0.0.1.
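The rule above (do not use a NID that resolves to 127.0.0.1) can be checked before running lconf. The following is a rough Python sketch, not part of Lustre: the function name resolves_to_loopback is made up here, and it uses the system resolver for IPv4 only, which may not match exactly how lconf resolves names.

```python
import ipaddress
import socket


def resolves_to_loopback(nid):
    """Return True if the host part of a NID such as 'head4@tcp' resolves to 127.0.0.0/8."""
    host = nid.split("@", 1)[0]          # drop the '@tcp' network suffix, if any
    addr = socket.gethostbyname(host)    # IPv4 literals are returned unchanged
    return ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    # Both addresses appear in the log excerpt earlier in the thread.
    for nid in ("127.0.0.1@tcp", "192.174.32.87@tcp"):
        print(nid, resolves_to_loopback(nid))
```

Passing a hostname such as "localhost" or "head4" exercises /etc/hosts and DNS, so the result then depends on the machine it runs on.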
Nathaniel Rutman
2006-Nov-29 10:34 UTC
[Lustre-discuss] install problem; localhost test generates
The node name given in the XML file is used only to identify what services to set up when you run lconf. The --nid entry is what determines the LNET network address that servers/clients will try to communicate with each other on. As a convenience, you can use the hostname in the nid, and lconf will resolve it to its ipaddr.

    ${LMC} -m $CONFIG --add net --node uml1 --nid uml1@tcp --nettype lnet

The nids for a node are determined by LNET from the networks option in modprobe.conf, not lconf. But of course, the nids in the XML must match these nids for any communication to succeed. The way to think about it is that lconf determines who to talk to (remote identities), but modprobe.conf determines the local identity.

BTW, the confusion brought on by trying to configure these aspects of Lustre has led to a major overhaul of the configuration system, now called MountConf and debuting in Lustre 1.6.0:
https://mail.clusterfs.com/wikis/lustre/MountConf

Jim Albin wrote:
> Hi, thanks for the response. Yes I was able to get it working and using
> the ethernet interface I want it to. Have I interpreted the problem
> correctly, it tries to use any or all ethernet interfaces configured
> regardless of the node or IP address in the XML file? And if the
> loopback does not work given this circumstance then the single system
> test instructions are just outdated? I appreciate your help.
>
> Jim Albin
>
> On Tue, 2006-11-28 at 17:04 -0800, Nathaniel Rutman wrote:
>> Sorry, I was travelling today - I saw you got help on the list.
>> Yes, we need to update our docs, and we are working on that.
>>
>> Jim Albin wrote:
>>> Hi again,
>>> I used the node name that maps to the interface for eth3 and it works
>>> now. Not sure but it appears the nid is getting mapped to eth3 so using
>>> that interface for the single system test seems to work.
>>> thanks again.
>>> Jim Albin
>>>
>>> On Tue, 2006-11-28 at 09:16 -0700, Jim Albin wrote:
>>>> Good morning Nathaniel.
>>>> "lctl list_nids" shows this
>>>> # lctl list_nids
>>>> 192.174.32.87@tcp
>>>>
>>>> which is the ip address for eth3; which is not mapped to either
>>>> localhost or the interface of the hostname (head4 = eth1
>>>> 172.16.100.4). My conclusion is that the localhost single system test
>>>> doesn't work as described, it is mapping the nid to interfaces
>>>> regardless of the node name in the xml file. I also found that if I did
>>>> not add the "--node localhost" to the lconf --reformat line it will
>>>> complain "No host entry" and stop. (instead of also trying the localhost
>>>> as described in the installation manual).
>>>>
>>>> The "New Schema" section describes the LNET concept and mentions that it
>>>> will attempt to use all available interfaces but I don't see any more
>>>> advice on how to configure the single system test for a specific
>>>> interface. I will try using ip addresses next.
>>>>
>>>> Thanks for taking the time to respond.
>>>> Jim Albin
>>>>
>>>> On Mon, 2006-11-27 at 15:12 -0800, Nathaniel Rutman wrote:
>>>>> Use "lctl list_nids" to show the local nids after starting lnet (lctl
>>>>> network up). Use one of those in the config.
>>>>> You could "ping head4" to see what it resolves to.
>>>>>
>>>>> Jim Albin wrote:
>>>>>> Hi Nathaniel,
>>>>>> I followed the steps in the installation manual section 2.3.1
>>>>>> (LustreManual.html) distributed with it and I tried cutting and pasting
>>>>>> the local.sh script from the wiki-howto (Using Supplied Configuration
>>>>>> Tools section)
>>>>>> (https://mail.clusterfs.com/wikis/lustre/LustreHowto).
>>>>>>
>>>>>> My /etc/hosts file shows this for localhost:
>>>>>>
>>>>>> # grep localhost /etc/hosts
>>>>>> 127.0.0.1 localhost.localdomain localhost
>>>>>> # hostname
>>>>>> head4
>>>>>> (so my hostname does NOT map to 127.0.0.1)
>>>>>>
>>>>>> I'm wondering if it has something to do with the ethernet interfaces
>>>>>> (there are 4) and my default route is set for one of them;
>>>>>>
>>>>>> # netstat -rn
>>>>>> Kernel IP routing table
>>>>>> Destination     Gateway         Genmask         Flags MSS Window irtt Iface
>>>>>> 192.174.32.0    0.0.0.0         255.255.255.0   U     0   0      0    eth3
>>>>>> 172.18.0.0      0.0.0.0         255.255.0.0     U     0   0      0    eth2
>>>>>> 172.16.0.0      0.0.0.0         255.255.0.0     U     0   0      0    eth1
>>>>>> 169.254.0.0     0.0.0.0         255.255.0.0     U     0   0      0    eth3
>>>>>> 0.0.0.0         192.174.32.26   0.0.0.0         UG    0   0      0    eth3
>>>>>>
>>>>>> This line (from the syslog snippet below) shows the ip address of eth3
>>>>>>
>>>>>>>>> (socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240
>>>>>>>>> 192.174.32.87@tcp->127.0.0.1@tcp
>>>>>>
>>>>>> Thanks for looking at it, I can resend the local.sh and/or local.xml if
>>>>>> that would help. I suspect this is trivial and I may be able to set up a
>>>>>> multiple node test but wanted to try and get this working first.
>>>>>>
>>>>>> Jim Albin
>>>>>>
>>>>>> On Mon, 2006-11-27 at 13:28 -0800, Nathaniel Rutman wrote:
>>>>>>> Yikes, I hope not.
>>>>>>> From the LustreHowTo https://mail.clusterfs.com/wikis/lustre/LustreHowto
>>>>>>> "One common problem with some Linux setups is that the hostname is
>>>>>>> mapped in /etc/hosts to 127.0.0.1, which causes the clients to be unable
>>>>>>> to communicate to the servers."
>>>>>>> Where are you looking?
>>>>>>>
>>>>>>> Jim Albin wrote:
>>>>>>>> Then the documentation and Wiki are incorrect?
>>>>>>>>
>>>>>>>> On Mon, 2006-11-27 at 12:37 -0800, Nathaniel Rutman wrote:
>>>>>>>>> You can't use the loopback IP addr as a nid.
>>>>>>>>> Use something that does not resolve to 127.0.0.1
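The point that the nids in the XML must match the local nids reported by LNET can be illustrated with a small comparison. This is only a sketch under stated assumptions: the helper names normalize and unmatched_nids are invented here, a NID without a network suffix is assumed to mean "@tcp", and hostnames are compared literally rather than resolved.

```python
def normalize(nid, default_net="tcp"):
    """Append '@tcp' when a NID has no network suffix (an assumption of this sketch)."""
    return nid if "@" in nid else f"{nid}@{default_net}"


def unmatched_nids(config_nids, local_nids):
    """Return the configured NIDs that are not among the node's local NIDs."""
    local = {normalize(n) for n in local_nids}
    return [n for n in config_nids if normalize(n) not in local]


if __name__ == "__main__":
    # Local NID as reported by "lctl list_nids" in the quoted exchange.
    local = ["192.174.32.87@tcp"]
    # The loopback NID from the failing setup is not a local NID.
    print(unmatched_nids(["127.0.0.1@tcp"], local))
    # The eth3 address matches, so nothing is reported.
    print(unmatched_nids(["192.174.32.87@tcp"], local))
```

An empty result means every configured NID has a matching local identity; anything listed is a candidate for the "No matching NI" refusal seen in the log.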