Jim Albin
2006-Nov-27 12:22 UTC
[Lustre-discuss] install problem; localhost test generates
I'm trying to install and test Lustre 1.4.7 on a new system and am
failing to start the single system test. I've installed the kernel and
the libcfs module. I copied the steps for creating a local.xml file
using the local.sh script on this page:
https://mail.clusterfs.com/wikis/lustre/LustreHowto
This seems to work correctly and generates the local.xml file.
When I try to start it with "lconf -v --node localhost --reformat
local.xml", I get this output to the screen, then it hangs until I ^C
out of it. The messages below are logged in the system log file.
--- <snip> bottom of output from lconf command ----
+ /usr/sbin/lctl
cfg_device MDC_head4_mds-test_MNT_localhost
setup mds-test_UUID localhost_UUID
quit
MTPT: MNT_localhost MNT_localhost_UUID /mnt/lustre mds-test_UUID lov-test_UUID
+ mkdir /mnt/lustre
+ mount -t lustre_lite -o osc=lov-test,mdc=MDC_head4_mds-test_MNT_localhost local /mnt/lustre
Traceback (most recent call last):
  File "/usr/sbin/lconf", line 2852, in ?
    main()
  File "/usr/sbin/lconf", line 2845, in main
    doHost(lustreDB, node_list)
  File "/usr/sbin/lconf", line 2288, in doHost
    for_each_profile(node_db, prof_list, doSetup)
  File "/usr/sbin/lconf", line 2068, in for_each_profile
    operation(services)
  File "/usr/sbin/lconf", line 2088, in doSetup
    n.prepare()
  File "/usr/sbin/lconf", line 1899, in prepare
    ret, val = run(cmd)
  File "/usr/sbin/lconf", line 530, in run
    return runcmd(cmd)
  File "/usr/sbin/lconf", line 520, in runcmd
    out = f.readlines()
KeyboardInterrupt
----- /var/log/messages contents ----
Nov 27 11:53:43 head4 kernel: kjournald starting. Commit interval 5 seconds
Nov 27 11:53:43 head4 kernel: LDISKFS FS on loop2, internal journal
Nov 27 11:53:43 head4 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov 27 11:53:43 head4 kernel: Lustre: Changing connection for OSC_head4_ost1-test_mds-test to localhost_UUID/127.0.0.1@tcp
Nov 27 11:53:43 head4 kernel: LustreError: Refusing connection from 127.0.0.1 for 127.0.0.1@tcp: No matching NI
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:(socklnd_cb.c:1472:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
Nov 27 11:53:43 head4 kernel: LustreError: Connection to 127.0.0.1@tcp at host 127.0.0.1 on port 988 was reset: is it running a compatible version of Lustre and is 127.0.0.1@tcp one of its NIDs?
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:(socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240 192.174.32.87@tcp->127.0.0.1@tcp
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@0000010003d31e00 x1/t0 o8->ost1-test_UUID@localhost_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
Nov 27 11:53:43 head4 kernel: LustreError: 18648:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1164653623, 0s ago) req@0000010003d31e00 x1/t0 o8->ost1-test_UUID@localhost_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
And finally, the RPC timeout messages continue to log into the system
log every 30 seconds or so until I reboot. Can anyone see something
I'm missing, or has anyone worked through this problem? I wonder if the
installation manual and howto wiki are missing something or assuming
something that is different in my setup. I ran nmap localhost and saw
that port 988/tcp is open. Thanks in advance for any help.
--
Jim Albin
Sr. Systems Administrator, HPC Systems
Scientific Computing Center
National Renewable Energy Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: local.sh
Type: application/x-shellscript
Size: 795 bytes
Desc: not available
Url :
http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20061127/54e591c1/local.bin
Felix, Evan J
2006-Nov-27 12:38 UTC
[Lustre-discuss] install problem; localhost test generates
Can you tell us if iptables or selinux is enabled and running on this machine?
Evan
Aaron Knister
2006-Nov-27 12:54 UTC
[Lustre-discuss] install problem; localhost test generates
Looking at the bolded line, Lustre doesn't have a configured NID for "127.0.0.1".
Can you run "grep -B2 nid" on your XML file and post the output?
-Aaron
> Nov 27 11:53:43 head4 kernel: Lustre: Changing connection for
> OSC_head4_ost1-test_mds-test to localhost_UUID/127.0.0.1@tcp
> Nov 27 11:53:43 head4 kernel: LustreError: Refusing connection from
> *127.0.0.1 for 127.0.0.1@tcp: No matching NI*
> Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:
--
Aaron Knister
Center for Research on Environment and Water
4041 Powder Mill Road, Suite 302; Calverton MD 20705
Office: (240) 247-1456
Fax: (301) 595-9790
http://crew.iges.org
Jim Albin
2006-Nov-27 13:17 UTC
[Lustre-discuss] install problem; localhost test generates
Here are the nid strings from local.xml:
[root@lester4 lustre.localhost]# grep -B2 nid local.xml
<profile_ref uuidref='PROFILE_localhost_UUID'/>
<network uuid='NET_localhost_tcp_UUID'
nettype='tcp'
name='NET_localhost_tcp'>
<nid>localhost</nid>
Thanks. Jim Albin
Aaron Knister
2006-Nov-27 13:22 UTC
[Lustre-discuss] install problem; localhost test generates
Open up your local.xml file and change <nid>localhost</nid> to <nid>127.0.0.1@tcp</nid>, then try it again.
-Aaron
Nathaniel Rutman
2006-Nov-27 13:37 UTC
[Lustre-discuss] install problem; localhost test generates
You can't use the loopback IP addr as a nid.
Use something that does not resolve to 127.0.0.1.
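The loopback rule above can be checked before running lconf. What follows is only a minimal sketch in Python, not part of Lustre: the helper names nid_address and is_loopback_nid are hypothetical, and it assumes a NID is written as "address@network" (as in the log lines in this thread) with an IPv4 address or a resolvable hostname on the left.

```python
import ipaddress
import socket


def nid_address(nid):
    """Return the address part of a NID such as '192.174.32.87@tcp'.

    Hypothetical helper: assumes the 'address@network' form seen in this thread.
    """
    return nid.split("@", 1)[0]


def is_loopback_nid(nid, resolve=socket.gethostbyname):
    """Return True if the NID's address is, or resolves to, a loopback address."""
    addr = nid_address(nid)
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a literal IP address: treat it as a hostname and resolve it.
        ip = ipaddress.ip_address(resolve(addr))
    return ip.is_loopback


if __name__ == "__main__":
    for nid in ("127.0.0.1@tcp", "192.174.32.87@tcp"):
        print(nid, "loopback" if is_loopback_nid(nid) else "ok")
```

The resolver is passed in as a parameter so the check can be exercised without depending on the local /etc/hosts; on a real node the default socket.gethostbyname would show what a name such as "localhost" or "head4" actually maps to.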
Nathaniel Rutman
2006-Nov-29 10:34 UTC
[Lustre-discuss] install problem; localhost test generates
The node name given in the XML file is used only to identify what
services to set up when you run lconf. The --nid entry is what
determines the LNET network address that servers/clients will try to
communicate with each other on.
As a convenience, you can use the hostname in the nid, and lconf will
resolve it to its ipaddr.
${LMC} -m $CONFIG --add net --node uml1 --nid uml1@tcp --nettype lnet
The nids for a node are determined by LNET from the networks option in
modprobe.conf, not lconf.
But of course, the nids in the XML must match these nids for any
communication to succeed.
The way to think about it is that lconf determines who to talk to
(remote identities), but modprobe.conf determines
the local identity.
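
To illustrate the matching requirement described above, here is a rough Python sketch that compares the <nid> entries in an lconf-style XML file against the local NIDs reported by "lctl list_nids". It is an assumption-laden example, not a Lustre tool: the function names are hypothetical, the <lustre> root element in the sample is a placeholder, and the comparison is a plain string match on the address part, so a hostname in the XML is not resolved the way lconf would resolve it.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the grep output posted earlier in this thread.
# The <lustre> root element is an assumption made for this sketch.
SAMPLE = """
<lustre>
  <network uuid='NET_localhost_tcp_UUID' nettype='tcp' name='NET_localhost_tcp'>
    <nid>localhost</nid>
  </network>
</lustre>
"""


def configured_nids(xml_text):
    """Collect the text of every <nid> element in the XML config."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("nid") if el.text]


def unmatched_nids(xml_text, local_nids):
    """Return configured NIDs whose address part matches no local NID.

    local_nids is a list of strings as printed by 'lctl list_nids',
    for example ['192.174.32.87@tcp'].
    """
    local_addrs = {nid.split("@", 1)[0] for nid in local_nids}
    return [nid for nid in configured_nids(xml_text)
            if nid.split("@", 1)[0] not in local_addrs]


if __name__ == "__main__":
    print(unmatched_nids(SAMPLE, ["192.174.32.87@tcp"]))
```

With the sample above, "localhost" is reported as unmatched, which mirrors the situation in this thread: the XML names localhost while the only local NID is 192.174.32.87@tcp.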
BTW, the confusion brought on by trying to configure these aspects of
Lustre has led to a major overhaul of the configuration system, now
called MountConf and debuting in Lustre 1.6.0:
https://mail.clusterfs.com/wikis/lustre/MountConf
Jim Albin wrote:
> Hi, thanks for the response. Yes, I was able to get it working and using
> the ethernet interface I want it to. Have I interpreted the problem
> correctly: it tries to use any or all ethernet interfaces configured,
> regardless of the node or IP address in the XML file? And if the
> loopback does not work given this circumstance, then the single system
> test instructions are just outdated? I appreciate your help.
>
> Jim Albin
>
> On Tue, 2006-11-28 at 17:04 -0800, Nathaniel Rutman wrote:
>
>> Sorry, I was travelling today - I saw you got help on the list.
>> Yes, we need to update our docs, and we are working on that.
>>
>> Jim Albin wrote:
>>
>>> Hi again,
>>> I used the node name that maps to the interface for eth3 and it works
>>> now. Not sure, but it appears the nid is getting mapped to eth3, so using
>>> that interface for the single system test seems to work.
>>> Thanks again.
>>> Jim Albin
>>>
>>> On Tue, 2006-11-28 at 09:16 -0700, Jim Albin wrote:
>>>
>>>
>>>> Good morning Nathaniel.
>>>> "lctl list_nids" shows this:
>>>> # lctl list_nids
>>>> 192.174.32.87@tcp
>>>>
>>>> which is the IP address for eth3, which is not mapped to either
>>>> localhost or the interface of the hostname (head4 = eth1
>>>> 172.16.100.4). My conclusion is that the localhost single system test
>>>> doesn't work as described; it is mapping the nid to interfaces
>>>> regardless of the node name in the XML file. I also found that if I did
>>>> not add "--node localhost" to the lconf --reformat line, it will
>>>> complain "No host entry" and stop (instead of also trying localhost,
>>>> as described in the installation manual).
>>>>
>>>> The "New Schema" section describes the LNET concept and mentions that it
>>>> will attempt to use all available interfaces, but I don't see any more
>>>> advice on how to configure the single system test for a specific
>>>> interface. I will try using IP addresses next.
>>>>
>>>> Thanks for taking the time to respond.
>>>> Jim Albin
>>>>
>>>> On Mon, 2006-11-27 at 15:12 -0800, Nathaniel Rutman wrote:
>>>>
>>>>
>>>>> Use "lctl list_nids" to show the local nids after starting lnet (lctl
>>>>> network up). Use one of those in the config.
>>>>> You could "ping head4" to see what it resolves to.
>>>>>
>>>>> Jim Albin wrote:
>>>>>
>>>>>
>>>>>> Hi Nathaniel,
>>>>>> I followed the steps in the installation manual section 2.3.1
>>>>>> (LustreManual.html) distributed with it, and I tried cutting and pasting
>>>>>> the local.sh script from the wiki howto (Using Supplied Configuration
>>>>>> Tools section)
>>>>>> (https://mail.clusterfs.com/wikis/lustre/LustreHowto).
>>>>>>
>>>>>> My /etc/hosts file shows this for localhost:
>>>>>>
>>>>>> # grep localhost /etc/hosts
>>>>>> 127.0.0.1 localhost.localdomain localhost
>>>>>> # hostname
>>>>>> head4
>>>>>> ( so my hostname does NOT map to 127.0.0.1)
>>>>>>
>>>>>> I'm wondering if it has something to do with the ethernet interfaces
>>>>>> (there are 4) and my default route is set for one of them:
>>>>>>
>>>>>> # netstat -rn
>>>>>> Kernel IP routing table
>>>>>> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
>>>>>> 192.174.32.0    0.0.0.0         255.255.255.0   U         0 0          0 eth3
>>>>>> 172.18.0.0      0.0.0.0         255.255.0.0     U         0 0          0 eth2
>>>>>> 172.16.0.0      0.0.0.0         255.255.0.0     U         0 0          0 eth1
>>>>>> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth3
>>>>>> 0.0.0.0         192.174.32.26   0.0.0.0         UG        0 0          0 eth3
>>>>>>
>>>>>> This line (from the syslog snippet in my first message) shows the IP
>>>>>> address of eth3:
>>>>>>
>>>>>>>>> (socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240
>>>>>>>>> 192.174.32.87@tcp->127.0.0.1@tcp
>>>>>>
>>>>>> Thanks for looking at it. I can resend the local.sh and/or local.xml if
>>>>>> that would help. I suspect this is trivial and I may be able to set up a
>>>>>> multiple node test, but wanted to try and get this working first.
>>>>>>
>>>>>> Jim Albin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 2006-11-27 at 13:28 -0800, Nathaniel Rutman wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Yikes, I hope not.
>>>>>>> From the LustreHowTo (https://mail.clusterfs.com/wikis/lustre/LustreHowto):
>>>>>>> "One common problem with some Linux setups is that the hostname is
>>>>>>> mapped in /etc/hosts to 127.0.0.1, which causes the clients to be unable
>>>>>>> to communicate to the servers."
>>>>>>> Where are you looking?
>>>>>>>
>>>>>>> Jim Albin wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Then the documentation and Wiki are incorrect?
>>>>>>>>
>>>>>>>> On Mon, 2006-11-27 at 12:37 -0800, Nathaniel Rutman wrote:
>>>>>>>>
>>>>>>>>> You can't use the loopback IP addr as a nid.
>>>>>>>>> Use something that does not resolve to 127.0.0.1