Kevin L. Buterbaugh
2007-Feb-02 11:54 UTC
[Lustre-discuss] Starting Lustre for the first time...
All,

Apologies in advance for the (long) newbie question, but I've Googled this and the hits I've gotten haven't helped resolve it (it seems other people have had this same problem, but what was suggested to them I'm already doing, I think). If I've missed a URL where this is answered / explained, please feel free to point me in that direction...

I'm trying to get Lustre 1.4.8 going on a test cluster. I installed the software from the pre-packaged RPMs and rebooted. All my nodes show "uname -a" output similar to the following:

Linux lustrem 2.6.9-42.0.3.EL_lustre.1.4.8smp #1 SMP Tue Dec 19 09:07:46 MST 2006 x86_64 x86_64 x86_64 GNU/Linux

My cluster consists of 5 dual-processor Opterons and 14 dual-processor P4s (you can mix 32-bit / 64-bit as long as you install the right RPMs, can't you?). One of the Opterons is my MDS (hostname: lustrem); the other four are my OSDs (hostnames: lustre1 - 4). I have two dual-controller FC storage arrays. Both controllers in the first storage array are connected to two of my OSDs (lustre1 / 2), and the 2nd storage array and lustre3 / 4 are connected identically. I have 2 RAID 5 LUNs defined on each of the storage arrays. lustre1 / 2 can both see both of the RAID 5 LUNs, as the following shows:

[root@lustre1 ~]# fdisk -l

Disk /dev/hda: 41.1 GB, 41110142976 bytes
255 heads, 63 sectors/track, 4998 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         261     2096451   82  Linux swap
/dev/hda2   *         262        4998    38049952+  83  Linux

Disk /dev/sda: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      152412  1224249358+  83  Linux

Disk /dev/sdb: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      152412  1224249358+  83  Linux
[root@lustre1 ~]#

[root@lustre2 ~]# fdisk -l

Disk /dev/hda: 41.1 GB, 41110142976 bytes
255 heads, 63 sectors/track, 4998 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         261     2096451   82  Linux swap
/dev/hda2   *         262        4998    38049952+  83  Linux

Disk /dev/sda: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      152412  1224249358+  83  Linux

Disk /dev/sdb: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      152412  1224249358+  83  Linux
[root@lustre2 ~]#

To try to simplify things, I'm trying to start out only using my MDS (lustrem), 2 of my OSDs (lustre1 / 2), and one of my clients (scnode01). Here's my config.sh:

#!/bin/sh
# config.sh
#
rm -f config.xml
#
# Create nodes
# Trying to get this to work with 1 MDS, 2 OSTs, and 1 client.  Will add the
# others when I get this working. - klb, 2/2/07
#
lmc -m config.xml --add node --node lustrem
lmc -m config.xml --add node --node lustre1
lmc -m config.xml --add node --node lustre2
lmc -m config.xml --add node --node client
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
lmc -m config.xml --add net --node client --nid '*' --nettype tcp
#
# Configure MDS
#
lmc -m config.xml --add mds --node lustrem --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000
#
# Configure OSTs - testing with 2 initially - klb, 2/1/2007
#
lmc -m config.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda1
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sdb1
#
# Configure client (this is a 'generic' client used for all client mounts)
# testing with 1 client initially - klb, 2/1/2007
#
lmc -m config.xml --add mtpt --node client --path /mnt/lustre --mds mds-test --lov lov-test
#
# Copy config.xml to all the other nodes in the cluster - klb, 2/1/07
#
for i in `seq 1 4`
do
  echo "Copying config.xml to OST lustre$i..."
  rcp -p config.xml root@lustre$i:~/lustre
done

for i in `seq -w 1 14`
do
  echo "Copying config.xml to client scnode$i..."
  rcp -p config.xml root@scnode$i:~/lustre
done

After running this script, I logged in to lustre1 and executed "lconf --reformat --node lustre1 config.xml", which produces the following output:

loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre1_tcp NET_lustre1_tcp_UUID tcp lustre1
OSD: ost1-test ost1-test_UUID obdfilter /dev/sda1 0 ldiskfs no 0 256

And running "lconf --reformat --node lustre2 config.xml" on lustre2 produces the following output:

loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre2_tcp NET_lustre2_tcp_UUID tcp lustre2
OSD: ost2-test ost2-test_UUID obdfilter /dev/sdb1 0 ldiskfs no 0 256

Next, I logged in to lustrem and executed "lconf --reformat --node lustrem config.xml" and see the following:

loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: mdc srcdir None devdir mdc
loading module: osc srcdir None devdir osc
loading module: lov srcdir None devdir lov
loading module: mds srcdir None devdir mds
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
NETWORK: NET_lustrem_tcp NET_lustrem_tcp_UUID tcp lustrem
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs no
recording clients for filesystem: FS_fsname_UUID
Recording log mds-test on mds-test
LOV: lov_mds-test 950ad_lov_mds-test_189f832962 mds-test_UUID 0 1048576 0 0 [u'ost1-test_UUID', u'ost2-test_UUID'] mds-test
OSC: OSC_lustrem_ost1-test_mds-test 950ad_lov_mds-test_189f832962 ost1-test_UUID
OSC: OSC_lustrem_ost2-test_mds-test 950ad_lov_mds-test_189f832962 ost2-test_UUID
End recording log mds-test on mds-test
Recording log client on mds-test
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs 50000 no
MDS mount options: errors=remount-ro

But when I log on to scnode01 and execute "mount -t lustre lustrem:/mds-test/client /mnt/lustre", I get the following error:

mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed: Input/output error
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0

One other thing I've tried: instead of calling my client "client" in config.sh and in the mount command, I used the actual hostname (scnode01). That didn't help.

Again, I apologize for both the length of this post and the newbie question, but I can't seem to figure this out on my own and I've got a deadline looming. Any and all help (and even flames, as long as you answer my question or point me in the right direction!) is appreciated...

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Hello!

On Fri, Feb 02, 2007 at 12:53:42PM -0600, Kevin L. Buterbaugh wrote:
> But when I log on to scnode01 and execute "mount -t lustre
> lustrem:/mds-test/client /mnt/lustre", I get the following error:
> mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed:
> Input/output error
> mds nid 0: 129.59.197.130@tcp
> mds name: mds-test
> profile: client
> options: rw
> retry: 0

I am sure some messages appeared in /var/log/messages on client and on MDS
and perhaps you can share those with us?

Bye,
    Oleg
Kevin L. Buterbaugh
2007-Feb-02 12:16 UTC
[Lustre-discuss] Starting Lustre for the first time...
Oleg,

Sorry, meant to include that. Here's the relevant information from the client (scnode01):

Feb 2 12:48:15 scnode01 kernel: LustreError: 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@f70d8a00 x13/t0 o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration 'client' could not be read from the MDS 'mds-test'. This may be the result of communication errors between the client and the MDS, or if the MDS is not running.
Feb 2 12:48:15 scnode01 kernel: LustreError: 16533:0:(llite_lib.c:936:lustre_fill_super()) Unable to process log: client

And from the MDS (lustrem):

Feb 2 12:47:34 lustrem kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 12:47:34 lustrem kernel: Lustre: Added LNI 129.59.197.130@tcp [8/256]
Feb 2 12:47:34 lustrem kernel: Lustre: Accept secure, port 988
Feb 2 12:47:35 lustrem kernel: loop: loaded (max 8 devices)
Feb 2 12:47:36 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 12:47:36 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 12:47:36 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 12:47:36 lustrem kernel: Lustre: 3530:0:(mds_fs.c:239:mds_init_server_data()) mds-test: initializing new last_rcvd
Feb 2 12:47:36 lustrem kernel: Lustre: MDT mds-test now serving /dev/loop0 (9003a2e8-45d6-49bd-ad28-0f1e37bb1cab) with recovery enabled
Feb 2 12:47:37 lustrem kernel: Lustre: MDT mds-test has stopped.
Feb 2 12:47:37 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 12:47:37 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 12:47:37 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 12:47:37 lustrem kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 12:47:42 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442057, 5s ago) req@000001000173c400 x1/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442082, 5s ago) req@000001007e1c2400 x4/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:48:31 lustrem kernel: LustreError: 3692:0:(ldlm_lib.c:541:target_handle_connect()) @@@ UUID 'mds-test' is not available for connect (not set up) req@000001007e1dec00 x13/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc 0/0
Feb 2 12:48:31 lustrem kernel: LustreError: 3692:0:(ldlm_lib.c:1288:target_send_reply_msg()) @@@ processing error (-19) req@000001007e1dec00 x13/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Feb 2 12:48:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442107, 5s ago) req@000001007e1a8600 x6/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:48:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442132, 5s ago) req@0000010037c5fe00 x8/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:49:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442157, 5s ago) req@000001007d544e00 x10/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:49:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:49:47 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442182, 5s ago) req@000001007e3aa400 x12/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:49:47 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:50:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442207, 5s ago) req@000001007e3ab600 x14/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:50:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:50:37 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442232, 5s ago) req@000001007e3a9800 x16/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:50:37 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:51:02 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442257, 5s ago) req@000001007e12ce00 x18/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:51:02 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:51:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442282, 5s ago) req@000001007aaad200 x20/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:51:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:51:52 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442307, 5s ago) req@00000100765f0e00 x22/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:51:52 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:52:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442332, 5s ago) req@000001007e1a8c00 x24/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:52:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:52:42 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442357, 5s ago) req@0000010037c46400 x26/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:52:42 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:53:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442382, 5s ago) req@0000010037c4bc00 x28/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:53:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:53:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442407, 5s ago) req@000001007d544a00 x30/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:53:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:53:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442432, 5s ago) req@000001007d63cc00 x32/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:53:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:54:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442457, 5s ago) req@000001000169d400 x34/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:54:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:55:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442507, 5s ago) req@000001007e141400 x38/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:55:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Feb 2 12:56:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442582, 5s ago) req@0000010076980c00 x44/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:56:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Feb 2 12:58:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442732, 5s ago) req@000001000169d200 x56/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:58:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
Feb 2 13:03:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170443007, 5s ago) req@000001007e2f7800 x78/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 13:03:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 21 previous similar messages
Feb 2 13:12:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170443532, 5s ago) req@0000010037c62800 x120/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 13:12:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 41 previous similar messages

Thanks...

Oleg Drokin wrote:
> Hello!
>
> On Fri, Feb 02, 2007 at 12:53:42PM -0600, Kevin L. Buterbaugh wrote:
>
>> But when I log on to scnode01 and execute "mount -t lustre
>> lustrem:/mds-test/client /mnt/lustre", I get the following error:
>> mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed:
>> Input/output error
>> mds nid 0: 129.59.197.130@tcp
>> mds name: mds-test
>> profile: client
>> options: rw
>> retry: 0
>>
>
> I am sure some messages appeared in /var/log/messages on client and on MDS
> and perhaps you can share those with us?
>
> Bye,
>     Oleg
>

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Andreas Dilger
2007-Feb-02 13:32 UTC
[Lustre-discuss] Starting Lustre for the first time...
On Feb 02, 2007 13:16 -0600, Kevin L. Buterbaugh wrote:
> Sorry, meant to include that. Here's the relevant information from the
> client (scnode01):
>
> Feb 2 12:48:15 scnode01 kernel: LustreError:
> 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
> PTL_RPC_MSG_ERR, err == -19 req@f70d8a00 x13/t0
> o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
> Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration
> 'client' could not be read from the MDS 'mds-test'. This may be the
> result of communication errors between the client and the MDS, or if the
> MDS is not running.

Client couldn't connect to the MDS. -19 = -ENODEV

> And from the MDS (lustrem):
>
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442057, 5s ago) req@000001000173c400 x1/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442082, 5s ago) req@000001007e1c2400 x4/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:

These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).
What is in the OST syslog? Are you positive that /dev/sda1 and /dev/sdb1
on the two nodes are set up the same way, so that e.g. lustre1+sda1 isn't
talking to the same disk as lustre2+sdb1?

Also minor nit - you don't need to have a partition table, it can hurt
performance on some RAID setups because of the 512-byte offset of IOs
due to the DOS partition table.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
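One rough way to make that check (a sketch only, using nothing beyond dd and md5sum; it assumes nothing is actively writing to the LUNs, and note that freshly zeroed LUNs may all checksum alike) is to compare the first few sectors of each candidate device on both OSS nodes:

# On lustre1:
dd if=/dev/sda1 bs=512 count=8 2>/dev/null | md5sum
dd if=/dev/sdb1 bs=512 count=8 2>/dev/null | md5sum

# On lustre2:
dd if=/dev/sda1 bs=512 count=8 2>/dev/null | md5sum
dd if=/dev/sdb1 bs=512 count=8 2>/dev/null | md5sum

# If lustre1's /dev/sda1 checksum matches lustre2's /dev/sdb1 checksum
# (and vice versa), the device names are swapped between the two nodes.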
Nathaniel Rutman
2007-Feb-02 13:46 UTC
[Lustre-discuss] Starting Lustre for the first time...
Kevin L. Buterbaugh wrote:
> #
> # Configure networking
> #
> lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
> lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
> lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp

> And from the MDS (lustrem):
>
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442057, 5s ago) req@000001000173c400 x1/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442082, 5s ago) req@000001007e1c2400 x4/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:

> These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).

There's always a possibility that one or more of your nodes doesn't resolve the hostname properly. In general, I recommend using the actual IP address:

lmc -m config.xml --add net --node lustrem --nid 192.168.0.1@tcp --nettype lnet

Also, check that every node can ping every other. On each node:

modprobe lnet
lctl network up
lctl list_nids

Then on each node:

lctl ping <nids from other nodes>
lctl network down   (so you'll be able to remove the module)
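A minimal sketch of that check as a script, using the NIDs that appear in this thread (substitute your own addresses; it simply runs the commands above on whichever node it is executed):

#!/bin/sh
# Sketch: check LNET connectivity from this node to every other node.
# The NID list assumes the addresses used in this thread; edit as needed.
NIDS="129.59.197.130@tcp 129.59.197.131@tcp 129.59.197.132@tcp 129.59.197.101@tcp"

modprobe lnet
lctl network up
lctl list_nids

for nid in $NIDS
do
    echo "=== lctl ping $nid from `hostname` ==="
    lctl ping $nid
done

lctl network down    # so the lnet module can be removed afterwards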
Nathaniel Rutman
2007-Feb-02 15:28 UTC
[Lustre-discuss] Starting Lustre for the first time...
Kevin L. Buterbaugh wrote:
> Nathan,
>
> I made the changes you suggested to my config.sh and regenerated the
> config.xml. I did the test you suggested on the MDS and the client
> and here's what I get:
>
> [root@lustrem lustre]# modprobe lnet
> [root@lustrem lustre]# lctl network up
> LNET configured
> [root@lustrem lustre]# lctl list_nids
> 129.59.197.130@tcp
> [root@lustrem lustre]# lctl ping 129.59.197.101@tcp
> 12345-0@lo
> 12345-129.59.197.101@tcp
> [root@lustrem lustre]#
>
> [root@scnode01 ~]# modprobe lnet
> [root@scnode01 ~]# lctl network up
> LNET configured
> [root@scnode01 ~]# lctl list_nids
> 129.59.197.101@tcp
> [root@scnode01 ~]# lctl ping 129.59.197.130@tcp
> 12345-0@lo
> 12345-129.59.197.130@tcp
> [root@scnode01 ~]#
>
> That's not the kind of output I expect from a ping, so I don't even
> know if that means it worked or not...

Yes, that's good. It means that the nodes can talk to each other through LNET using those nids. You need to do that between the OST and MDT also, and between client and OST just for good measure. If they all can see each other, then we'll have to see the OST syslog to see why it's refusing to talk to the MDT.

PS please keep the discussion on the list -- it might be useful for the next guy.

> Kevin
>
> Nathaniel Rutman wrote:
>> Kevin L. Buterbaugh wrote:
>>> #
>>> # Configure networking
>>> #
>>> lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
>>> lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
>>> lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
>>
>>> And from the MDS (lustrem):
>>>
>>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent
>>> at 1170442057, 5s ago) req@000001000173c400 x1/t0
>>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>>> Feb 2 12:48:07 lustrem kernel: LustreError:
>>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent
>>> at 1170442082, 5s ago) req@000001007e1c2400 x4/t0
>>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>>> Feb 2 12:48:07 lustrem kernel: LustreError:
>>
>>> These messages indicate failure to connect to the OSTs (op 8 =
>>> OST_CONNECT).
>>
>> There's always a possibility that one or more of your nodes doesn't
>> resolve the hostname properly. In general, I recommend using the
>> actual IP address:
>> lmc -m config.xml --add net --node lustrem --nid 192.168.0.1@tcp
>> --nettype lnet
>>
>> Also, check that every node can ping every other. On each node:
>> modprobe lnet
>> lctl network up
>> lctl list_nids
>> Then on each node:
>> lctl ping <nids from other nodes>
>> lctl network down (so you'll be able to remove the module)
>>
>
Kevin L. Buterbaugh
2007-Feb-02 15:36 UTC
[Lustre-discuss] Starting Lustre for the first time...
Andreas,

I made the changes Nathan suggested to how the networking was set up in config.sh. I also checked the LUNs and you were correct: sda on lustre1 is sdb on lustre2 and vice versa. So I also changed config.sh to use sda1 for both. However, I still get the exact same error when I try to mount the client (and yes, it's still the ENODEV, but why?):

[root@scnode01 ~]# mount -v -t lustre lustrem:/mds-test/client /mnt/lustre
verbose: 1
arg[0] = /sbin/mount.lustre
arg[1] = lustrem:/mds-test/client
arg[2] = /mnt/lustre
arg[3] = -v
arg[4] = -o
arg[5] = rw
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0
mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed: Input/output error
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0
[root@scnode01 ~]#

MDS (lustrem) /var/log/messages:

Feb 2 16:17:18 lustrem kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 16:17:19 lustrem kernel: Lustre: Added LNI 129.59.197.130@tcp [8/256]
Feb 2 16:17:19 lustrem kernel: Lustre: Accept secure, port 988
Feb 2 16:17:19 lustrem kernel: loop: loaded (max 8 devices)
Feb 2 16:17:21 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 16:17:21 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 16:17:21 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 16:17:21 lustrem kernel: Lustre: 3518:0:(mds_fs.c:239:mds_init_server_data()) mds-test: initializing new last_rcvd
Feb 2 16:17:21 lustrem kernel: Lustre: MDT mds-test now serving /dev/loop0 (b505d8f0-d424-4bf8-a8cd-8bfa8af0cf36) with recovery enabled
Feb 2 16:17:21 lustrem kernel: Lustre: MDT mds-test has stopped.
Feb 2 16:17:22 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 16:17:22 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 16:17:22 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 16:17:22 lustrem kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:17:27 lustrem kernel: LustreError: 3882:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170454642, 5s ago) req@000001007d563800 x1/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 16:17:28 lustrem kernel: LustreError: 3680:0:(ldlm_lib.c:541:target_handle_connect()) @@@ UUID 'mds-test' is not available for connect (not set up) req@000001007d563400 x27/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc 0/0
Feb 2 16:17:28 lustrem kernel: LustreError: 3680:0:(ldlm_lib.c:1288:target_send_reply_msg()) @@@ processing error (-19) req@000001007d563400 x27/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0

OST (lustre1) /var/log/messages:

Feb 2 16:16:30 lustre1 kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 16:16:30 lustre1 kernel: Lustre: Added LNI 129.59.197.131@tcp [8/256]
Feb 2 16:16:30 lustre1 kernel: Lustre: Accept secure, port 988
Feb 2 16:16:31 lustre1 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Feb 2 16:17:00 lustre1 kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:17:00 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 1 offset 0 length 240: 2
Feb 2 16:17:25 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 4 offset 0 length 240: 2
Feb 2 16:17:50 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 6 offset 0 length 240: 2
Feb 2 16:18:15 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 8 offset 0 length 240: 2
Feb 2 16:18:40 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 10 offset 0 length 240: 2

OST (lustre2) /var/log/messages:

Feb 2 16:16:28 lustre2 kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 16:16:28 lustre2 kernel: Lustre: Added LNI 129.59.197.132@tcp [8/256]
Feb 2 16:16:28 lustre2 kernel: Lustre: Accept secure, port 988
Feb 2 16:16:28 lustre2 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Feb 2 16:16:53 lustre2 kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:16:53 lustre2 kernel: Lustre: 3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 2 offset 0 length 240: 2
Feb 2 16:17:18 lustre2 kernel: Lustre: 3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 5 offset 0 length 240: 2
Feb 2 16:17:43 lustre2 kernel: Lustre: 3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 7 offset 0 length 240: 2

Client (scnode01) /var/log/messages:

Feb 2 16:17:10 scnode01 kernel: LustreError: 19745:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@f7f07600 x27/t0 o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
Feb 2 16:17:10 scnode01 kernel: LustreError: mdc_dev: The configuration 'client' could not be read from the MDS 'mds-test'. This may be the result of communication errors between the client and the MDS, or if the MDS is not running.
Feb 2 16:17:10 scnode01 kernel: LustreError: 19742:0:(llite_lib.c:936:lustre_fill_super()) Unable to process log: client

config.sh:

#!/bin/sh
# config.sh
#
rm -f config.xml
#
# Create nodes
# Trying to get this to work with 1 MDS, 2 OSTs, and 1 client.  Will add the
# others when I get this working. - klb, 2/2/07
#
lmc -m config.xml --add node --node lustrem
lmc -m config.xml --add node --node lustre1
lmc -m config.xml --add node --node lustre2
lmc -m config.xml --add node --node client
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid 129.59.197.130@tcp --nettype lnet
lmc -m config.xml --add net --node lustre1 --nid 129.59.197.131@tcp --nettype lnet
lmc -m config.xml --add net --node lustre2 --nid 129.59.197.132@tcp --nettype lnet
lmc -m config.xml --add net --node client --nid '*' --nettype lnet
#lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
#lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
#lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
#lmc -m config.xml --add net --node client --nid '*' --nettype tcp
#
# Configure MDS
#
lmc -m config.xml --add mds --node lustrem --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000
#
# Configure OSTs - testing with 2 initially - klb, 2/1/2007
#
lmc -m config.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda1
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sda1
#
# Configure client (this is a 'generic' client used for all client mounts)
# testing with 1 client initially - klb, 2/1/2007
#
lmc -m config.xml --add mtpt --node client --path /mnt/lustre --mds mds-test --lov lov-test
#
# Copy config.xml to all the other nodes in the cluster - klb, 2/1/07
#
for i in `seq 1 4`
do
  echo "Copying config.xml to OST lustre$i..."
  rcp -p config.xml root@lustre$i:~/lustre
done

for i in `seq -w 1 14`
do
  echo "Copying config.xml to client scnode$i..."
  rcp -p config.xml root@scnode$i:~/lustre
done

Andreas Dilger wrote:
> On Feb 02, 2007 13:16 -0600, Kevin L. Buterbaugh wrote:
>
>> Sorry, meant to include that. Here's the relevant information from the
>> client (scnode01):
>>
>> Feb 2 12:48:15 scnode01 kernel: LustreError:
>> 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
>> PTL_RPC_MSG_ERR, err == -19 req@f70d8a00 x13/t0
>> o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
>> Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration
>> 'client' could not be read from the MDS 'mds-test'. This may be the
>> result of communication errors between the client and the MDS, or if the
>> MDS is not running.
>>
>
> Client couldn't connect to the MDS. -19 = -ENODEV
>
>> And from the MDS (lustrem):
>>
>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
>> 1170442057, 5s ago) req@000001000173c400 x1/t0
>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>> Feb 2 12:48:07 lustrem kernel: LustreError:
>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
>> 1170442082, 5s ago) req@000001007e1c2400 x4/t0
>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>> Feb 2 12:48:07 lustrem kernel: LustreError:
>>
>
> These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).
> What is in the OST syslog? Are you positive that /dev/sda1 and /dev/sdb1
> on the two nodes are set up the same way, so that e.g. lustre1+sda1 isn't
> talking to the same disk as lustre2+sdb1?
>
> Also minor nit - you don't need to have a partition table, it can hurt
> performance on some RAID setups because of the 512-byte offset of IOs
> due to the DOS partition table.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Kevin L. Buterbaugh
2007-Feb-02 15:44 UTC
[Lustre-discuss] Starting Lustre for the first time...
Hi Nathan,

Hit reply instead of reply all - sorry. I did the test from each of the 4 nodes to the other 3 and they all evidently worked:

[root@lustrem lustre]# lctl ping 129.59.197.101@tcp
12345-0@lo
12345-129.59.197.101@tcp
[root@lustrem lustre]# set -o vi
[root@lustrem lustre]# lctl ping 129.59.197.131@tcp
12345-0@lo
12345-129.59.197.131@tcp
[root@lustrem lustre]# lctl ping 129.59.197.132@tcp
12345-0@lo
12345-129.59.197.132@tcp
[root@lustrem lustre]#

[root@lustre1 lustre]# lctl ping 129.59.197.101@tcp
12345-0@lo
12345-129.59.197.101@tcp
[root@lustre1 lustre]# set -o vi
[root@lustre1 lustre]# lctl ping 129.59.197.130@tcp
12345-0@lo
12345-129.59.197.130@tcp
[root@lustre1 lustre]# lctl ping 129.59.197.132@tcp
12345-0@lo
12345-129.59.197.132@tcp
[root@lustre1 lustre]#

[root@lustre2 lustre]# lctl list_nids
129.59.197.132@tcp
[root@lustre2 lustre]# lctl ping 129.59.197.101@tcp
12345-0@lo
12345-129.59.197.101@tcp
[root@lustre2 lustre]# set -o vi
[root@lustre2 lustre]# lctl ping 129.59.197.130@tcp
12345-0@lo
12345-129.59.197.130@tcp
[root@lustre2 lustre]# lctl ping 129.59.197.131@tcp
12345-0@lo
12345-129.59.197.131@tcp
[root@lustre2 lustre]#

[root@scnode01 ~]# lctl ping 129.59.197.130@tcp
12345-0@lo
12345-129.59.197.130@tcp
[root@scnode01 ~]# lctl ping 129.59.197.131@tcp
12345-0@lo
12345-129.59.197.131@tcp
[root@scnode01 ~]# lctl ping 129.59.197.132@tcp
12345-0@lo
12345-129.59.197.132@tcp
[root@scnode01 ~]#

I included the relevant portions of /var/log/messages in my other reply to the list. Please let me know if there's any other information you need. Thanks...

Kevin

Nathaniel Rutman wrote:
>>
>>
> Yes, that's good. It means that the nodes can talk to each other
> through LNET using those nids.
> You need to do that between the OST and MDT also, and between
> client and OST just for good measure.
> If they all can see each other, then we'll have to see the OST syslog
> to see why it's refusing to talk to the MDT.
>
> PS please keep the discussion on the list -- it might be useful for
> the next guy.
>

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Kevin L. Buterbaugh
2007-Feb-02 16:18 UTC
[Lustre-discuss] Starting Lustre for the first time...
All,

OK, I think I'm getting somewhere. I noticed that the activity light on my actual storage array was going crazy. Does Lustre have to do some sort of filesystem creation that would take quite a while for a pair of 1.25 TB LUNs?

I also did a test. Instead of specifying my OSTs as the actual LUNs on the storage arrays, I specified them as:

lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /tmp/ost1-test --size 100000
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /tmp/ost2-test --size 100000

I reran config.sh and lconf on the OSTs and MDS and voila! I can mount the filesystem successfully!

[root@scnode01 ~]# mount -v -t lustre lustrem:/mds-test/client /mnt/lustre
verbose: 1
arg[0] = /sbin/mount.lustre
arg[1] = lustrem:/mds-test/client
arg[2] = /mnt/lustre
arg[3] = -v
arg[4] = -o
arg[5] = rw
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0
[root@scnode01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2              35G  4.5G   29G  14% /
/dev/hda1             487M   17M  446M   4% /boot
none                  506M     0  506M   0% /dev/shm
lustrem:/mds-test/client
                      184M  8.5M  165M   5% /mnt/lustre
[root@scnode01 ~]#

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
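One way to confirm that the array activity really is the format pass on the big LUNs (a rough sketch using standard tools; iostat assumes the sysstat package is installed on the OSS nodes) is to watch the block-device write rate while lconf --reformat is running:

# On lustre1 / lustre2 while the reformat is in progress:
iostat -x /dev/sda 5    # extended per-device stats every 5 seconds (needs sysstat)
# or, without sysstat:
vmstat 5                # the "bo" column shows blocks written to block devices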
Andreas Dilger
2007-Feb-04 15:43 UTC
[Lustre-discuss] Starting Lustre for the first time...
On Feb 02, 2007 17:17 -0600, Kevin L. Buterbaugh wrote:
> OK, I think I'm getting somewhere. I noticed that the activity light on
> my actual storage array was going crazy. Does Lustre have to do some
> sort of filesystem creation that would take quite a while for a pair of
> 1.25 TB LUNs?

Well, it does have to format them, but that currently happens at mkfs time and not at first mount. I'm not sure of the actual format times, but they should be in the neighbourhood of 5 minutes or so.

> I also did a test. Instead of specifying my OSTs as the actual LUNs on
> the storage arrays, I specified them as:
>
> lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost
> ost1-test --fstype ldiskfs --dev /tmp/ost1-test --size 100000
> lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost
> ost2-test --fstype ldiskfs --dev /tmp/ost2-test --size 100000

This means you are putting your OSTs into the root filesystem and not onto your storage arrays. This is fine for basic sanity testing but is not desirable for production use.

> I reran config.sh and lconf on the OSTs and MDS and voila! I can mount
> the filesystem successfully!

This would imply something wrong with your storage arrays. You didn't include the OST syslogs in the first message; do you still have those from your first attempt?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
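A sketch of the kind of OST-side information that would help here, gathered on lustre1 / lustre2 with standard tools plus lctl's device list (adjust paths as needed):

# On each OSS after lconf has been run against the real LUN:
lctl dl                                    # list configured Lustre devices; the OST/obdfilter entries should show UP
grep Lustre /var/log/messages | tail -50   # recent Lustre and LustreError kernel messages
dmesg | grep -i ldiskfs                    # any ldiskfs format/mount errors against the LUN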