Kevin L. Buterbaugh
2007-Feb-02 11:54 UTC
[Lustre-discuss] Starting Lustre for the first time...
All,

Apologies in advance for the (long) newbie question, but I've Googled this and the hits I've gotten haven't helped resolve it (it seems other people have had this same problem, but what was suggested to them I'm already doing, I think). If I've missed a URL where this is answered / explained, please feel free to point me in that direction...

I'm trying to get Lustre 1.4.8 going on a test cluster. I installed the software from the pre-packaged RPMs and rebooted. All my nodes show "uname -a" output similar to the following:

Linux lustrem 2.6.9-42.0.3.EL_lustre.1.4.8smp #1 SMP Tue Dec 19 09:07:46 MST 2006 x86_64 x86_64 x86_64 GNU/Linux

My cluster consists of 5 dual-processor Opterons and 14 dual-processor P4s (you can mix 32-bit / 64-bit as long as you install the right RPMs, can't you?). One of the Opterons is my MDS (hostname: lustrem); the other four are my OSDs (hostnames: lustre1 - 4). I have two dual-controller FC storage arrays. Both controllers in the first storage array are connected to two of my OSDs (lustre1 / 2), and the 2nd storage array and lustre3 / 4 are connected identically. I have 2 RAID 5 LUNs defined on each of the storage arrays. lustre1 / 2 can both see both of the RAID 5 LUNs, as the following shows:

[root@lustre1 ~]# fdisk -l

Disk /dev/hda: 41.1 GB, 41110142976 bytes
255 heads, 63 sectors/track, 4998 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         261     2096451   82  Linux swap
/dev/hda2   *         262        4998    38049952+  83  Linux

Disk /dev/sda: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      152412  1224249358+  83  Linux

Disk /dev/sdb: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      152412  1224249358+  83  Linux
[root@lustre1 ~]#

[root@lustre2 ~]# fdisk -l

Disk /dev/hda: 41.1 GB, 41110142976 bytes
255 heads, 63 sectors/track, 4998 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         261     2096451   82  Linux swap
/dev/hda2   *         262        4998    38049952+  83  Linux

Disk /dev/sda: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      152412  1224249358+  83  Linux

Disk /dev/sdb: 1253.6 GB, 1253635522560 bytes
255 heads, 63 sectors/track, 152412 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      152412  1224249358+  83  Linux
[root@lustre2 ~]#

To try to simplify things, I'm trying to start out only using my MDS (lustrem), 2 of my OSDs (lustre1 / 2), and one of my clients (scnode01). Here's my config.sh:

#!/bin/sh
# config.sh
#
rm -f config.xml
#
# Create nodes
# Trying to get this to work with 1 MDS, 2 OSTs, and 1 client.  Will add the
# others when I get this working. - klb, 2/2/07
#
lmc -m config.xml --add node --node lustrem
lmc -m config.xml --add node --node lustre1
lmc -m config.xml --add node --node lustre2
lmc -m config.xml --add node --node client
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
lmc -m config.xml --add net --node client --nid '*' --nettype tcp
#
# Configure MDS
#
lmc -m config.xml --add mds --node lustrem --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000
#
# Configure OSTs - testing with 2 initially - klb, 2/1/2007
#
lmc -m config.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda1
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sdb1
#
# Configure client (this is a 'generic' client used for all client mounts)
# testing with 1 client initially - klb, 2/1/2007
#
lmc -m config.xml --add mtpt --node client --path /mnt/lustre --mds mds-test --lov lov-test
#
# Copy config.xml to all the other nodes in the cluster - klb, 2/1/07
#
for i in `seq 1 4`
do
  echo "Copying config.xml to OST lustre$i..."
  rcp -p config.xml root@lustre$i:~/lustre
done

for i in `seq -w 1 14`
do
  echo "Copying config.xml to client scnode$i..."
  rcp -p config.xml root@scnode$i:~/lustre
done

After running this script, I logged in to lustre1 and executed "lconf --reformat --node lustre1 config.xml", which produces the following output:

loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre1_tcp NET_lustre1_tcp_UUID tcp lustre1
OSD: ost1-test ost1-test_UUID obdfilter /dev/sda1 0 ldiskfs no 0 256

And running "lconf --reformat --node lustre2 config.xml" on lustre2 produces the following output:

loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre2_tcp NET_lustre2_tcp_UUID tcp lustre2
OSD: ost2-test ost2-test_UUID obdfilter /dev/sdb1 0 ldiskfs no 0 256

Next, I logged in to lustrem and executed "lconf --reformat --node lustrem config.xml" and see the following:

loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: mdc srcdir None devdir mdc
loading module: osc srcdir None devdir osc
loading module: lov srcdir None devdir lov
loading module: mds srcdir None devdir mds
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
NETWORK: NET_lustrem_tcp NET_lustrem_tcp_UUID tcp lustrem
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs no
recording clients for filesystem: FS_fsname_UUID
Recording log mds-test on mds-test
LOV: lov_mds-test 950ad_lov_mds-test_189f832962 mds-test_UUID 0 1048576 0 0 [u'ost1-test_UUID', u'ost2-test_UUID'] mds-test
OSC: OSC_lustrem_ost1-test_mds-test 950ad_lov_mds-test_189f832962 ost1-test_UUID
OSC: OSC_lustrem_ost2-test_mds-test 950ad_lov_mds-test_189f832962 ost2-test_UUID
End recording log mds-test on mds-test
Recording log client on mds-test
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs 50000 no
MDS mount options: errors=remount-ro

But when I log on to scnode01 and execute "mount -t lustre lustrem:/mds-test/client /mnt/lustre", I get the following error:

mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed: Input/output error
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0

One other thing I've tried: instead of calling my client "client" in config.sh and in the mount command, I used the actual hostname (scnode01). That didn't help.

Again, I apologize for both the length of this post and the newbie question, but I can't seem to figure this out on my own and I've got a deadline looming. Any and all help (and even flames, as long as you answer my question or point me in the right direction!) is appreciated...

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Hello!

On Fri, Feb 02, 2007 at 12:53:42PM -0600, Kevin L. Buterbaugh wrote:
> But when I log on to scnode01 and execute "mount -t lustre
> lustrem:/mds-test/client /mnt/lustre", I get the following error:
> mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed:
> Input/output error
> mds nid 0: 129.59.197.130@tcp
> mds name: mds-test
> profile: client
> options: rw
> retry: 0

I am sure some messages appeared in /var/log/messages on client and on MDS
and perhaps you can share those with us?

Bye,
    Oleg
Kevin L. Buterbaugh
2007-Feb-02 12:16 UTC
[Lustre-discuss] Starting Lustre for the first time...
Oleg,

Sorry, meant to include that. Here's the relevant information from the client (scnode01):

Feb 2 12:48:15 scnode01 kernel: LustreError: 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@f70d8a00 x13/t0 o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration 'client' could not be read from the MDS 'mds-test'. This may be the result of communication errors between the client and the MDS, or if the MDS is not running.
Feb 2 12:48:15 scnode01 kernel: LustreError: 16533:0:(llite_lib.c:936:lustre_fill_super()) Unable to process log: client

And from the MDS (lustrem):

Feb 2 12:47:34 lustrem kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 12:47:34 lustrem kernel: Lustre: Added LNI 129.59.197.130@tcp [8/256]
Feb 2 12:47:34 lustrem kernel: Lustre: Accept secure, port 988
Feb 2 12:47:35 lustrem kernel: loop: loaded (max 8 devices)
Feb 2 12:47:36 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 12:47:36 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 12:47:36 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 12:47:36 lustrem kernel: Lustre: 3530:0:(mds_fs.c:239:mds_init_server_data()) mds-test: initializing new last_rcvd
Feb 2 12:47:36 lustrem kernel: Lustre: MDT mds-test now serving /dev/loop0 (9003a2e8-45d6-49bd-ad28-0f1e37bb1cab) with recovery enabled
Feb 2 12:47:37 lustrem kernel: Lustre: MDT mds-test has stopped.
Feb 2 12:47:37 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 12:47:37 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 12:47:37 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 12:47:37 lustrem kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 12:47:42 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442057, 5s ago) req@000001000173c400 x1/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442082, 5s ago) req@000001007e1c2400 x4/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:48:31 lustrem kernel: LustreError: 3692:0:(ldlm_lib.c:541:target_handle_connect()) @@@ UUID 'mds-test' is not available for connect (not set up) req@000001007e1dec00 x13/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc 0/0
Feb 2 12:48:31 lustrem kernel: LustreError: 3692:0:(ldlm_lib.c:1288:target_send_reply_msg()) @@@ processing error (-19) req@000001007e1dec00 x13/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
Feb 2 12:48:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442107, 5s ago) req@000001007e1a8600 x6/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:48:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442132, 5s ago) req@0000010037c5fe00 x8/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:49:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442157, 5s ago) req@000001007d544e00 x10/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:49:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:49:47 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442182, 5s ago) req@000001007e3aa400 x12/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:49:47 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:50:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442207, 5s ago) req@000001007e3ab600 x14/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:50:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:50:37 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442232, 5s ago) req@000001007e3a9800 x16/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:50:37 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:51:02 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442257, 5s ago) req@000001007e12ce00 x18/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:51:02 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:51:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442282, 5s ago) req@000001007aaad200 x20/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:51:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:51:52 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442307, 5s ago) req@00000100765f0e00 x22/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:51:52 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:52:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442332, 5s ago) req@000001007e1a8c00 x24/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:52:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:52:42 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442357, 5s ago) req@0000010037c46400 x26/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:52:42 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:53:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442382, 5s ago) req@0000010037c4bc00 x28/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:53:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:53:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442407, 5s ago) req@000001007d544a00 x30/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:53:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:53:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442432, 5s ago) req@000001007d63cc00 x32/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:53:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:54:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442457, 5s ago) req@000001000169d400 x34/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:54:22 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Feb 2 12:55:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442507, 5s ago) req@000001007e141400 x38/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:55:12 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Feb 2 12:56:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442582, 5s ago) req@0000010076980c00 x44/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:56:27 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Feb 2 12:58:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442732, 5s ago) req@000001000169d200 x56/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:58:57 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
Feb 2 13:03:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170443007, 5s ago) req@000001007e2f7800 x78/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 13:03:32 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 21 previous similar messages
Feb 2 13:12:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170443532, 5s ago) req@0000010037c62800 x120/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 13:12:17 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) Skipped 41 previous similar messages

Thanks...

Oleg Drokin wrote:
> Hello!
>
> On Fri, Feb 02, 2007 at 12:53:42PM -0600, Kevin L. Buterbaugh wrote:
>
>> But when I log on to scnode01 and execute "mount -t lustre
>> lustrem:/mds-test/client /mnt/lustre", I get the following error:
>> mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed:
>> Input/output error
>> mds nid 0: 129.59.197.130@tcp
>> mds name: mds-test
>> profile: client
>> options: rw
>> retry: 0
>>
>
> I am sure some messages appeared in /var/log/messages on client and on MDS
> and perhaps you can share those with us?
>
> Bye,
>     Oleg
>

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Andreas Dilger
2007-Feb-02 13:32 UTC
[Lustre-discuss] Starting Lustre for the first time...
On Feb 02, 2007 13:16 -0600, Kevin L. Buterbaugh wrote:
> Sorry, meant to include that. Here's the relevant information from the
> client (scnode01):
>
> Feb 2 12:48:15 scnode01 kernel: LustreError:
> 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
> PTL_RPC_MSG_ERR, err == -19 req@f70d8a00 x13/t0
> o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
> Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration
> 'client' could not be read from the MDS 'mds-test'. This may be the
> result of communication errors between the client and the MDS, or if the
> MDS is not running.

Client couldn't connect to the MDS. -19 = -ENODEV

> And from the MDS (lustrem):
>
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442057, 5s ago) req@000001000173c400 x1/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442082, 5s ago) req@000001007e1c2400 x4/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:

These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).
What is in the OST syslog? Are you positive that /dev/sda1 and /dev/sdb1
on the two nodes are set up the same way, so that e.g. lustre1+sda1 isn't
talking to the same disk as lustre2+sdb1?

Also minor nit - you don't need to have a partition table, it can hurt
performance on some RAID setups because of the 512-byte offset of IOs
due to the DOS partition table.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
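One rough way to make that check (a sketch only, using nothing beyond dd and md5sum; it assumes nothing is actively writing to the LUNs, and note that freshly zeroed LUNs may all checksum alike) is to compare the first few sectors of each candidate device on both OSS nodes:

# On lustre1:
dd if=/dev/sda1 bs=512 count=8 2>/dev/null | md5sum
dd if=/dev/sdb1 bs=512 count=8 2>/dev/null | md5sum

# On lustre2:
dd if=/dev/sda1 bs=512 count=8 2>/dev/null | md5sum
dd if=/dev/sdb1 bs=512 count=8 2>/dev/null | md5sum

# If lustre1's /dev/sda1 checksum matches lustre2's /dev/sdb1 checksum
# (and vice versa), the device names are swapped between the two nodes.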
Nathaniel Rutman
2007-Feb-02 13:46 UTC
[Lustre-discuss] Starting Lustre for the first time...
Kevin L. Buterbaugh wrote:
> #
> # Configure networking
> #
> lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
> lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
> lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp

> And from the MDS (lustrem):
>
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442057, 5s ago) req@000001000173c400 x1/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:
> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1170442082, 5s ago) req@000001007e1c2400 x4/t0
> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 2 12:48:07 lustrem kernel: LustreError:

> These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).

There's always a possibility that one or more of your nodes doesn't resolve the hostname properly. In general, I recommend using the actual IP address:

lmc -m config.xml --add net --node lustrem --nid 192.168.0.1@tcp --nettype lnet

Also, check that every node can ping every other. On each node:

modprobe lnet
lctl network up
lctl list_nids

Then on each node:

lctl ping <nids from other nodes>
lctl network down   (so you'll be able to remove the module)
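A minimal sketch of that check as a script, using the NIDs that appear in this thread (substitute your own addresses; it simply runs the commands above on whichever node it is executed):

#!/bin/sh
# Sketch: check LNET connectivity from this node to every other node.
# The NID list assumes the addresses used in this thread; edit as needed.
NIDS="129.59.197.130@tcp 129.59.197.131@tcp 129.59.197.132@tcp 129.59.197.101@tcp"

modprobe lnet
lctl network up
lctl list_nids

for nid in $NIDS
do
    echo "=== lctl ping $nid from `hostname` ==="
    lctl ping $nid
done

lctl network down    # so the lnet module can be removed afterwards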
Nathaniel Rutman
2007-Feb-02 15:28 UTC
[Lustre-discuss] Starting Lustre for the first time...
Kevin L. Buterbaugh wrote:
> Nathan,
>
> I made the changes you suggested to my config.sh and regenerated the
> config.xml. I did the test you suggested on the MDS and the client
> and here's what I get:
>
> [root@lustrem lustre]# modprobe lnet
> [root@lustrem lustre]# lctl network up
> LNET configured
> [root@lustrem lustre]# lctl list_nids
> 129.59.197.130@tcp
> [root@lustrem lustre]# lctl ping 129.59.197.101@tcp
> 12345-0@lo
> 12345-129.59.197.101@tcp
> [root@lustrem lustre]#
>
> [root@scnode01 ~]# modprobe lnet
> [root@scnode01 ~]# lctl network up
> LNET configured
> [root@scnode01 ~]# lctl list_nids
> 129.59.197.101@tcp
> [root@scnode01 ~]# lctl ping 129.59.197.130@tcp
> 12345-0@lo
> 12345-129.59.197.130@tcp
> [root@scnode01 ~]#
>
> That's not the kind of output I expect from a ping, so I don't even
> know if that means it worked or not...

Yes, that's good. It means that the nodes can talk to each other through LNET using those nids. You need to do that between the OST and MDT also, and between client and OST just for good measure. If they all can see each other, then we'll have to see the OST syslog to see why it's refusing to talk to the MDT.

PS please keep the discussion on the list -- it might be useful for the next guy.

> Kevin
>
> Nathaniel Rutman wrote:
>> Kevin L. Buterbaugh wrote:
>>> #
>>> # Configure networking
>>> #
>>> lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
>>> lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
>>> lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
>>
>>> And from the MDS (lustrem):
>>>
>>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent
>>> at 1170442057, 5s ago) req@000001000173c400 x1/t0
>>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>>> Feb 2 12:48:07 lustrem kernel: LustreError:
>>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent
>>> at 1170442082, 5s ago) req@000001007e1c2400 x4/t0
>>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>>> Feb 2 12:48:07 lustrem kernel: LustreError:
>>
>>> These messages indicate failure to connect to the OSTs (op 8 =
>>> OST_CONNECT).
>>
>> There's always a possibility that one or more of your nodes doesn't
>> resolve the hostname properly. In general, I recommend using the
>> actual IP address:
>> lmc -m config.xml --add net --node lustrem --nid 192.168.0.1@tcp
>> --nettype lnet
>>
>> Also, check that every node can ping every other. On each node:
>> modprobe lnet
>> lctl network up
>> lctl list_nids
>> Then on each node:
>> lctl ping <nids from other nodes>
>> lctl network down (so you'll be able to remove the module)
>>
>
Kevin L. Buterbaugh
2007-Feb-02 15:36 UTC
[Lustre-discuss] Starting Lustre for the first time...
Andreas,

I made the changes Nathan suggested to how the networking was set up in config.sh. I also checked the LUNs and you were correct: sda on lustre1 is sdb on lustre2 and vice versa. So I also changed config.sh to use sda1 for both. However, I still get the exact same error when I try to mount the client (and yes, it's still the ENODEV, but why?):

[root@scnode01 ~]# mount -v -t lustre lustrem:/mds-test/client /mnt/lustre
verbose: 1
arg[0] = /sbin/mount.lustre
arg[1] = lustrem:/mds-test/client
arg[2] = /mnt/lustre
arg[3] = -v
arg[4] = -o
arg[5] = rw
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0
mount.lustre: mount(lustrem:/mds-test/client, /mnt/lustre) failed: Input/output error
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0
[root@scnode01 ~]#

MDS (lustrem) /var/log/messages:

Feb 2 16:17:18 lustrem kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 16:17:19 lustrem kernel: Lustre: Added LNI 129.59.197.130@tcp [8/256]
Feb 2 16:17:19 lustrem kernel: Lustre: Accept secure, port 988
Feb 2 16:17:19 lustrem kernel: loop: loaded (max 8 devices)
Feb 2 16:17:21 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 16:17:21 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 16:17:21 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 16:17:21 lustrem kernel: Lustre: 3518:0:(mds_fs.c:239:mds_init_server_data()) mds-test: initializing new last_rcvd
Feb 2 16:17:21 lustrem kernel: Lustre: MDT mds-test now serving /dev/loop0 (b505d8f0-d424-4bf8-a8cd-8bfa8af0cf36) with recovery enabled
Feb 2 16:17:21 lustrem kernel: Lustre: MDT mds-test has stopped.
Feb 2 16:17:22 lustrem kernel: kjournald starting. Commit interval 5 seconds
Feb 2 16:17:22 lustrem kernel: LDISKFS FS on loop0, internal journal
Feb 2 16:17:22 lustrem kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb 2 16:17:22 lustrem kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:17:27 lustrem kernel: LustreError: 3882:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170454642, 5s ago) req@000001007d563800 x1/t0 o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 16:17:28 lustrem kernel: LustreError: 3680:0:(ldlm_lib.c:541:target_handle_connect()) @@@ UUID 'mds-test' is not available for connect (not set up) req@000001007d563400 x27/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc 0/0
Feb 2 16:17:28 lustrem kernel: LustreError: 3680:0:(ldlm_lib.c:1288:target_send_reply_msg()) @@@ processing error (-19) req@000001007d563400 x27/t0 o38-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0

OST (lustre1) /var/log/messages:

Feb 2 16:16:30 lustre1 kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 16:16:30 lustre1 kernel: Lustre: Added LNI 129.59.197.131@tcp [8/256]
Feb 2 16:16:30 lustre1 kernel: Lustre: Accept secure, port 988
Feb 2 16:16:31 lustre1 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Feb 2 16:17:00 lustre1 kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:17:00 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 1 offset 0 length 240: 2
Feb 2 16:17:25 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 4 offset 0 length 240: 2
Feb 2 16:17:50 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 6 offset 0 length 240: 2
Feb 2 16:18:15 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 8 offset 0 length 240: 2
Feb 2 16:18:40 lustre1 kernel: Lustre: 3521:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 10 offset 0 length 240: 2

OST (lustre2) /var/log/messages:

Feb 2 16:16:28 lustre2 kernel: Lustre: OBD class driver Build Version: 1.4.8-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_8_RC8-2.6-rhel4-x86_64.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.3.EL_lustre.1.4.8smp, info@clusterfs.com
Feb 2 16:16:28 lustre2 kernel: Lustre: Added LNI 129.59.197.132@tcp [8/256]
Feb 2 16:16:28 lustre2 kernel: Lustre: Accept secure, port 988
Feb 2 16:16:28 lustre2 kernel: Lustre: Filtering OBD driver; info@clusterfs.com
Feb 2 16:16:53 lustre2 kernel: Lustre: Binding irq 185 to CPU 0 with cmd: echo 1 > /proc/irq/185/smp_affinity
Feb 2 16:16:53 lustre2 kernel: Lustre: 3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 2 offset 0 length 240: 2
Feb 2 16:17:18 lustre2 kernel: Lustre: 3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 5 offset 0 length 240: 2
Feb 2 16:17:43 lustre2 kernel: Lustre: 3528:0:(lib-move.c:1627:lnet_parse_put()) Dropping PUT from 12345-129.59.197.130@tcp portal 6 match 7 offset 0 length 240: 2

Client (scnode01) /var/log/messages:

Feb 2 16:17:10 scnode01 kernel: LustreError: 19745:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@f7f07600 x27/t0 o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
Feb 2 16:17:10 scnode01 kernel: LustreError: mdc_dev: The configuration 'client' could not be read from the MDS 'mds-test'. This may be the result of communication errors between the client and the MDS, or if the MDS is not running.
Feb 2 16:17:10 scnode01 kernel: LustreError: 19742:0:(llite_lib.c:936:lustre_fill_super()) Unable to process log: client

config.sh:

#!/bin/sh
# config.sh
#
rm -f config.xml
#
# Create nodes
# Trying to get this to work with 1 MDS, 2 OSTs, and 1 client.  Will add the
# others when I get this working. - klb, 2/2/07
#
lmc -m config.xml --add node --node lustrem
lmc -m config.xml --add node --node lustre1
lmc -m config.xml --add node --node lustre2
lmc -m config.xml --add node --node client
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid 129.59.197.130@tcp --nettype lnet
lmc -m config.xml --add net --node lustre1 --nid 129.59.197.131@tcp --nettype lnet
lmc -m config.xml --add net --node lustre2 --nid 129.59.197.132@tcp --nettype lnet
lmc -m config.xml --add net --node client --nid '*' --nettype lnet
#lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
#lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
#lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp
#lmc -m config.xml --add net --node client --nid '*' --nettype tcp
#
# Configure MDS
#
lmc -m config.xml --add mds --node lustrem --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000
#
# Configure OSTs - testing with 2 initially - klb, 2/1/2007
#
lmc -m config.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /dev/sda1
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /dev/sda1
#
# Configure client (this is a 'generic' client used for all client mounts)
# testing with 1 client initially - klb, 2/1/2007
#
lmc -m config.xml --add mtpt --node client --path /mnt/lustre --mds mds-test --lov lov-test
#
# Copy config.xml to all the other nodes in the cluster - klb, 2/1/07
#
for i in `seq 1 4`
do
  echo "Copying config.xml to OST lustre$i..."
  rcp -p config.xml root@lustre$i:~/lustre
done

for i in `seq -w 1 14`
do
  echo "Copying config.xml to client scnode$i..."
  rcp -p config.xml root@scnode$i:~/lustre
done

Andreas Dilger wrote:
> On Feb 02, 2007 13:16 -0600, Kevin L. Buterbaugh wrote:
>
>> Sorry, meant to include that. Here's the relevant information from the
>> client (scnode01):
>>
>> Feb 2 12:48:15 scnode01 kernel: LustreError:
>> 16536:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
>> PTL_RPC_MSG_ERR, err == -19 req@f70d8a00 x13/t0
>> o38->mds-test@129.59.197.130@tcp:12 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
>> Feb 2 12:48:15 scnode01 kernel: LustreError: mdc_dev: The configuration
>> 'client' could not be read from the MDS 'mds-test'. This may be the
>> result of communication errors between the client and the MDS, or if the
>> MDS is not running.
>>
>
> Client couldn't connect to the MDS. -19 = -ENODEV
>
>> And from the MDS (lustrem):
>>
>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
>> 1170442057, 5s ago) req@000001000173c400 x1/t0
>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>> Feb 2 12:48:07 lustrem kernel: LustreError:
>> 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
>> 1170442082, 5s ago) req@000001007e1c2400 x4/t0
>> o8->ost1-test_UUID@lustre1_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
>> Feb 2 12:48:07 lustrem kernel: LustreError:
>>
>
> These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).
> What is in the OST syslog? Are you positive that /dev/sda1 and /dev/sdb1
> on the two nodes are set up the same way, so that e.g. lustre1+sda1 isn't
> talking to the same disk as lustre2+sdb1?
>
> Also minor nit - you don't need to have a partition table, it can hurt
> performance on some RAID setups because of the 512-byte offset of IOs
> due to the DOS partition table.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Kevin L. Buterbaugh
2007-Feb-02 15:44 UTC
[Lustre-discuss] Starting Lustre for the first time...
Hi Nathan,

Hit reply instead of reply all - sorry. I did the test from each of the 4 nodes to the other 3 and they all evidently worked:

[root@lustrem lustre]# lctl ping 129.59.197.101@tcp
12345-0@lo
12345-129.59.197.101@tcp
[root@lustrem lustre]# set -o vi
[root@lustrem lustre]# lctl ping 129.59.197.131@tcp
12345-0@lo
12345-129.59.197.131@tcp
[root@lustrem lustre]# lctl ping 129.59.197.132@tcp
12345-0@lo
12345-129.59.197.132@tcp
[root@lustrem lustre]#

[root@lustre1 lustre]# lctl ping 129.59.197.101@tcp
12345-0@lo
12345-129.59.197.101@tcp
[root@lustre1 lustre]# set -o vi
[root@lustre1 lustre]# lctl ping 129.59.197.130@tcp
12345-0@lo
12345-129.59.197.130@tcp
[root@lustre1 lustre]# lctl ping 129.59.197.132@tcp
12345-0@lo
12345-129.59.197.132@tcp
[root@lustre1 lustre]#

[root@lustre2 lustre]# lctl list_nids
129.59.197.132@tcp
[root@lustre2 lustre]# lctl ping 129.59.197.101@tcp
12345-0@lo
12345-129.59.197.101@tcp
[root@lustre2 lustre]# set -o vi
[root@lustre2 lustre]# lctl ping 129.59.197.130@tcp
12345-0@lo
12345-129.59.197.130@tcp
[root@lustre2 lustre]# lctl ping 129.59.197.131@tcp
12345-0@lo
12345-129.59.197.131@tcp
[root@lustre2 lustre]#

[root@scnode01 ~]# lctl ping 129.59.197.130@tcp
12345-0@lo
12345-129.59.197.130@tcp
[root@scnode01 ~]# lctl ping 129.59.197.131@tcp
12345-0@lo
12345-129.59.197.131@tcp
[root@scnode01 ~]# lctl ping 129.59.197.132@tcp
12345-0@lo
12345-129.59.197.132@tcp
[root@scnode01 ~]#

I included the relevant portions of /var/log/messages in my other reply to the list. Please let me know if there's any other information you need. Thanks...

Kevin

Nathaniel Rutman wrote:
>>
>>
> Yes, that's good. It means that the nodes can talk to each other
> through LNET using those nids.
> You need to do that between the OST and MDT also, and between
> client and OST just for good measure.
> If they all can see each other, then we'll have to see the OST syslog
> to see why it's refusing to talk to the MDT.
>
> PS please keep the discussion on the list -- it might be useful for
> the next guy.
>

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
Kevin L. Buterbaugh
2007-Feb-02 16:18 UTC
[Lustre-discuss] Starting Lustre for the first time...
All,

OK, I think I'm getting somewhere. I noticed that the activity light on my actual storage array was going crazy. Does Lustre have to do some sort of filesystem creation that would take quite a while for a pair of 1.25 TB LUNs?

I also did a test. Instead of specifying my OSTs as the actual LUNs on the storage arrays, I specified them as:

lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /tmp/ost1-test --size 100000
lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost ost2-test --fstype ldiskfs --dev /tmp/ost2-test --size 100000

I reran config.sh and lconf on the OSTs and MDS and voila! I can mount the filesystem successfully!

[root@scnode01 ~]# mount -v -t lustre lustrem:/mds-test/client /mnt/lustre
verbose: 1
arg[0] = /sbin/mount.lustre
arg[1] = lustrem:/mds-test/client
arg[2] = /mnt/lustre
arg[3] = -v
arg[4] = -o
arg[5] = rw
mds nid 0: 129.59.197.130@tcp
mds name: mds-test
profile: client
options: rw
retry: 0
[root@scnode01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2              35G  4.5G   29G  14% /
/dev/hda1             487M   17M  446M   4% /boot
none                  506M     0  506M   0% /dev/shm
lustrem:/mds-test/client
                      184M  8.5M  165M   5% /mnt/lustre
[root@scnode01 ~]#

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
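One way to confirm that the array activity really is the format pass on the big LUNs (a rough sketch using standard tools; iostat assumes the sysstat package is installed on the OSS nodes) is to watch the block-device write rate while lconf --reformat is running:

# On lustre1 / lustre2 while the reformat is in progress:
iostat -x /dev/sda 5    # extended per-device stats every 5 seconds (needs sysstat)
# or, without sysstat:
vmstat 5                # the "bo" column shows blocks written to block devices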
Andreas Dilger
2007-Feb-04 15:43 UTC
[Lustre-discuss] Starting Lustre for the first time...
On Feb 02, 2007 17:17 -0600, Kevin L. Buterbaugh wrote:
> OK, I think I'm getting somewhere. I noticed that the activity light on
> my actual storage array was going crazy. Does Lustre have to do some
> sort of filesystem creation that would take quite a while for a pair of
> 1.25 TB LUNs?

Well, it does have to format them, but that currently happens at mkfs time and not at first mount. I'm not sure of the actual format times, but they should be in the neighbourhood of 5 minutes or so.

> I also did a test. Instead of specifying my OSTs as the actual LUNs on
> the storage arrays, I specified them as:
>
> lmc -m config.xml --add ost --node lustre1 --lov lov-test --ost
> ost1-test --fstype ldiskfs --dev /tmp/ost1-test --size 100000
> lmc -m config.xml --add ost --node lustre2 --lov lov-test --ost
> ost2-test --fstype ldiskfs --dev /tmp/ost2-test --size 100000

This means you are putting your OSTs into the root filesystem and not onto your storage arrays. This is fine for basic sanity testing but is not desirable for production use.

> I reran config.sh and lconf on the OSTs and MDS and voila! I can mount
> the filesystem successfully!

This would imply something wrong with your storage arrays. You didn't include the OST syslogs in the first message; do you still have those from your first attempt?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
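A sketch of the kind of OST-side information that would help here, gathered on lustre1 / lustre2 with standard tools plus lctl's device list (adjust paths as needed):

# On each OSS after lconf has been run against the real LUN:
lctl dl                                    # list configured Lustre devices; the OST/obdfilter entries should show UP
grep Lustre /var/log/messages | tail -50   # recent Lustre and LustreError kernel messages
dmesg | grep -i ldiskfs                    # any ldiskfs format/mount errors against the LUN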