Alexander, Jack
2008-Aug-28 14:27 UTC
[Lustre-discuss] Lustre_config fails trying to access mgs - mdt and mgs are configured together.
In my Lustre 1.6 configuration I have two MSA2000s and two DL380G5 servers. sfs1 and sfs2 are the servers' names on the internal Ethernet network; ic-sfs1 and ic-sfs2 are the corresponding system interconnect names.

I've successfully (I think) run both "lctl ping ic-sfs1@o2ib" and "lctl ping ic-sfs2@o2ib" from servers sfs1 and sfs2. Does this look correct? How do you read the output from this command?

[root@hpcsfse2 ~]# lctl ping 172.31.97.2@o2ib0
12345-0@lo
12345-172.31.97.2@o2ib
[root@hpcsfse2 ~]# lctl ping 172.31.97.1@o2ib0
12345-0@lo
12345-172.31.97.1@o2ib

This is the .csv file I'm using as input to the lustre_config command. Note that the mdt and mgs components are mounted together.

hpcsfse1:root> cat src/scripts/hpcsfse_lustre_config.csv
sfs1,options lnet networks=o2ib0,/dev/mapper/mpath0,/mnt/mdt_mgs,mdt|mgs,testfs,,,,,_netdev,ic-sfs2@o2ib0
sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,ic-sfs1@o2ib0,,,,_netdev,ic-sfs1@o2ib0

Configuration of the sfs1 server seems to be OK.

hpcsfse1:root> lustre_config -vfw sfs1 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs1
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and "sfs1"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!
lustre_config: ******** Lustre cluster configuration START ********
lustre_config: Explicit MGS target /dev/mapper/mpath0 in host sfs1.
lustre_config: Adding lnet module options to sfs1
lustre_config: Starting lnet network in sfs1
lustre_config: Creating the mount point /mnt/mdt_mgs on sfs1
lustre_config: Formatting Lustre target /dev/mapper/mpath0 on sfs1...
lustre_config: Formatting command line is:
ssh -x -q sfs1 "PATH=$PATH:/sbin:/usr/sbin; /usr/sbin/mkfs.lustre --reformat --mgs --mdt --fsname=testfs --failnode=ic-sfs2@o2ib0 /dev/mapper/mpath0"
lustre_config: Waiting for the return of the remote command...

Permanent disk data:
Target:     testfs-MDTffff
Index:      unassigned
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x75
            (MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=172.31.97.2@o2ib mgsnode=172.31.97.2@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

device size = 1259804MB
2 6 18
formatting backing filesystem ldiskfs on /dev/mapper/mpath0
    target name  testfs-MDTffff
    4k blocks    0
    options      -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L testfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups,mmp -F /dev/mapper/mpath0
Writing CONFIGS/mountdata
lustre_config: Success on all Lustre targets!
lustre_config: Modify /etc/fstab of host sfs1 to add Lustre target /dev/mapper/mpath0
lustre_config: /dev/mapper/mpath0 /mnt/mdt_mgs lustre _netdev 0 0
lustre_config: ******** Lustre cluster configuration END **********
hpcsfse1:root> mount /mnt/mdt_mgs

Configuration of the sfs2 server fails. How do I debug and/or correct this?

hpcsfse1:root> lustre_config -vfw sfs2 src/scripts/hpcsfse_lustre_config.csv
lustre_config: Operating on the following nodes: sfs2
lustre_config: Checking the cluster network connectivity and hostnames...
lc_net: Verifying network connectivity between "hpcsfse1.hpclab.usa.hp.com" and "sfs2"...
lc_net: OK
lustre_config: Check the cluster network connectivity and hostnames OK!
lustre_config: ******** Lustre cluster configuration START ********
lustre_config: There is no MGS target in the node list "sfs2".
lustre_config: Creating the mount point /mnt/ost0 on sfs2
lustre_config: Adding lnet module options to sfs2
lustre_config: Starting lnet network in sfs2
lustre_config: Checking lnet connectivity between sfs2 and the MGS node
lustre_config: check_lnet_connect() error: sfs2 cannot contact the MGS node with nids - "ic-sfs1@o2ib0"! Check /usr/sbin/lctl command!

hpcsfse1:root> lctl dl
  0 UP mgs MGS MGS 5
  1 UP mgc MGC172.31.97.2@o2ib c2910fbc-b150-b759-0c41-a5851616e41e 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov testfs-mdtlov testfs-mdtlov_UUID 4
  4 UP mds testfs-MDT0000 testfs-MDT0000_UUID 3
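
For readers comparing the two rows of the .csv file against the NIDs that appear in the output above, the rows can be split into named columns. The following is only a minimal sketch in Python: it assumes the 12-column lustre_config layout (hostname, module options, device, mount point, device type, fsname, MGS NIDs, index, format options, mkfs options, mount options, failover NIDs), and the column labels in FIELDS and the helper parse_rows are illustrative names, not identifiers used by lustre_config itself. NIDs are written in the usual name@network form.

```python
import csv
import io

# Illustrative column labels; they assume the 12-field lustre_config CSV
# layout and are NOT names defined by the lustre_config tool.
FIELDS = [
    "hostname", "module_opts", "device", "mount_point", "device_type",
    "fsname", "mgs_nids", "index", "format_options", "mkfs_options",
    "mount_options", "failover_nids",
]

# The two rows from the post, with NIDs in name@network form.
CSV_TEXT = """\
sfs1,options lnet networks=o2ib0,/dev/mapper/mpath0,/mnt/mdt_mgs,mdt|mgs,testfs,,,,,_netdev,ic-sfs2@o2ib0
sfs2,options lnet networks=o2ib0,/dev/mapper/mpath1,/mnt/ost0,ost,testfs,ic-sfs1@o2ib0,,,,_netdev,ic-sfs1@o2ib0
"""


def parse_rows(text):
    """Return one dict per non-empty CSV row, keyed by the labels in FIELDS."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        if not raw:
            continue  # skip blank lines
        if len(raw) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(raw)}: {raw}"
            )
        rows.append(dict(zip(FIELDS, raw)))
    return rows


if __name__ == "__main__":
    for row in parse_rows(CSV_TEXT):
        # Show an empty column as "-" so it is visible in the printout.
        print(
            row["hostname"],
            row["device_type"],
            "mgs_nids=" + (row["mgs_nids"] or "-"),
            "failover_nids=" + (row["failover_nids"] or "-"),
        )
```

Read this way, the sfs1 row leaves the MGS NIDs column empty and names ic-sfs2@o2ib0 as its failover NID, while the sfs2 row names ic-sfs1@o2ib0 in both the MGS NIDs and failover NIDs columns.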