I am testing an Active/Standby MDS configuration using Lustre 1.4.6.4, and I'm having trouble. I'm wondering if anyone can help me out.

I have a 4-node cluster: Active MDS, Standby MDS, OST, and Client. I bring up the systems in the following order: OST, Active MDS, Client. I create a few files in /mnt/lustre on the client. Then I fail the MDS. The disks are moved to the Standby MDS (they are a pair of disks that make up a RAID-1 partition). Linux recognizes the disks, and I do a raidstart on the RAID-1 partition. I then use lconf to start up Lustre on the Standby MDS, with the following results:

roger-ha-2:~ # lconf -v --node roger-ha-2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['roger-ha-2']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_roger-ha-2_tcp NET_roger-ha-2_tcp_UUID
loading module: libcfs srcdir None devdir libcfs
+ /sbin/modprobe libcfs
loading module: lnet srcdir None devdir lnet
+ /sbin/modprobe lnet
+ /sbin/modprobe lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
+ /sbin/modprobe ksocklnd
Service: ldlm ldlm ldlm_UUID
loading module: lvfs srcdir None devdir lvfs
+ /sbin/modprobe lvfs
loading module: obdclass srcdir None devdir obdclass
+ /sbin/modprobe obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
+ /sbin/modprobe ptlrpc
+ sysctl lnet/debug_path /tmp/lustre-log-roger-ha-2
+ /usr/sbin/lctl modules > /tmp/ogdb-roger-ha-2
Service: network NET_roger-ha-2_tcp NET_roger-ha-2_tcp_UUID
NETWORK: NET_roger-ha-2_tcp NET_roger-ha-2_tcp_UUID tcp roger-ha-1
Service: ldlm ldlm ldlm_UUID

It is not clear whether I also need to issue an lconf --select on the OST. It seems to make sense, so I run the following and get this output:

blade-lustre2~# lconf -v --select mds=roger-ha-2 --node blade-lustre2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['blade-lustre2']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_blade-lustre2_tcp NET_blade-lustre2_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-ts_blade-lustre2 OSD_ost1-ts_blade-lustre2_UUID
+ sysctl lnet/debug_path /tmp/lustre-log-blade-lustre2
+ /usr/sbin/lctl modules > /tmp/ogdb-blade-lustre2
Service: network NET_blade-lustre2_tcp NET_blade-lustre2_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-ts_blade-lustre2 OSD_ost1-ts_blade-lustre2_UUID

Then I move on to the client, with these results:

blade-lustre0~# lconf -v --select mds=roger-ha-2 --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['client']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_client_tcp NET_client_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: mountpoint MNT_client MNT_client_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0x2a95e065a8>, 0, 1, 1)] [u'ost1-ts_UUID'] 0
+ sysctl lnet/debug_path /tmp/lustre-log-blade-lustre0
+ /usr/sbin/lctl modules > /tmp/ogdb-blade-lustre0
Service: network NET_client_tcp NET_client_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: mountpoint MNT_client MNT_client_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0x2a95e06440>, 0, 1, 1)] [u'ost1-ts_UUID'] 0
/mnt/lustre already mounted.
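To make the procedure easier to follow, here is the whole sequence I run after failing the active MDS, collected in one place. The hostnames and the config file path are the same as in the output above; /dev/md1 is the RAID-1 partition holding the MDS from the configuration script further down, and the raidstart line is roughly what I run to bring that device up:

# On the standby MDS (roger-ha-2), once the shared disks have moved over:
raidstart /dev/md1
lconf -v --node roger-ha-2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml

# On the OST (blade-lustre2), point it at the standby MDS:
lconf -v --select mds=roger-ha-2 --node blade-lustre2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml

# On the client (blade-lustre0):
lconf -v --select mds=roger-ha-2 --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml

Please let me know if any of these steps are wrong or if something is missing.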
Since the client reported that /mnt/lustre was already mounted, I figured I should try unmounting it first, with these results:

blade-lustre0:~# lconf -v --cleanup --force --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['client']
Service: mountpoint MNT_client MNT_client_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0x2a95e06518>, 0, 1, 1)] [u'ost1-ts_UUID'] 0
MTPT: MNT_client MNT_client_UUID /mnt/lustre ha-mds_UUID lov-ts_UUID
+ umount -f /mnt/lustre
! umount (error 1):
> umount2: Device or resource busy
> umount: /mnt/lustre: device is busy
> umount2: Device or resource busy
> umount: /mnt/lustre: device is busy

Now I don't know what to do (see also the P.S. below).

As background, I create my XML configuration as follows:

CFG=failoverLustre.xml
rm -f $CFG

MDS1=roger-ha-1   # active
MDS2=roger-ha-2   # standby
OST1=blade-lustre2
MDSNAME=ha-mds

#
# There is one LOV
#
LOV="--lov lov-ts"

#
# Each FS is ext3, as that is what Lustre requires
#
FS="--fstype ext3"

#
# Each node has 1 OST. The sizes will be defaulted.
# The OST is on /dev/sda1
#
DEV1=" --dev /dev/sda1"

# Create nodes
# Note: if there are multiple tcp networks, use tcp0, tcp1, etc.
lmc -m $CFG --add net --node $MDS1 --nid $MDS1 --nettype tcp --failover
lmc -m $CFG --add net --node $MDS2 --nid $MDS1 --nettype tcp --failover
lmc -m $CFG --add net --node $OST1 --nid $OST1 --nettype tcp
lmc -m $CFG --add net --node client --nid '*' --nettype tcp

# Configure MDS
lmc -m $CFG --add mds --node $MDS1 --mds $MDSNAME $FS --dev /dev/md1

# Configure the LOV
lmc -m $CFG --add lov $LOV --mds $MDSNAME --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

# Add one OST on each OSS
lmc -m $CFG --add ost --node $OST1 $LOV --ost ost1-ts $FS $DEV1

# Configure client (this is a 'generic' client used for all client mounts)
lmc -m $CFG --add mtpt --node client --path /mnt/lustre --mds $MDSNAME $LOV

If anyone can advise me, I'd appreciate it. Thanks.

-Roger
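P.S. Is the right next step to track down whatever is still holding /mnt/lustre open before forcing the cleanup again? I was thinking of something along these lines on the client (fuser and lsof are just the standard tools; I haven't confirmed that this actually frees the mount):

# Check whether the old client mount is still listed:
grep lustre /proc/mounts

# Look for processes still using files under the mount point:
fuser -vm /mnt/lustre
lsof /mnt/lustre

# Then retry the forced cleanup:
lconf -v --cleanup --force --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml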