I am testing an Active/Standby MDS configuration using Lustre 1.4.6.4, and I'm having trouble. I'm wondering if anyone can help me out.

I have a 4-node cluster: Active MDS, Standby MDS, OST, and Client. I bring up the systems in the following order: OST, Active MDS, Client. I create a few files in /mnt/lustre on the client. Then I fail the MDS. The disks are moved to the Standby MDS (they are a pair of disks that make up a RAID-1 partition). Linux recognizes the disks, and I do a raidstart on the RAID-1 partition. I then use lconf to start up Lustre on the Standby MDS, with the following results:

roger-ha-2:~ # lconf -v --node roger-ha-2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['roger-ha-2']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_roger-ha-2_tcp NET_roger-ha-2_tcp_UUID
loading module: libcfs srcdir None devdir libcfs
+ /sbin/modprobe libcfs
loading module: lnet srcdir None devdir lnet
+ /sbin/modprobe lnet
+ /sbin/modprobe lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
+ /sbin/modprobe ksocklnd
Service: ldlm ldlm ldlm_UUID
loading module: lvfs srcdir None devdir lvfs
+ /sbin/modprobe lvfs
loading module: obdclass srcdir None devdir obdclass
+ /sbin/modprobe obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
+ /sbin/modprobe ptlrpc
+ sysctl lnet/debug_path /tmp/lustre-log-roger-ha-2
+ /usr/sbin/lctl modules > /tmp/ogdb-roger-ha-2
Service: network NET_roger-ha-2_tcp NET_roger-ha-2_tcp_UUID
NETWORK: NET_roger-ha-2_tcp NET_roger-ha-2_tcp_UUID tcp roger-ha-1
Service: ldlm ldlm ldlm_UUID

It is not clear whether I also need to issue an lconf --select on the OST. It seems to make sense, so I run the following and get this output:

blade-lustre2~# lconf -v --select mds=roger-ha-2 --node blade-lustre2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['blade-lustre2']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_blade-lustre2_tcp NET_blade-lustre2_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-ts_blade-lustre2 OSD_ost1-ts_blade-lustre2_UUID
+ sysctl lnet/debug_path /tmp/lustre-log-blade-lustre2
+ /usr/sbin/lctl modules > /tmp/ogdb-blade-lustre2
Service: network NET_blade-lustre2_tcp NET_blade-lustre2_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-ts_blade-lustre2 OSD_ost1-ts_blade-lustre2_UUID

Then I move on to the client, with these results:

blade-lustre0~# lconf -v --select mds=roger-ha-2 --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['client']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_client_tcp NET_client_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: mountpoint MNT_client MNT_client_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0x2a95e065a8>, 0, 1, 1)] [u'ost1-ts_UUID'] 0
+ sysctl lnet/debug_path /tmp/lustre-log-blade-lustre0
+ /usr/sbin/lctl modules > /tmp/ogdb-blade-lustre0
Service: network NET_client_tcp NET_client_tcp_UUID
Service: ldlm ldlm ldlm_UUID
Service: mountpoint MNT_client MNT_client_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0x2a95e06440>, 0, 1, 1)] [u'ost1-ts_UUID'] 0
/mnt/lustre already mounted.
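To make the procedure easier to follow, here is the whole sequence I run after failing the active MDS, collected in one place. The hostnames and the config file path are the same as in the output above; /dev/md1 is the RAID-1 partition holding the MDS from the configuration script further down, and the raidstart line is roughly what I run to bring that device up:

# On the standby MDS (roger-ha-2), once the shared disks have moved over:
raidstart /dev/md1
lconf -v --node roger-ha-2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml

# On the OST (blade-lustre2), point it at the standby MDS:
lconf -v --select mds=roger-ha-2 --node blade-lustre2 /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml

# On the client (blade-lustre0):
lconf -v --select mds=roger-ha-2 --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml

Please let me know if any of these steps are wrong or if something is missing.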
Since the client reported that /mnt/lustre was already mounted, I figured I should try unmounting it first, with these results:

blade-lustre0:~# lconf -v --cleanup --force --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml
configuring for host: ['client']
Service: mountpoint MNT_client MNT_client_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0x2a95e06518>, 0, 1, 1)] [u'ost1-ts_UUID'] 0
MTPT: MNT_client MNT_client_UUID /mnt/lustre ha-mds_UUID lov-ts_UUID
+ umount -f /mnt/lustre
! umount (error 1):
> umount2: Device or resource busy
> umount: /mnt/lustre: device is busy
> umount2: Device or resource busy
> umount: /mnt/lustre: device is busy

Now I don't know what to do (see also the P.S. below).

As background, I create my XML configuration as follows:

CFG=failoverLustre.xml
rm -f $CFG

MDS1=roger-ha-1   # active
MDS2=roger-ha-2   # standby
OST1=blade-lustre2
MDSNAME=ha-mds

#
# There is one LOV
#
LOV="--lov lov-ts"

#
# Each FS is ext3, as that is what Lustre requires
#
FS="--fstype ext3"

#
# Each node has 1 OST. The sizes will be defaulted.
# The OST is on /dev/sda1
#
DEV1=" --dev /dev/sda1"

# Create nodes
# Note: if there are multiple tcp networks, use tcp0, tcp1, etc.
lmc -m $CFG --add net --node $MDS1 --nid $MDS1 --nettype tcp --failover
lmc -m $CFG --add net --node $MDS2 --nid $MDS1 --nettype tcp --failover
lmc -m $CFG --add net --node $OST1 --nid $OST1 --nettype tcp
lmc -m $CFG --add net --node client --nid '*' --nettype tcp

# Configure MDS
lmc -m $CFG --add mds --node $MDS1 --mds $MDSNAME $FS --dev /dev/md1

# Configure the LOV
lmc -m $CFG --add lov $LOV --mds $MDSNAME --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

# Add one OST on each OSS
lmc -m $CFG --add ost --node $OST1 $LOV --ost ost1-ts $FS $DEV1

# Configure client (this is a 'generic' client used for all client mounts)
lmc -m $CFG --add mtpt --node client --path /mnt/lustre --mds $MDSNAME $LOV

If anyone can advise me, I'd appreciate it. Thanks.

-Roger
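P.S. Is the right next step to track down whatever is still holding /mnt/lustre open before forcing the cleanup again? I was thinking of something along these lines on the client (fuser and lsof are just the standard tools; I haven't confirmed that this actually frees the mount):

# Check whether the old client mount is still listed:
grep lustre /proc/mounts

# Look for processes still using files under the mount point:
fuser -vm /mnt/lustre
lsof /mnt/lustre

# Then retry the forced cleanup:
lconf -v --cleanup --force --node client /home/roger/lustreConfigs/failoverLustre/failoverLustre.xml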