Sky,

Lustre itself just provides tools to facilitate external failover management via systems such as RedHat Cluster Manager or Heartbeat. With Lustre 1.4.5 there is even support for generating the service management scripts that Cluster Manager needs, though I've not actually used those facilities myself.

Important: The MDS should only be active on one system at a time; starting it on more than one system at the same time is a recipe for fatal corruption of the file system.

The lconf --failover flag
-------------------------

In Lustre terms the MDS device is implicitly declared as a failover-capable device. When stopping the device with lconf you just need to add the --failover flag, as well as the --cleanup flag.

The --failover flag means that the MDS device shuts itself down in such a way that it retains the active state of the client connections/operations, and can recover connections to those clients, if they still exist, when the MDS device is started up again.

Service Location Management
---------------------------

When starting or stopping the MDS device you will also need to tell the lconf command where the MDS should be running; this can be done via a number of methods, depending on how you manage your config.

If you use XML (or HTTP-served XML) then you should just have to add the --service=<mds_name> option to the lconf command line.

If you use LDAP then use the lactive command to update the active location of the MDS device before starting it on the target system, but remember that it has to have been stopped on the previous system before you update the LDAP database.

If you set up a pair of nodes running Cluster Manager that can share access to the MDS device, and then add in the MDS service definition, you should find that Cluster Manager will take over the role of ensuring that your device is running.

BTW all of this applies equally to managing OST devices.

Fergal.
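
For the XML config case, the stop/start sequence described above could be sketched roughly as below. This is only a dry-run illustration that prints the lconf command lines instead of running them. The service name "mds1" and the config file "config.xml" are made-up placeholders, and the helper function names are hypothetical; only the --cleanup, --failover and --service=<mds_name> options come from the description above.

```shell
#!/bin/sh
# Dry-run sketch: prints the lconf commands for moving an MDS from one
# node to another. Nothing is executed against Lustre.
# Assumptions (not from the original message): the service name "mds1"
# and the config file name "config.xml" are placeholders.

# Print the command that stops the MDS on the node currently holding it.
# --failover keeps client state recoverable; --cleanup stops the device.
mds_stop_cmd() {
    printf 'lconf --cleanup --failover --service=%s %s\n' "$1" "$2"
}

# Print the command that starts the MDS on the takeover node.
# Only run this once the stop on the old node has completed.
mds_start_cmd() {
    printf 'lconf --service=%s %s\n' "$1" "$2"
}

echo "1. on the old node: $(mds_stop_cmd mds1 config.xml)"
echo "2. on the new node: $(mds_start_cmd mds1 config.xml)"
```

The order matters: the stop on the old node has to finish before the start on the new node, since the MDS must never be active on two systems at once. With LDAP-managed config the lactive update would sit between the two steps.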
--
Fergal.McCarthy@HP.com

-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com [mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of sky
Sent: 23 November 2005 01:19
To: lustre-discuss@lists.clusterfs.com
Subject: [Lustre-discuss] (no subject)

hi,

I'm working with lustre-1.4.5 on a cluster. I need MDS failover, but cannot find any details in the manual or on the Lustre site. Can anyone who has done it give me some help?

thanks
-sky