Hi,

We are currently testing Lustre 1.6.5.1 on RHEL 5 (64-bit) with 3 OSTs for a "data" filesystem running on server1 and 4 OSTs for a "common" filesystem running on server2. Each OST is a 1TB SAN LUN that can be seen from either server. The idea was to run the servers as an active/active failover pair, so that the "other" LUNs could be mounted on the remaining server if one server failed. Spreading the OSTs of each filesystem across the servers would also give us the flexibility of striping (between the 2 nodes initially --> more in the future).

At present, this works well as long as every LUN stays mounted on the server it was first mounted on after creation.

I had assumed that OSTs could be unmounted from server1 and then remounted on server2 (never simultaneously mounted), but this does not seem to work, whether or not clients are using (have mounted) the filesystem, or even if the servers are rebooted in between the change. The filesystems were created using the --failnode option. Even though the LUNs will mount on the other server, any clients that access the filesystem will hang until the LUN is mounted back in its initial location.

Is there a command to update the MGS/MDT's information regarding this, and so communicate this to the clients? While I may have missed it, I couldn't find much information on manual failover in the Lustre 1.6 manual or on the Lustre wiki. We may implement failover with Linux-HA down the track, but at this stage manual failover would be sufficient if we could understand more about how it works. If this is clearly documented somewhere (like in the manual), I apologise and will attempt to locate it again.

** It seems that I can achieve the above with the MDTs (i.e. unmount from one server and mount on the other), although with inconsistent results so far.

Thanks in advance for any advice.

Marcus.
Systems Administrator, University of Queensland.
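P.S. For reference, the OSTs were created and moved roughly as follows; the NIDs, device paths and mount points below are illustrative rather than our exact values:

  # On server1: format an OST for the "data" filesystem, declaring
  # server2 as the failover node (example NID/device names)
  mkfs.lustre --ost --fsname=data \
      --mgsnode=mgs@tcp0 --failnode=server2@tcp0 /dev/mapper/data_ost0
  mount -t lustre /dev/mapper/data_ost0 /mnt/data/ost0

  # Manual failover attempt: stop the OST on server1 ...
  umount /mnt/data/ost0                                   # on server1

  # ... then start the same LUN on server2 (never mounted on both at once)
  mount -t lustre /dev/mapper/data_ost0 /mnt/data/ost0    # on server2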
On Aug 21, 2008 18:29 +1000, Marcus Schull wrote:
> We are currently testing Lustre 1.6.5.1 on RHEL 5 (64-bit) with 3 OSTs
> for a "data" filesystem running on server1 and 4 OSTs for a "common"
> filesystem running on server2. Each OST is a 1TB SAN LUN that can be
> seen from either server. The idea was to run the servers as an
> active/active failover pair, being able to mount the "other" LUNs on
> the remaining server if one server failed. Also, we could have the
> flexibility of striping (between the 2 nodes initially --> more in
> the future), if the OSTs of each fs were spread out amongst the
> servers.
>
> At present, this works well if all LUNs are only mounted on the
> initial server they are mounted on after creation.
>
> I had assumed that OSTs could be unmounted from server1 and then
> remounted on server2 (never simultaneously mounted), but this does
> not seem to work whether or not clients are using (have mounted) the
> file system, or even whether the servers are rebooted in between the
> change.
>
> Even though the LUNs will mount on the other server, any clients that
> access the filesystem will hang until the LUN is mounted back in its
> initial location.
>
> The filesystems were created using the --failnode option.

Odd, this is exactly what should work. Do the clients report trying to contact the backup server?

> Is there a command to update the MGS/MDT's information regarding
> this, and so communicate this to the clients?

No, the clients should know this from the configuration they got at mount time, and try automatically with the backup server if the primary is down.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
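P.S. A minimal way to sanity-check the failover setup, assuming the standard 1.6 tools (device path and NID below are illustrative):

  # On the server holding the OST LUN: print the parameters written at
  # format time; the NID given to --failnode should appear among them
  tunefs.lustre --print /dev/mapper/data_ost0

  # On a client: verify LNET connectivity to the backup server's NID
  lctl ping server2@tcp0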