On 08/19/2010 10:59 AM, David Noriega wrote:
> I'm curious about the underlying framework of lustre in regards to
> failover.
>
> When creating the filesystems, one can provide --failnode=x.x.x.x@tcp0
> and even for the OSTs you can provide two nids for the MDS/MGS. What
> do these options tell lustre and the clients? Are these required for
> use with heartbeat? If so why doesn't that section of the manual
> reference this? Also I think there is a typo in 4.5 Operational
> Scenarios, where it says one can use 'mkfs.lustre --ost --mgs
> --fsname=' That of course returns an error.
>
> David
>
The --failnode= parameter gives a list of LNET addresses that a
'client' will try in the event of a dropped connection to the primary
address. (The client in this case can also be a client process on a
Lustre server, as in an OSS talking to a failover MDS.) This is
unrelated to the heartbeat setup, which governs services, but it is
critical if you wish clients to connect to the new service after
failover. So it is a necessary part of _any_ Lustre failover, whether
performed by heartbeat or another service.
It is described in the manual in various places, including section 8.2
Failover Functionality in Lustre and section 4.4.1. Please ask further
if that's not clear.
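
To illustrate the idea only: the sketch below shows, in Python, the
"try the primary address, then each failover address" behaviour that
the --failnode list makes possible. It is not Lustre code and does not
reflect how the real client recovery is implemented. The function
name, the try_connect callback, and the NIDs are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def connect_with_failover(
    primary: str,
    failnodes: Sequence[str],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the first NID that accepts a connection, or None.

    Conceptual sketch only: try_connect is a hypothetical stand-in
    for whatever actually opens the connection.
    """
    # The primary address is tried first, then each failover address
    # in the order it was configured.
    for nid in [primary, *failnodes]:
        if try_connect(nid):
            return nid
    return None


if __name__ == "__main__":
    # Hypothetical NIDs: pretend the primary server has gone down and
    # the service has been started on the failover node.
    alive = {"10.0.0.2@tcp0"}
    target = connect_with_failover(
        "10.0.0.1@tcp0", ["10.0.0.2@tcp0"], lambda nid: nid in alive
    )
    print(target)
```

Without a failover address in the list, the loop has nowhere else to
go once the primary is down, which is why the parameter matters
regardless of what moves the service.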
Thanks for the manual typo catch - have filed a bug.
cliffw