Erich Focht
2008-Apr-29 09:55 UTC
[Lustre-discuss] mounting lustre in failover configuration
Hi,

I'm puzzled by the following behavior. An active-passive failover pair of metadata servers has separate MGS and MDT disks and two networks (o2ib and tcp0 (eth0)):

mds1: 10.3.0.230@o2ib  192.168.50.130@tcp0
mds2: 10.3.0.231@o2ib  192.168.50.131@tcp0

MGS and MDT are formatted with the options:
failover.node=10.3.0.231@o2ib,192.168.50.131@tcp0
mgsnode=10.3.0.230@o2ib,192.168.50.130@tcp0,10.3.0.231@o2ib,192.168.50.131@tcp0

The _first_ mount of an OST fails if mds1 is the active metadata server. It succeeds when mds2 is active.

With client mounts I have seen something similar. I could mount clients with
mount -t lustre 10.3.0.231@o2ib:10.3.0.230@o2ib:/lustre /mnt/lustre
but not with
mount -t lustre 10.3.0.230@o2ib:10.3.0.231@o2ib:/lustre /mnt/lustre
when mds1 was the active MDS. This suggests that the active MDS has to be the last one on the list.

Strangely enough, in my current lab setup I can no longer reproduce the client mount behavior.

Did anybody else see this kind of behavior? Are there any reasons for this?

Thanks & best regards,
Erich
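[Editor's note: for readers unfamiliar with how the quoted failover.node/mgsnode options get onto disk, one plausible way to produce them is the mkfs.lustre invocation below. This is a hedged sketch only: the device paths (/dev/mgs_disk, /dev/mdt_disk) and the fsname are placeholders, not taken from the original post.]

```shell
# Sketch of formatting an MGS and MDT with the failover options quoted above.
# --failnode records the failover partner (mds2); --mgsnode (given once per
# node) records both potential MGS locations. Device paths are placeholders.

mkfs.lustre --mgs \
    --failnode=10.3.0.231@o2ib,192.168.50.131@tcp0 \
    /dev/mgs_disk

mkfs.lustre --mdt --fsname=lustre \
    --failnode=10.3.0.231@o2ib,192.168.50.131@tcp0 \
    --mgsnode=10.3.0.230@o2ib,192.168.50.130@tcp0 \
    --mgsnode=10.3.0.231@o2ib,192.168.50.131@tcp0 \
    /dev/mdt_disk
```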
Andreas Dilger
2008-May-03 02:30 UTC
[Lustre-discuss] mounting lustre in failover configuration
On Apr 29, 2008  11:55 +0200, Erich Focht wrote:
> I'm puzzled by the following behavior.
> An active-passive failover pair of metadata servers has separate MGS and
> MDT disks and two networks (o2ib and tcp0 (eth0)):
>
> mds1: 10.3.0.230@o2ib  192.168.50.130@tcp0
> mds2: 10.3.0.231@o2ib  192.168.50.131@tcp0
>
> MGS and MDT are formatted with the options:
> failover.node=10.3.0.231@o2ib,192.168.50.131@tcp0
> mgsnode=10.3.0.230@o2ib,192.168.50.130@tcp0,10.3.0.231@o2ib,192.168.50.131@tcp0
>
> The _first_ mount of an OST fails if mds1 is the active metadata server.
> It succeeds when mds2 is active.
>
> With client mounts I have seen something similar. I could mount clients
> with
> mount -t lustre 10.3.0.231@o2ib:10.3.0.230@o2ib:/lustre /mnt/lustre
> but not with
> mount -t lustre 10.3.0.230@o2ib:10.3.0.231@o2ib:/lustre /mnt/lustre
> when mds1 was the active MDS. This suggests that the active MDS has to be
> the last one on the list.
>
> Strangely enough, in my current lab setup I can no longer reproduce the
> client mount behavior.
>
> Did anybody else see this kind of behavior? Are there any reasons for this?

I believe there is a bug open on this already - the problem is that the parsing of the "failover" line in mount.lustre is broken: it re-uses the same buffer to parse all of the MDS NIDs, so the last one wins. I can't find the bug number offhand, but I believe there was a patch for it already.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
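[Editor's note: the "same buffer re-used, last one wins" failure mode Andreas describes can be illustrated with a small hypothetical sketch. This is not the actual mount.lustre code; it just models parsing a colon-separated NID list into a single variable, which discards every NID but the last.]

```shell
#!/bin/sh
# Hypothetical model of the bug: parse a failover NID list into ONE
# buffer instead of accumulating each NID. Each iteration overwrites
# the previous value, so only the final NID survives.

nids="10.3.0.230@o2ib:10.3.0.231@o2ib"

buf=""
IFS=':'
for nid in $nids; do
    buf="$nid"    # same buffer reused; earlier NIDs are lost here
done
unset IFS

echo "$buf"       # only the last NID remains
```

This would explain the symptom in the original post: with mds1 listed first, only mds2's NID survives parsing, so the mount only works when mds2 happens to be the active server.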