hi there,

i have an issue with failover of the MGS device within my cluster.

i'm building a simple lustre environment: just one lustre file system (testfs).

i have a two-node cluster for my MGS/MDT; this is an active/passive config with the MGS and MDT on different devices and mounted separately (not co-located).

i have a two-node cluster for my OSTs, in an active/active config: the first OST is on node one and the second OST is on node two.

with the above, heartbeat is happy to mount the OSTs and the MDT on either node of their clusters. the MGS, however, is not. i get the following message when it tries to mount on the alternative node:

---8<---
mount.lustre: mount /dev/sdb at /lustre/testfs/mgs failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
---8<---

i've noticed that if i consolidate and put the MGS and MDT on the same device/mountpoint on the MDS cluster nodes, all is well and the file system mounts on the alternative node perfectly.

any ideas?

i have made sure i created the file systems with --failnode and --mgsnode= for each MDS server, but no joy.

i can see a previous post to lustre-discuss from someone with a similar, if not the same, issue:

http://lists.lustre.org/pipermail/lustre-discuss/2008-September/008634.html

cheers
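p.s. for reference, the heartbeat resources on the MDS pair boil down to two lustre Filesystem mounts; in haresources style that would look roughly like this (node and device names as in my config; the mdt mountpoint and the exact syntax are from memory, so treat this as illustrative only):

# /etc/ha.d/haresources on the MDS pair (mdt mountpoint is illustrative)
lustremds1 Filesystem::/dev/sdb::/lustre/testfs/mgs::lustre
lustremds1 Filesystem::/dev/sdc::/lustre/testfs/mdt::lustre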
neil rutter wrote:
> i have an issue with failover of the MGS device within my cluster.
> [...]
> i have made sure i created the file systems with --failnode and --mgsnode=
> for each MDS server, but no joy.

Sounds like a problem with your mkfs commands. Please send the full "mkfs" commands for the MGS and MDT LUNs, and the IP addresses for the primary and secondary nodes ("lctl list_nids").

Kevin
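PS: if it is easier than digging through shell history, something like this on each MDS node will show both the NIDs and the parameters each target was actually formatted with (device names are just examples):

lctl list_nids
tunefs.lustre --dryrun /dev/sdb
tunefs.lustre --dryrun /dev/sdc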
On Tue, 2008-11-25 at 20:45 +0000, neil rutter wrote:
> hi there,

Hi,

> i have a two-node cluster for my MGS/MDT; this is an active/passive
> config with the MGS and MDT on different devices and mounted separately
> (not co-located).
>
> i have a two-node cluster for my OSTs, in an active/active config: the
> first OST is on node one and the second OST is on node two.

So you in fact have 4 nodes as your Lustre servers, yes? What is your shared storage technology? How are the two OSSes accessing the same two OSTs, and how are the two MDSes accessing the single MDT and MGT?

> with the above, heartbeat is happy to mount the OSTs and the MDT on
> either node of their clusters. the MGS, however, is not. i get the
> following message when it tries to mount on the alternative node:
>
> ---8<---
> mount.lustre: mount /dev/sdb at /lustre/testfs/mgs failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
> ---8<---

Is /dev/sdb actually accessible on the alternative node? What does "cat /proc/partitions" say on that node?

What does dmesg tell you after you try to mount /dev/sdb and it fails?

b.
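p.s. concretely, something along these lines on the alternative node, right after the failed mount attempt (a rough sketch):

cat /proc/partitions    # is sdb visible on this node at all?
dmesg | tail -n 30      # the real reason for "Invalid argument" is usually logged here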
kevin, brian,

thanks to you both. i had an error in my mkfs.lustre commands; once it was pointed out and corrected, the issue was fixed.

Brian J. Murrell wrote:
> Is /dev/sdb actually accessible on the alternative node? What does
> "cat /proc/partitions" say on that node?
>
> What does dmesg tell you after you try to mount /dev/sdb and it fails?
On Tue, 2008-11-25 at 23:34 +0000, neil rutter wrote:
> kevin, brian,
>
> thanks to you both. i had an error in my mkfs.lustre commands; once it
> was pointed out and corrected, the issue was fixed.

Can you share the error here, so that future searches for this problem turn up the solution as well?

There's nothing more frustrating than finding the same problem you are having in a mailing list archive with no solution. Actually, there is something more frustrating: seeing that the problem was solved, but with no details on how.

Thanx,
b.
Brian J. Murrell wrote:
> Can you share the error here, so that future searches for this problem
> turn up the solution as well?

Come on, if we wanted to make it that easy, it would be in the manual ;-)

I gave Neil the correct mkfs commands:

mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb

mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
    --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc

Kevin
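PS: for completeness, with both --mgsnode entries on disk the MDT can reach the MGS on either node, and clients would likewise list both MGS NIDs when mounting, something like this (the mount point is just an example):

mount -t lustre 192.168.123.20@tcp0:192.168.123.21@tcp0:/bananafs /mnt/bananafs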
Kevin Van Maren wrote:
> I gave Neil the correct mkfs commands:
>
> mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb
>
> mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
>     --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc

Brian,

Here are the original, incorrect, mkfs commands:

mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb

mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
    --mgsnode=lustremds1 --mdt /dev/sdc

Kevin
On Wed, 2008-11-26 at 10:22 -0700, Kevin Van Maren wrote:
> Brian,

Hi Kevin,

> Here are the original, incorrect, mkfs commands:
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
>     --mgsnode=lustremds1 --mdt /dev/sdc

So to be clear, was his failure that he only specified the one --mgsnode, or that his hostname specifications did not resolve properly to the IP addresses used in the subsequent, working commands? Or both?

b.
Kevin Van Maren wrote:
> Here are the original, incorrect, mkfs commands:
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 --mgs /dev/sdb
>
> mkfs.lustre --reformat --fsname bananafs --failnode lustremds2 \
>     --mgsnode=lustremds1 --mdt /dev/sdc

guys,

actually, the original commands i ran did have the = signs for the --failnode=<nodename> arguments; i gave kevin the wrong bash history to analyse when he asked for the commands from the host :)

having said that, though, it was the NIDs that were missing; adding them, as in kevin's corrected commands, fixed the issues i was having.

cheers
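p.s. for anyone finding this thread later: a quick way to check the failover case, before handing the resources back to heartbeat, is to mount the targets by hand on the standby MDS node, roughly (mountpoints are from my config, the mdt one from memory):

mount -t lustre /dev/sdb /lustre/testfs/mgs
mount -t lustre /dev/sdc /lustre/testfs/mdt
umount /lustre/testfs/mdt
umount /lustre/testfs/mgs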
Kevin Van Maren <Kevin.Vanmaren@...> writes:

> mkfs.lustre --reformat --failnode=192.168.123.21@tcp0 --mgs /dev/sdb

Is --failnode evaluated for the MGS? We seem to do fine without it, as any client requires explicit configuration of the MGS failnode anyway. Or is it possible to override this configuration with the value set on the MGS?

> mkfs.lustre --reformat --fsname bananafs --failnode=192.168.123.21@tcp0 \
>     --mgsnode=192.168.123.20@tcp0 --mgsnode=192.168.123.21@tcp0 --mdt /dev/sdc

Regards,
Daniel.