Ok, I feel really stupid. I've done this before without any problem, but I can't seem to get it to work and I can't find my notes from the last time I did it. We have separate MGS and MDTs, and I can't seem to get our MGS to fail over correctly after reformatting it:

mkfs.lustre --mkfsoptions="-O dir_index" --reformat --mgs --failnode=192.168.1.253@o2ib /dev/mapper/ldiskc-part1

We are running this on Debian, using the Lustre 1.6.3 debs from svn on Lenny with 2.6.22.12. I've tried several permutations of the mkfs.lustre command, specifying both nodes as failover, both nodes as MGS, and pretty much every other combination of the above. With the above command, tunefs.lustre shows that failnode and mgsnode are the failover node.

Thanks,
Robert

Robert LeBlanc
College of Life Sciences Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
Robert LeBlanc wrote:
> mkfs.lustre --mkfsoptions="-O dir_index" --reformat --mgs
> --failnode=192.168.1.253@o2ib /dev/mapper/ldiskc-part1

The MGS doesn't actually use the --failnode option (although it won't hurt). You actually have to tell the other nodes in the system (servers and clients) about the failover options for the MGS (use the --mgsnode parameter on servers, and the mount address for clients). The reason is that the servers must contact the MGS for the configuration information, and they can't ask the MGS where its failover partner is if, e.g., the failover partner is the one that's running.
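To make the advice above concrete, here is a minimal sketch of how the MGS failover pair is declared on the other nodes rather than on the MGS itself. The filesystem name (testfs), the OST device path, and the primary MGS NID (192.168.1.252@o2ib) are made up for the example; the thread only names the failover NID, 192.168.1.253@o2ib.

# On an OST (or MDT) server: record both MGS NIDs so the target can reach
# the MGS on whichever node is currently running it.
mkfs.lustre --fsname=testfs --ost \
    --mgsnode=192.168.1.252@o2ib --mgsnode=192.168.1.253@o2ib \
    /dev/mapper/ost0-part1

# On a client: give both MGS NIDs, separated by a colon, in the mount address.
mount -t lustre 192.168.1.252@o2ib:192.168.1.253@o2ib:/testfs /mnt/testfs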
Hi,

What will my tunefs.lustre command line look like if I want to configure a failnode for my MDS? I have two MDTs, and the MGS is on the same block device as one of the MDTs. I also have two servers connected to shared metadata storage.

Thanks,

Wojciech

On 12 Nov 2007, at 20:49, Nathan Rutman wrote:
> The MGS doesn't actually use the --failnode option (although it won't
> hurt). You actually have to tell the other nodes in the system (servers
> and clients) about the failover options for the MGS.

Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27@cam.ac.uk
tel. +441223763517
You should just unmount all the clients and all OSTs, and then:

tunefs.lustre --failnode=10.0.0.2@tcp --writeconf /dev/shared/disk

If your volume is already on the shared disk, then mount everything and you should be good to go. You can also do it on a live mounted system by using lctl, but I'm not exactly sure how to do that.

Robert

On 11/12/07 2:24 PM, "Wojciech Turek" <wjt27@cam.ac.uk> wrote:
> What will my tunefs.lustre command line look like if I want to
> configure a failnode for my MDS? I have two MDTs, and the MGS is on the
> same block device as one of the MDTs.

Robert LeBlanc
College of Life Sciences Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
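As a rough outline of the procedure Robert describes, the sequence for a combined MGS/MDT on a shared device would look something like the sketch below. This is only a sketch under the thread's assumptions (Lustre 1.6, shared metadata storage); the device path, NID, and mount points are placeholders.

# 1. Quiesce the filesystem: unmount clients first, then the OSTs, then the MDT/MGS.
umount /mnt/lustre          # on every client
umount /mnt/ost0            # on every OSS, for each OST
umount /mnt/mdt             # on the currently active MDS

# 2. Record the failover NID and force the configuration logs to be rewritten.
tunefs.lustre --failnode=10.0.0.2@tcp --writeconf /dev/shared/disk

# 3. Remount in the reverse order: MDT/MGS first, then the OSTs, then the clients.
mount -t lustre /dev/shared/disk /mnt/mdt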
Hi,

Thanks for that. Actually, I have a slightly more complex situation here. I have two sets of clients: the first set is on the 10.142.10.0/24 network and the second set is on the 10.143.0.0/16 network. Each server has two NICs: NIC1 = eth0 on 10.143.0.0/16 and NIC2 = eth1 on 10.142.10.0/24. LNET configures the networks as follows:

eth0 = <ip>@tcp0
eth1 = <ip>@tcp1

I am going to change the Lustre configuration in order to introduce failover. The MGS is combined with mdt01 = /dev/dm-0.

on mds01:
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1

on oss1:
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.8@tcp0,10.142.10.8@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5

on oss2:
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.7@tcp0,10.142.10.7@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11

on oss3:
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-0
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-1
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-2
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-3
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-4
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.10@tcp0,10.142.10.10@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-5

on oss4:
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-6
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-7
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-8
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-9
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-10
tunefs.lustre --erase-params --writeconf --failnode=10.143.245.9@tcp0,10.142.10.9@tcp1 --mgsnode=10.143.245.201@tcp0,10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0,10.142.10.202@tcp1 /dev/dm-11

Will the above be correct?

Cheers,

Wojciech Turek

On 12 Nov 2007, at 21:36, Robert LeBlanc wrote:
> You should just unmount all the clients and all OSTs, and then:
> tunefs.lustre --failnode=10.0.0.2@tcp --writeconf /dev/shared/disk
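For reference, the eth0 = <ip>@tcp0 / eth1 = <ip>@tcp1 mapping described above is normally what an LNET module option along the following lines produces. This is a sketch only; the exact file (/etc/modprobe.conf or a modprobe.d fragment) depends on the distribution.

# LNET module option, assuming eth0 carries tcp0 and eth1 carries tcp1 on
# every server and client:
options lnet networks="tcp0(eth0),tcp1(eth1)"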
Since you are only adding parameters, you don't need the --erase-params option. I think.

Robert

On 11/12/07 3:23 PM, "Wojciech Turek" <wjt27@cam.ac.uk> wrote:
> Actually, I have a slightly more complex situation here. [...]
> Will the above be correct?

Robert LeBlanc
College of Life Sciences Computer Support
Brigham Young University
leblanc@byu.edu
(801)422-1882
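One way to check what a target currently has stored before deciding whether --erase-params is needed is a non-destructive dry run; a sketch, reusing a device name from the thread. Keep in mind that --erase-params wipes all previously stored parameters, so anything still wanted (for example the OSTs' --mgsnode entries) has to be respecified on the same command line.

# Print the stored parameters without changing anything on disk.
tunefs.lustre --dryrun /dev/dm-0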