My understanding of setting up fail-over is you need some control over
the power so with a script it can turn off a machine by cutting its
power? Is this correct? Is there a way to do fail-over without having
access to the PDU (power strips)?

Thanks
David

--
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You've never been out of college!
You don't know what it's like out there! I've worked in the private
sector. They expect results. -Ray Ghostbusters
On Mon, 2010-08-09 at 12:45 -0500, David Noriega wrote:
> My understanding of setting up fail-over is you need some control over
> the power so with a script it can turn off a machine by cutting its
> power? Is this correct? Is there a way to do fail-over without having
> access to the PDU (power strips)?

Lustre failover in and of itself does not require power control. We do,
however, recommend having power control to prevent double mounts.

Assume that node1 and node2 both serve ost1, and that at a given moment
node1 is active and has it mounted. If node2 thinks that node1 is dead
and wants to take over ost1, and its procedure for doing so dictates
that it MUST power off node1 before it can mount ost1, then you are
guaranteed (to the limit of the reliability of the power control) that
node1 and node2 won't both mount ost1 at the same time, yes? This is
true even if node1 was perfectly functional (and still has the ost
mounted) but node2's determination that node1 was down was faulty.

Without power control, there is a risk that node2 mounts ost1 while
node1 still has it mounted -- MMP aside. MMP is a good belt to have
with your power control suspenders. :-) Since a double-mount has such
serious consequences, you cannot do too much to prevent it.

b.
On Aug 9, 2010, at 11:45 AM, David Noriega <tsk133 at my.utsa.edu> wrote:
> My understanding of setting up fail-over is you need some control over
> the power so with a script it can turn off a machine by cutting its
> power? Is this correct?

It is the recommended configuration because it is simple to understand
and implement. But the only _hard_ requirement is that both nodes can
access the storage.

> Is there a way to do fail-over without having
> access to the PDU (power strips)?

If you have IPMI support, that can be used for power control, instead
of a switched PDU. Depending on the storage, you may be able to do
resource fencing of the disks instead of STONITH. Or you can run
fast-and-loose, without any way to ensure the dead node is really
"dead" and not accessing storage (at your risk). While Lustre has MMP,
it is really more to protect against a mount typo than to guarantee
resource fencing.
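For concreteness, the kind of IPMI power control being described here is
usually driven with ipmitool against the node's BMC/ILOM; the BMC address,
user, and password below are placeholders, not values from this thread:

  # Query and control a node's power state over the LAN:
  ipmitool -I lanplus -H 192.168.6.100 -U admin -P secret chassis power status
  ipmitool -I lanplus -H 192.168.6.100 -U admin -P secret chassis power off
  ipmitool -I lanplus -H 192.168.6.100 -U admin -P secret chassis power on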
Could you describe this resource fencing in more detail? As regards
STONITH, the PDU already has the grubby hands of IT plugged into it,
and I doubt they would be happy if I unplugged them. What about the
network management port or ILOM?

On Mon, Aug 9, 2010 at 1:08 PM, Kevin Van Maren
<Kevin.Van.Maren at oracle.com> wrote:
> If you have IPMI support, that can be used for power control, instead
> of a switched PDU. Depending on the storage, you may be able to do
> resource fencing of the disks instead of STONITH.
David Noriega wrote:
> Could you describe this resource fencing in more detail? As regards
> STONITH, the PDU already has the grubby hands of IT plugged into it,
> and I doubt they would be happy if I unplugged them. What about the
> network management port or ILOM?

Resource fencing is needed to ensure that a node does not take over a
resource (i.e., an OST) while the other node is still accessing it (as
could happen if the node only partly crashes, where it is not
responding to the HA package but is still writing to the disk).

STONITH is a pretty common way to ensure the other node is dead and can
no longer access the resource. If you can't use your switched PDU, then
using the ILOM for IPMI-based power control works. The other common way
to do resource fencing is to use SCSI reserve commands (if supported by
the hardware and the HA package) to ensure exclusive access.

Kevin
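As a rough illustration of the SCSI-reservation style of fencing mentioned
above (assuming the storage honours SCSI-3 persistent reservations and
sg3_utils is installed; the device path and reservation keys are
placeholders), the idea looks something like this. In practice an HA fence
agent such as fence_scsi drives this rather than hand-run commands:

  # Each node registers its own key with the shared LUN:
  sg_persist --out --register --param-sark=0xaaa /dev/mapper/ost1
  # The active node takes an exclusive reservation on the LUN:
  sg_persist --out --reserve --param-rk=0xaaa --prout-type=1 /dev/mapper/ost1
  # Check who currently holds the reservation:
  sg_persist --in --read-reservation /dev/mapper/ost1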
I think I'll go the IPMI route. So reading on STONITH, it's just a
script, so all I would need is a script to run ipmi that tells the
server to power off, right?

Also, while reading through the Lustre manual, it seems some things are
being deleted from the wiki:
http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists,
and I noticed this too when I found the Lustre quick start guide is no
longer available.

Thanks
David
Depends on the HA package you are using. Heartbeat comes with a script
that supports IPMI.

The important thing is that stonith NOT succeed if you don't _know_
that the node is off. So it is absolutely not a 1-line script.

Kevin

David Noriega wrote:
> I think I'll go the IPMI route. So reading on STONITH, it's just a
> script, so all I would need is a script to run ipmi that tells the
> server to power off, right?
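To see which fencing plugins a Heartbeat installation actually ships, and
which parameters a given plugin expects, the stonith(8) utility from
cluster-glue/heartbeat can be queried (assuming it is installed;
external/ipmi is the plugin referred to above):

  # List the available stonith plugin types:
  stonith -L
  # Show the parameters the external/ipmi plugin expects:
  stonith -t external/ipmi -n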
Another question: Is it possible to use CentOS/Red Hat's clustering
software? In the manual it mentions using that for metadata failover
(since having more than one metadata server online isn't possible right
now), so why not use that for all of Lustre? But since the information
is missing, can someone fill in the blanks on setting up metadata
failover?

David
On 8/10/2010 12:03 PM, David Noriega wrote:
> Also, while reading through the Lustre manual, it seems some things are
> being deleted from the wiki:
> http://wiki.lustre.org/index.php?title=Clu_Manager no longer exists,
> and I noticed this too when I found the Lustre quick start guide is no
> longer available.

Lustre quick start guide:
http://www.filibeto.org/sun/lib/blueprints/820-7390.pdf
On 8/10/2010 12:20 PM, David Noriega wrote:
> Another question: Is it possible to use CentOS/Red Hat's clustering
> software?

The main issue, IMHO, is that Lustre today uses the physical
hostname/IP for all of the MDS, OSS, MGS, etc., whereas cluster SW uses
a VIP, so some work needs to be done to make a VIP work for Lustre.

my 2c
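The physical-address point can be seen directly on a server: Lustre
identifies nodes by NID, which is derived from the interface address rather
than from a cluster-managed virtual IP (lctl is part of the standard Lustre
utilities; the sample output is roughly what OSS1 in this thread would show):

  # On an OSS, list the NIDs Lustre is using:
  lctl list_nids
  # e.g.  192.168.5.100@tcp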
On Tuesday, August 10, 2010, Kevin Van Maren wrote:
> Depends on the HA package you are using. Heartbeat comes with a script
> that supports IPMI.

For our installations we even use a modified external/ipmi_ddn stonith
script that uses power-off/status/on to make sure the system is really
reset. The heartbeat/pacemaker script uses the ipmi reset method by
default, but ipmi commands are not required by the spec to succeed. So
ipmitool (used by external/ipmi) might return successfully, but that
does not in any way ensure the node was really reset. I have seen that
rather often in real life already. The default script also supports the
power-off/on method, but it does not check the status either.

So our modified script first powers off, then checks that the node is
really offline, then powers on again, and only then returns
successfully. Unfortunately, that comes at the cost of an increased
fail-over time, as power-off and then power-on need some minimal
downtime in between (ca. 30s), and heartbeat's/pacemaker's stonith does
not support async events (power-off would be sufficient, but once
stonith successfully returns, it is not called again until the next
fencing).

--
Bernd Schubert
DataDirect Networks
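A standalone sketch of that power-off / verify / power-on logic (not the
actual external/ipmi_ddn plugin; the BMC address and credentials are
placeholders) could look roughly like this:

  #!/bin/sh
  # Fence a peer node via its BMC, and only report success once the
  # power state is confirmed off.
  BMC="-I lanplus -H 192.168.6.101 -U admin -P secret"

  ipmitool $BMC chassis power off || exit 1

  # Wait until the BMC actually reports the chassis as off.
  for i in $(seq 1 30); do
      if ipmitool $BMC chassis power status | grep -q "is off"; then
          ipmitool $BMC chassis power on
          exit 0            # node was verifiably down; safe to fail over
      fi
      sleep 2
  done
  exit 1                    # could not confirm power-off: do NOT fail over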
So your script resets the server so there is no fail-over (i.e., the
other server takes over resources from that server)? Or there is
fail-over, but you then manually return resources back to the server
that was reset?

On Tue, Aug 10, 2010 at 1:39 PM, Bernd Schubert
<bs_lists at aakef.fastmail.fm> wrote:
> So our modified script first powers off, then checks that the node is
> really offline, then powers on again, and only then returns
> successfully.
On Tuesday, August 10, 2010, David Noriega wrote:
> So your script resets the server so there is no fail-over (i.e., the
> other server takes over resources from that server)? Or there is
> fail-over, but you then manually return resources back to the server
> that was reset?

Our ddn ipmi stonith script (external/ipmi_ddn in heartbeat/pacemaker
stonith terms) only makes absolutely sure the node was really reset. If
something fails, an error code is reported to pacemaker, and then
pacemaker (*) will not initiate resource fail-over, in order to prevent
split-brain.

As Lustre devices use MMP (multiple-mount protection), that is not
strictly required, in principle. But if something goes wrong, e.g. MMP
was accidentally not enabled, a double mount could come up and that
would cause serious filesystem and data corruption...

Cheers,
Bernd

PS: (*) heartbeat-v1 (and v2/v3 if not in xml/crm mode) also *should*
accept stonith error codes, but I have seen it more than once that
heartbeat-v1 ran into split-brain and started resources on both cluster
nodes. That is something where pacemaker does a much better job.

--
Bernd Schubert
DataDirect Networks
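If you want to verify rather than assume that MMP is enabled on a target,
the feature is visible in the backing ldiskfs superblock (dumpe2fs is from
e2fsprogs; the device path below is just an example):

  # "mmp" should appear in the feature list of the backing filesystem:
  dumpe2fs -h /dev/lustre-ost1-dg1/lv1 2>/dev/null | grep -i features
  # mkfs.lustre sets mmp when the target is formatted with a failover node.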
I would recommend the heartbeat with pacemaker setup for the fail-over
control. The configuration may seem complex at the beginning, but after
enough reading (and there are many good sources) it is quite easy to
set up. I have recently set up a Lustre system with 3 OSSs and two MDSs
(DRBD with LVM between them) working as a single HA cluster, and it was
easy enough. Pacemaker allows a single point of administration of the
Lustre system (starting and stopping the filesystem), and there is a
neat GUI for those who want to show something to their managers :)

Best regards,

Wojciech

--
Wojciech Turek
Senior System Architect
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
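To give a flavour of what that looks like in pacemaker, a minimal sketch for
one OST plus IPMI fencing might be configured as below. The resource and
node names, device path, mount point, and BMC credentials are all
placeholders, and the parameter names are assumed to follow the stock
external/ipmi plugin:

  # One Lustre OST as a Filesystem resource, preferring oss1:
  crm configure primitive ost1 ocf:heartbeat:Filesystem \
      params device=/dev/lustre-ost1-dg1/lv1 directory=/mnt/ost1 fstype=lustre \
      op monitor interval=120s timeout=300s
  crm configure location ost1-prefers-oss1 ost1 100: oss1

  # STONITH for oss1 via its BMC/ILOM, never run on oss1 itself:
  crm configure primitive fence-oss1 stonith:external/ipmi \
      params hostname=oss1 ipaddr=192.168.6.100 userid=admin passwd=secret interface=lanplus \
      op monitor interval=60s
  crm configure location fence-oss1-not-on-oss1 fence-oss1 -inf: oss1
  crm configure property stonith-enabled=true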
Ok, I've gotten heartbeat set up with the two OSSs, but I do have a
question that isn't answered in the documentation: shouldn't the Lustre
mounts be removed from fstab once they are given to heartbeat, since
when it comes online it will mount the resources, correct?

David
David Noriega wrote:
> Ok, I've gotten heartbeat set up with the two OSSs, but I do have a
> question that isn't answered in the documentation: shouldn't the Lustre
> mounts be removed from fstab once they are given to heartbeat, since
> when it comes online it will mount the resources, correct?

Yes: on the servers, they must either not be there or be marked
"noauto". Once you start running heartbeat, you have given control of
the resource away, and must not mount/umount it yourself (unless you
stop heartbeat on both nodes in the HA pair to get control back).

Kevin
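As a concrete illustration (device path, mount point, and node name are
placeholders), the fstab entry on an OSS and the matching heartbeat v1
haresources line might look like:

  # /etc/fstab on the OSS: present for reference, never auto-mounted
  /dev/lustre-ost1-dg1/lv1  /mnt/ost1  lustre  noauto  0 0

  # /etc/ha.d/haresources: heartbeat mounts the OST on the preferred node
  oss1 Filesystem::/dev/lustre-ost1-dg1/lv1::/mnt/ost1::lustre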
Some info:
MDS/MGS                   192.168.5.104
Passive failover MDS/MGS  192.168.5.105
OSS1                      192.168.5.100
OSS2                      192.168.5.101

I've got some more questions about setting up failover. Besides having
heartbeat set up, what about using tunefs.lustre to set options?

On the MDS/MGS I set the following options:
tunefs.lustre --failnode=192.168.5.105 /dev/lustre-mdt-dg/lv1
Heartbeat works just fine; I can mount on the primary node and then
fail over to the other and back.

Now on the OSSs things get a bit more confusing. Reading these two blog
posts:
http://mergingbusinessandit.blogspot.com/2008/12/implementing-lustre-failover.html
http://jermen.posterous.com/lustre-mds-failover

From these I tried these options:
tunefs.lustre --erase-params --mgsnode=192.168.5.104@tcp0
--mgsnode=192.168.5.105@tcp0 --failover=192.168.5.101@tcp0
-write-params /dev/lustre-ost1-dg1/lv1

I ran that for all four OSTs, changing the failover option on the last
two OSTs to point to OSS1 while the first two point to OSS2.

My understanding is that you mount the OSTs first, then the MDS, but
the OSTs are failing to mount. Are all these options needed? Or is
simply specifying the primary MDS enough for it to find out about the
second MDS?

David
Oops, somehow I changed the target name of all OSTs to lustre-OST0000,
and trying to mount any other OST fails. I've gone and found the 'More
Complicated Configuration' section, which details the usage of
--mgsnode=nid1,nid2, and so using this I think I'll just reformat.
Hi David,

You need to umount your OSTs and MDTs and run
tunefs.lustre --writeconf /dev/<lustre device> on all Lustre OSTs and
MDTs. This will force the Lustre targets to fetch a new configuration
the next time they are mounted. The order of mounting is:
MGT -> MDT -> OSTs

Best regards,

Wojciech
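Spelled out against the devices David posted earlier in the thread (the
mount points are placeholders), that regenerate-and-remount sequence would
look roughly like:

  # 1. Unmount everything: clients, then OSTs, then MDT/MGS.
  # 2. Regenerate the configuration logs on every target:
  tunefs.lustre --writeconf /dev/lustre-mdt-dg/lv1        # on the MDS
  tunefs.lustre --writeconf /dev/lustre-ost1-dg1/lv1      # on each OSS, per OST
  # 3. Remount in order: MGS/MDT first (combined here), then the OSTs:
  mount -t lustre /dev/lustre-mdt-dg/lv1 /mnt/mdt         # on the MDS
  mount -t lustre /dev/lustre-ost1-dg1/lv1 /mnt/ost1      # on each OSS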
That is good to know, but I already started formatting. No issues, as
it hasn't been put into production; I'm just playing with it and
working kinks like this out. Though formatting the OSTs was rather
quick while the MDT is taking some time. Is this normal? 192.168.5.105
is the other (standby) MDS node.

[root@meta1 ~]# mkfs.lustre --reformat --fsname=lustre --mgs --mdt
--failnode=192.168.5.105@tcp0 /dev/lustre-mdt-dg/lv1

   Permanent disk data:
Target:     lustre-MDTffff
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x75
            (MDT MGS needs_index first_time update )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: failover.node=192.168.5.105@tcp mdt.group_upcall=/usr/sbin/l_getgroups

device size = 2323456MB
2 6 18
formatting backing filesystem ldiskfs on /dev/lustre-mdt-dg/lv1
        target name  lustre-MDTffff
        4k blocks    594804736
        options      -J size=400 -i 4096 -I 512 -q -O dir_index,extents,uninit_groups,mmp -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index,extents,uninit_groups,mmp -F /dev/lustre-mdt-dg/lv1 594804736

David
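For reference, formatting the targets from scratch with both MGS NIDs and a
failover partner baked in (using the addresses from earlier in the thread;
the OST device path is one of David's examples, and the mount layout is
otherwise an assumption) could look roughly like this:

  # Combined MGS/MDT on the primary MDS, with the standby as failover partner:
  mkfs.lustre --reformat --fsname=lustre --mgs --mdt \
      --failnode=192.168.5.105@tcp0 /dev/lustre-mdt-dg/lv1

  # An OST on OSS1, told about both MGS NIDs and its failover partner OSS2:
  mkfs.lustre --reformat --fsname=lustre --ost \
      --mgsnode=192.168.5.104@tcp0,192.168.5.105@tcp0 \
      --failnode=192.168.5.101@tcp0 /dev/lustre-ost1-dg1/lv1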