Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] setting up failover in lustre... any recommendations?
Hi,

I have a couple of questions about setting up failover with Lustre. Any help is appreciated.

Definitions:
============
ssd - shared storage device
oss - object storage server
mds - metadata server

Here is my OSS setup:
=====================
 oss1 #----(failover)----# ssd2
   #                        #
   |                        |
   |(primary)               |(primary)
   |                        |
   #                        #
 ssd1 #----(failover)----# oss2

So:
 oss1 is primary for ssd1
 oss1 is failover for ssd2
 oss2 is primary for ssd2
 oss2 is failover for ssd1

Here is my MDS setup:
=====================
   +-----# ssd3 #------+
   |                   |
   | (primary)         | (failover)
   |                   |
   #                   #
  mds1                mds2

So:
 mds1 is primary for ssd3
 mds2 is failover for ssd3

I am now at the stage where I want to implement failover for the OSSs and the MDS setup.

I would prefer to use Heartbeat from linux-ha.org for the following reasons:
 * it's actively maintained
 * we use it in house extensively
 * I am very familiar with it
 * it's straightforward to use (start/stop resources on failover)

Has anyone else used Heartbeat to do failover? Are there docs I can be pointed to on this specific type of setup?

I know how to configure Heartbeat and to use STONITH to make sure the secondary node will not write to the shared storage device at the same time as the primary node.

My main question is what resources to stop/start on failover, since both OSSs (for example) are active for one OST and failover for the other OST.

Thanks,
Steve
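Since the question is specifically about what Heartbeat should manage, here is one way the active/active OSS pair could be expressed in a Heartbeat v1 configuration. This is only a sketch, not something from the thread: the resource script names lustre-ssd1/lustre-ssd2 are hypothetical placeholders for whatever start/stop wrapper ends up being used (the rest of the thread settles on the Lustre init script or lconf), and the STONITH line is left as a placeholder for your actual fencing hardware.

# /etc/ha.d/haresources (identical on oss1 and oss2); format: <preferred node> <resource>
oss1 lustre-ssd1   # oss1 normally serves ssd1; Heartbeat moves it to oss2 on failure
oss2 lustre-ssd2   # oss2 normally serves ssd2; Heartbeat moves it to oss1 on failure

# /etc/ha.d/ha.cf (fragment)
keepalive 2
deadtime 30
bcast eth0
node oss1
node oss2
auto_failback on
# stonith_host * <stonith_type> <parameters>   # fence the peer before its storage is taken over

Because each node is primary for one resource and standby for the other, a failure of either OSS only moves one resource; nothing has to be stopped on the surviving node, which matches Cliff's later comment that the active OST never needs to be stopped while the other OST fails over.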
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Cliff,

I am not clear on how to set up MDS active/passive failover.

Here is my MDS setup:
=====================
--add node --node mds01 || exit 10
--add node --node mds02 || exit 11

--add net --node mds01 --nid mds01 --nettype tcp || exit 20
--add net --node mds02 --nid mds02 --nettype tcp || exit 21

--add mds --node mds01 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30

--add mds --node mds02 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31

--add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0 || exit 32

Then on my client in /etc/fstab I have:
=======================================
mds01:/mds1/client /mnt/lustre lustre rw 0 0

When I take down mds01 and want mds02 to take over (for an upgrade or something), how do my clients know to contact mds02 instead of mds01? Wouldn't a floating IP address make sense in this case?

Any help here is appreciated.

Thanks,
Steve

-----Original Message-----
From: cliff white [mailto:cliffw@clusterfs.com]
Sent: Tuesday, September 06, 2005 12:31 PM
To: Nielsen, Steve
Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
Subject: Re: setting up failover in lustre... any recommendations?

Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes and
> I don't need to restart the services. The only thing I then need to do via
> heartbeat is detect the other side down and "stonith" it? Then things
> will be good?

Steve -
Just wanted to check back and see how you were doing with this.
One thing I didn't mention: when failing back the service from secondary
to primary, you should stop the service with 'lconf --failover', which
will be quicker.
cliffw

> Steve
>
> -----Original Message-----
> From: cliff white [mailto:cliffw@clusterfs.com]
> Sent: Thursday, September 01, 2005 11:59 AM
> To: Nielsen, Steve
> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
> Subject: Re: setting up failover in lustre... any recommendations?
>
> Nielsen, Steve wrote:
>> I am working on setting this up now. Just need to trudge through it.
>>
>> I will share my experiences when done.
>>
>> Quick question: for failover to work I need to have IPs that float
>> between the devices. Won't this require me to restart the lustre
>> service? (I am on RHEL 4.) So a "service lustre restart" should work,
>> right?
>
> Unfortunately, we do not support IP takeover at this time. What we do is
> this:
> The servers are configured with a specific IP.
> The clients know about both IPs, and will attempt to connect in a
> round-robin fashion until they succeed.
>
> Here's a typical configuration for OST failover (servers orlando and oscar):
>
> --add ost --node orlando --ost ost1-home --failover --group orlando \
>   --lov lov-home --dev /dev/ost1
> --add ost --node orlando --ost ost2-home --failover \
>   --lov lov-home --dev /dev/ost2
>
> --add ost --node oscar --ost ost2-home --failover --group oscar \
>   --lov lov-home --dev /dev/ost2
> --add ost --node oscar --ost ost1-home --failover \
>   --lov lov-home --dev /dev/ost1
>
> cliffw
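For comparison with the MDS config above, the quoted orlando/oscar pattern applied to the oss1/oss2 and ssd1/ssd2 layout from the first message would look roughly like this. It is only a sketch: the OST names, the lov name (lov1) and the device paths (/dev/ssd1, /dev/ssd2) are assumptions, and as in the quoted example the --group option appears only on the entry for the node that primarily serves each device.

--add ost --node oss1 --ost ost-ssd1 --failover --group oss1 \
  --lov lov1 --dev /dev/ssd1
--add ost --node oss1 --ost ost-ssd2 --failover \
  --lov lov1 --dev /dev/ssd2

--add ost --node oss2 --ost ost-ssd2 --failover --group oss2 \
  --lov lov1 --dev /dev/ssd2
--add ost --node oss2 --ost ost-ssd1 --failover \
  --lov lov1 --dev /dev/ssd1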
Ragnar Kjørstad
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
On Wed, Sep 07, 2005 at 02:55:10PM -0700, cliff white wrote:
> > When I take down mds01 and want mds02 to take over (for an upgrade
> > or something), how do my clients know to contact mds02 instead of mds01?
> > Wouldn't a floating IP address make sense in this case?
>
> Steve -
> Two answers, future and current.
>
> Our new mountconfig will allow you to specify
> multiple MDSs as part of the mount command. Unfortunately, the new
> mountconfig hasn't been released yet; it will be soon. For now, you
> will not be able to specify the client mount in /etc/fstab.
> Instead, you will have to use lconf to mount the clients.

So one can't use a floating IP for this?

--
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Ragnar Kjørstad wrote:
> On Wed, Sep 07, 2005 at 02:55:10PM -0700, cliff white wrote:
>> Our new mountconfig will allow you to specify
>> multiple MDSs as part of the mount command. Unfortunately, the new
>> mountconfig hasn't been released yet; it will be soon. For now, you
>> will not be able to specify the client mount in /etc/fstab.
>> Instead, you will have to use lconf to mount the clients.
>
> So one can't use a floating IP for this?

We do not support IP takeover at all. Portals cannot handle it.

cliffw
Mc Carthy, Fergal
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
However, be warned that when lconf --failover completes, that doesn't necessarily mean the device has fully finished on the current node; the shutdown happens in the background and can sometimes take a few seconds. This can lead to lconf --failover complaining about failing to unload modules. With the latest versions, however, lconf --failover should change the stopped devices to read-only, so the local node can't make any more changes to them and it is safe to start them running on the alternate server.

Fergal.

--
Fergal.McCarthy@HP.com

(The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated, you should consider this message and attachments as "HP CONFIDENTIAL".)
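Taking Fergal's caveat together with Cliff's earlier advice, a failback sequence driven from a script might look like the sketch below. The exact lconf invocation (the --cleanup/--failover/--group/--select combination), the config path, and the settle time are assumptions pieced together from the thread, not a verbatim recipe; check them against your Lustre version.

# On mds02 (the secondary currently running the service), stop it the quick way:
lconf --cleanup --failover --group mds01 --select mds01=mds02 /etc/lustre/config.xml

# Per Fergal's note, the teardown can lag in the background, so allow a short
# settle time before the primary starts the service; the figure is arbitrary,
# and newer lconf versions mark the stopped devices read-only anyway.
sleep 5

# Then on mds01 (the primary), start the service again, e.g. with the
# group symlink or '/etc/init.d/lustre start' as described later in the thread.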
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Cliff,
>
> I am a little lost then as to what the command lines would be to
> start/stop the MDS within heartbeat. That is where my confusion lies. If
> you could help with the command lines that would be great.

No problem.
First, what version of Lustre are you using, and what type of network?
(Ethernet, IB, Elan, etc.)

> Here is a guess at what I need to do:
>
> - mds01 and mds02 are my MDS servers with a shared storage device for
>   the metadata that they are both connected to.
>
> - use heartbeat to control failover between the two MDS servers
>
> - both servers are configured NOT to bring up the lustre service via the
>   initscripts (heartbeat will control this)
>
> - configure mds01 in heartbeat as the "preferred" server to run the
>   active MDS (initially)

This is all correct - you need to make one change to your configuration.
You have:

--add mds --node mds01 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30

--add mds --node mds02 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31

Remove the '--group' entry for the secondary server:

--add mds --node mds02 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover || exit 31

> What command line would I run to bring up the MDS then? The init script that
> comes with lustre or a "raw" lconf command?

If you are running Lustre > 1.4.4 you can symlink /etc/init.d/lustre
to <path to service scripts>/mds01 (this is the --group name, so it
would be 'mds01' on both nodes) and use the symlink to start/stop the
mds pair; this is our preferred failover method.
Second choice would be hand-running '/etc/init.d/lustre start'.
Last, you could run lconf.

> On failover, heartbeat on mds02 needs to execute a series of commands to
> take over the MDS from mds01:
>   - stonith mds01 (that way it reboots, comes up, sees mds02 is
>     the master and does nothing)
>   - start lustre locally on mds02. What is the command line for
>     this? I suspect it's an lconf command?

The symlink method should work for starting either node; it works from
the '--group' parameter. That is why you need to remove the duplicate
--group. You can also use lconf - 'lconf --group mds01 --select
mds01=mds02 <config file>'

> On failing back to mds01, what commands do I need to run?
>   - stonith mds02
>   - start lustre locally on mds01. What is the command line for
>     this? I suspect it's an lconf command?

lconf with --failover will do a quick shutdown of mds02; once
that shutdown is complete, you would start mds01 with the service
symlink, /etc/init.d/lustre or lconf. That would be transparent for the
clients.

Personally, I would avoid the stonith on failback, but that would work,
and should also be transparent for the clients.

> As a side note:
> In trying different things to get the MDS to fail over, I get the error
> message below. Do you know what it means? I am running the same version
> of lustre everywhere (kernel, modules, ...)
>
> 2005-09-08 10:23:05 -0500,svr01,kern,err,kernel: LustreError: Protocol
> error connecting to host 10.10.1.2 on port 988: Is it running a
> compatible version of Lustre?

This is a known issue: port 988 is within the range of ports handed out
by the portmapper. It is possible that an RPC service is starting before
Lustre and grabbing that port.
Typically, disabling nfs and nfslock avoids the problem; it's an issue
with other services that collide with the portmapper range. See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=103401
for more explanation of the issue.
cliffw

> Thanks,
> Steve
>
> -----Original Message-----
> From: cliff white [mailto:cliffw@clusterfs.com]
> Sent: Thursday, September 08, 2005 11:08 AM
> To: Nielsen, Steve
> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
> Subject: Re: setting up failover in lustre... any recommendations?
>
> Nielsen, Steve wrote:
>> Cliff,
>>
>> I changed my clients to use the lconf call as you recommended below.
>>
>> I am able to mount the file system without issues on the client using this method.
>>
>> However, when I shut down lustre on mds01 (my main mds server) as a test, mds02 does not take over.
>>
>> Are there commands I need to run on my standby mds02 server to enable it to take over?
>>
>> Also, when things fail back to mds01, is there something I need to run to enable that?
>
> I think I may need some more detail on your setup, please
> expand if necessary.
>
> First, only one MDS should be running at a time. This is
> very important - you should _never_ have both servers in the
> failover pair active at the same time. Bad Things can happen
> to your metadata.
>
> For failover, you will have to start the second mds server
> after the first mds is down. Normally, this is done with an
> HA package (Heartbeat, CluManager, etc). When the secondary
> server starts, it should access the shared storage and go.
>
> If that's not happening, we need to see the logs (syslog, dmesg,
> anything on the console); there should be errors. Check logs on the
> MDS and OST.
>
> For failback, you should stop mds02 with the --failover option.
> This will do a quick shutdown - then start mds01.
> cliffw
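Cliff's symlink method could be wired into Heartbeat roughly as below. This is a sketch built on assumptions: the /etc/ha.d/resource.d location, the idea that the init script derives the group from the name it is invoked as, and the config file path are inferred from his description rather than quoted from documentation.

# On both mds01 and mds02, create a service script named after the --group:
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/mds01

# /etc/ha.d/haresources then treats it like any other resource,
# with node mds01 as the preferred home:
#   mds01 mds01

# What Heartbeat ends up running on whichever node owns the service:
/etc/ha.d/resource.d/mds01 start   # bring up the MDS for group mds01 on this node
/etc/ha.d/resource.d/mds01 stop    # shut it down again

# Raw lconf alternative for starting the group on the secondary, as Cliff shows:
lconf --group mds01 --select mds01=mds02 /etc/lustre/config.xml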
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Thanks. I will set this up and test it.

BTW, I am using lustre 1.4.5 and regular old ethernet.

I assume for the OSTs the same command lines as for the MDSs would apply as well (with the correct --group config and symlinking lustre)?

On the --failover for failback, does the lconf command wait until it's complete? Or should I sleep in the script after issuing lconf --failover?

Steve

-----Original Message-----
From: cliff white [mailto:cliffw@clusterfs.com]
Sent: Thursday, September 08, 2005 2:12 PM
To: Nielsen, Steve
Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
Subject: Re: setting up failover in lustre... any recommendations?
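Assuming Cliff's answer in the next message (that the same approach applies to the OSTs), the OSS side would simply repeat the symlink trick once per --group name. A sketch using the group names from the orlando/oscar example quoted earlier; substitute whatever groups your own lmc script defines:

# On both OSS nodes, so either one can run either group after a failure:
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/orlando
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/oscar

# /etc/ha.d/haresources pins each group to its preferred node:
#   orlando orlando
#   oscar   oscar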
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Thanks.
>
> I will set this up and test it.

Great, let us know how it goes.

> BTW, I am using lustre 1.4.5 and regular old ethernet.

Thanks.

> I assume for the OSTs the same command lines as for the MDSs would apply as
> well (with the correct --group config and symlinking lustre)?

Yes.

> On the --failover for failback, does the lconf command wait until it's
> complete? Or should I sleep in the script after issuing lconf --failover?

Things should be complete when lconf returns; it sets the device
read-only, so you should be okay starting the failback without a sleep.
cliffw
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Cliff,
>
> I am not clear on how to set up MDS active/passive failover.
>
> Here is my MDS setup:
> =====================
> --add node --node mds01 || exit 10
> --add node --node mds02 || exit 11
>
> --add net --node mds01 --nid mds01 --nettype tcp || exit 20
> --add net --node mds02 --nid mds02 --nettype tcp || exit 21
>
> --add mds --node mds01 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30
>
> --add mds --node mds02 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31
>
> --add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1
> --stripe_pattern 0 || exit 32
>
> Then on my client in /etc/fstab I have:
> =======================================
> mds01:/mds1/client  /mnt/lustre  lustre  rw  0  0
>
> When I take down mds01, say for an upgrade, and want mds02 to take over,
> how do my clients know to contact mds02 instead of mds01?
> Wouldn't a floating IP address make sense in this case?
>
> Any help here is appreciated.

Steve -
Two answers, future and current.

Our new mountconfig will allow you to specify multiple MDSs as part of
the mount command. Unfortunately, the new mountconfig hasn't been
released yet; it will be soon. For now, you will not be able to specify
the client mount in /etc/fstab. Instead, you will have to use lconf to
mount the clients.

In your setup script, be sure you specify a client
(the single line will cover any number of actual clients):

--add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1

Then on the client node:

'lconf --node client <your xml file>'

will mount the filesystem - failover of the MDS will be transparent to
the clients; they will know to try the secondary MDS if the primary is
unavailable. For now, you may wish to put the lconf command in a script,
and have the script called as part of your normal startup.

cliffw
>> >> >>Unfortunately, we do not support IP takeover at this time. What we do > > is > >>this: >>The servers are configured with a specific IP. >>The clients know about both IPs, and will attempt to connect in a >>round-robin fashion until they succeed. >> >>Here''s a typically configuration for OST failover: (servers orlando > > and > >>oscar) >> >>--add ost --node orlando --ost ost1-home --failover --group orlando \ >>--lov lov-home --dev /dev/ost1 >>--add ost --node orlando --ost ost2-home --failover \ >>--lov lov-home --dev /dev/ost2 >> >>--add ost --node oscar --ost ost2-home --failover --group oscar \ >>--lov lov-home --dev /dev/ost2 >>--add ost --node oscar --ost ost1-home --failover \ >>--lov lov-home --dev /dev/ost1 >> >>cliffw >> >> >>>Steve >>> >>>-----Original Message----- >>>From: cliff white [mailto:cliffw@clusterfs.com] >>>Sent: Thursday, September 01, 2005 11:43 AM >>>To: Nielsen, Steve >>>Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; >>>support@clusterfs.com >>>Subject: Re: setting up failover in lustre... any recommendations? >>> >>>Nielsen, Steve wrote: >>> >>> >>> >>>>Hi, >>>> >>>>I have a couple questions about setting up failver with Lustre. Any >>> >>>help >>> >>> >>> >>>>is appreciated. >>>> >>>>Definitions: >>>>===========>>>>ssd - shared storage device >>>>oss - object storage server >>>>mds - meta data server >>>> >>>>Here is my OSS setup: >>>>====================>>>> oss1 #----(failover)----# ssd2 >>>> # # >>>> | | >>>> | | >>>> |(primary) | (primary) >>>> | | >>>> | | >>>> # # >>>> ssd1 #----(failover)----# oss2 >>>> >>>>So: >>>> oss1 is primary for ssd1 >>>> oss1 is failover for ssd2 >>>> oss2 is primary for ssd2 >>>> oss2 is failover for ssd1 >>>> >>>>Here is my MDS setup: >>>>====================>>>> +-----# ssd3 #------+ >>>> | | >>>> | (primary) | (failover) >>>> | | >>>> | | >>>> # # >>>> mds1 mds2 >>>> >>>>So: >>>> mds1 is primary for ssd3 >>>> mds2 is failover for ssd3 >>>> >>>>I am now at the stage where I want to implement failover for the > > OSS''s > >>>>and MDS seutp. >>>> >>>>I would prefer to use heartbeat from linux-ha.org for the following >>>>reasons: >>>> * it''s actively maintained >>>> * we use it in house extensively >>>> * i am very familiar with it >>>> * its straight forward to use (start/stop resources on failover) >>>> >>>>Has anyone else used heartbeat to do failover? Are there docs that I >>> >>>can >>> >>> >>> >>>>be pointed on this specific type of setup? >>> >>> >>>Heartbeat should work fine. We have customers using Red Hat''s >>>CluManager, which is similar. We are currently writing the docs, I am >>>very interested in incorporating your experiences, especially since >> >>you >> >> >>>have Heartbeat familiarity. >>> >>> >>> >>>>I know how to configure heartbeat and to use STONITH to make sure the >>>>secondary device will not write to the shared storage device at the >>>>same time as the primary device. >>> >>>That''s the key - the shared storage must never be touched by two >> >>servers >> >> >>>at once. >>> >>> >>> >>>>My main questions lie in what resources to stop/start on failver > > since > >>>>both (for example) OSS''s are active for one OST and failver for the >>>>other OST. >>> >>>You should never have to stop the active primary OST when failing over >> >> >>>the other OST to the secondary. Failover is generally transparent to >> >>the >> >> >>>clients, their applications may block for a moment during failover, >> >>but >> >> >>>should continue on with the new server. 
Failback of course requires >> >>you >> >> >>>to stop the service on the secondary before starting it on the >> >>primary. >> >> >>>cliffw >>> >>> >>> >>> >>>>Thanks, >>>>Steve >>> >>> >
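Cliff's suggestion to wrap the client-side lconf call in a startup script might look like the sketch below. The script name and the config path /etc/lustre/config.xml are assumptions, not something from the thread; --cleanup is lconf's normal teardown option.

#!/bin/sh
# /etc/init.d/lustre-client (sketch) - mount/unmount the Lustre client
# with lconf, since /etc/fstab cannot be used until mountconfig ships.
CONFIG=/etc/lustre/config.xml    # assumed location of the lmc-generated XML

case "$1" in
  start)
    # "client" is the generic node name created with --add mtpt above
    lconf --node client $CONFIG
    ;;
  stop)
    lconf --node client --cleanup $CONFIG
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac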
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Cliff,

I changed my clients to use the lconf call as you recommended.

I am able to mount the file system without issues on the client using
this method.

However, when I shut down Lustre on mds01 (my main MDS server) as a
test, mds02 does not take over.

Are there commands I need to run on my standby mds02 server to enable
it to take over?

Also, when things fail back to mds01, is there something I need to run
to enable that?

Thanks,
Steve
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Cliff,
>
> I changed my clients to use the lconf call as you recommended.
>
> I am able to mount the file system without issues on the client using this method.
>
> However, when I shut down Lustre on mds01 (my main MDS server) as a test, mds02 does not take over.
>
> Are there commands I need to run on my standby mds02 server to enable it to take over?
>
> Also, when things fail back to mds01, is there something I need to run to enable that?

I think I may need some more detail on your setup; please expand if
necessary. First, only one MDS should be running at a time. This is
very important - you should _never_ have both servers in the failover
pair active at the same time. Bad Things can happen to your metadata.

For failover, you will have to start the second MDS server after the
first MDS is down. Normally, this is done with an HA package
(Heartbeat, CluManager, etc). When the secondary server starts, it
should access the shared storage and go. If that's not happening, we
need to see the logs (syslog, dmesg, anything on the console); there
should be errors. Check the logs on the MDS and the OSTs.

For failback, you should stop mds02 with the --failover option. This
will do a quick shutdown - then start mds01.

cliffw
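To make Cliff's manual procedure concrete, a rough sketch of the failover and failback sequences is shown below. The file name config.xml is an assumption, and the exact --cleanup/--failover combination is inferred from the options mentioned in this thread, so verify it against your lconf version.

# --- failover: mds01 is down (crashed, powered off, or STONITHed) ---
# On mds02, start the mds1 service from the shared config; it mounts
# the shared /dev/vg_mds1/lv_mds1 and lets clients recover.
mds02# lconf --node mds02 config.xml

# If clients do not reconnect, look for errors on the MDS and the OSTs:
mds02# dmesg | grep -i lustre
mds02# tail -n 100 /var/log/messages

# --- failback: return the service to mds01 ---
# Stop the service on the secondary first (quick shutdown), then
# start it on the primary.
mds02# lconf --node mds02 --cleanup --failover config.xml
mds01# lconf --node mds01 config.xml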
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Cliff,

I am a little lost, then, as to what the command lines would be to
start/stop the MDS within heartbeat. That is where my confusion lies.
If you could help with the command lines, that would be great.

Here is a guess at what I need to do:

- mds01 and mds02 are my MDS servers, both connected to a shared
storage device that holds the metadata.
- heartbeat controls failover between the two MDS servers.
- both servers are configured to NOT bring up the lustre service via
the init scripts (heartbeat will control this).
- mds01 is configured in heartbeat as the "preferred" server to run
the active MDS (initially).

What command line would I run to bring up the MDS then? The init script
that comes with Lustre, or a "raw" lconf command?

On failover, heartbeat on mds02 needs to execute a series of commands
to take over the MDS from mds01:

- stonith mds01 (that way it reboots, comes up, sees mds02 is the
master and does nothing)
- start Lustre locally on mds02. What is the command line for this? I
suspect it's an lconf command?

On failing back to mds01, what commands do I need to run?

- stonith mds02
- start Lustre locally on mds01. What is the command line for this? I
suspect it's an lconf command?

As a side note: in trying different things to get the MDS to fail over,
I get the error message below. Do you know what it means? I am running
the same version of Lustre everywhere (kernel, modules, ...).

2005-09-08 10:23:05 -0500,svr01,kern,err,kernel: LustreError: Protocol
error connecting to host 10.10.1.2 on port 988: Is it running a
compatible version of Lustre?

Thanks,
Steve
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes and
> I don't need to restart the services. The only thing I then need to do
> via heartbeat is detect that the other side is down and "stonith" it?
> Then things will be good?

Steve -
Just wanted to check back and see how you were doing with this.
One thing I didn't mention: when failing back the service from secondary
to primary, you should stop the service with 'lconf --failover', which
will be quicker.

cliffw
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Ragnar Kjørstad wrote:
> On Thu, Sep 01, 2005 at 01:03:11PM -0400, Nielsen, Steve wrote:
>
>> So for both OSS/MDS i don't need to float any ips between the boxes and
>> i don't need to restart the services. the only thing then via heartbeat
>> I need to do is detect the other side down and "stonith" it? Then things
>> will be good ?
>
> Disclaimer: I've just started looking at this too.
>
> I believe you also need to activate the OST on the secondary server by
> running the appropriate lconf command?
> (Starting another resource, in linux-ha speak.)
>
> Basically, an OCF Resource Agent is required - a shell script that
> heartbeat can use to start and stop the service as required (not the
> whole lustre service, but the specific OST).
>
> This can be done either by writing a new shell script, or by extending
> the lustre init script to double as an OCF Resource Agent. (It needs to
> be able to start/stop specific groups of OSTs rather than the whole
> service.)

If you are using current Lustre, you can symlink /etc/init.d/lustre to
the service name, and that script (the symlink) should work as a
Resource Agent script. This has not been tested extensively, so feedback
is appreciated.

cliffw
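As a sketch of how that symlink could be wired into a Heartbeat 1.x (haresources-style) setup for the MDS pair discussed in this thread: the resource name, the paths, and the assumption that the script manages only the mds1 service are illustrative, not taken from the Lustre documentation.

# On both mds01 and mds02: let the init script double as a resource
# script named after the service it manages.
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/mds1

# /etc/ha.d/haresources (must be identical on both nodes):
# mds01 is the preferred node; heartbeat runs "mds1 start" there and
# moves the resource to mds02 only after mds01 is declared dead
# (and fenced via STONITH).
mds01 mds1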
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Hi,
>
> I have a couple of questions about setting up failover with Lustre. Any
> help is appreciated.
>
> Definitions:
> ============
> ssd - shared storage device
> oss - object storage server
> mds - metadata server
>
> Here is my OSS setup:
> =====================
>  oss1 #----(failover)----# ssd2
>   #                        #
>   |                        |
>   |(primary)               | (primary)
>   |                        |
>   #                        #
>  ssd1 #----(failover)----# oss2
>
> So:
>  oss1 is primary for ssd1
>  oss1 is failover for ssd2
>  oss2 is primary for ssd2
>  oss2 is failover for ssd1
>
> Here is my MDS setup:
> =====================
>   +-----# ssd3 #------+
>   |                   |
>   | (primary)         | (failover)
>   |                   |
>   #                   #
>  mds1                mds2
>
> So:
>  mds1 is primary for ssd3
>  mds2 is failover for ssd3
>
> I am now at the stage where I want to implement failover for the OSS
> and MDS setup.
>
> I would prefer to use heartbeat from linux-ha.org for the following
> reasons:
>  * it's actively maintained
>  * we use it in house extensively
>  * i am very familiar with it
>  * it's straightforward to use (start/stop resources on failover)
>
> Has anyone else used heartbeat to do failover? Are there docs that I can
> be pointed to on this specific type of setup?

Heartbeat should work fine. We have customers using Red Hat's
CluManager, which is similar. We are currently writing the docs; I am
very interested in incorporating your experiences, especially since you
have Heartbeat familiarity.

> I know how to configure heartbeat and to use STONITH to make sure the
> secondary device will not write to the shared storage device at the
> same time as the primary device.

That's the key - the shared storage must never be touched by two servers
at once.

> My main questions lie in what resources to stop/start on failover, since
> both (for example) OSSes are active for one OST and failover for the
> other OST.

You should never have to stop the active primary OST when failing over
the other OST to the secondary. Failover is generally transparent to the
clients; their applications may block for a moment during failover, but
should continue on with the new server. Failback of course requires you
to stop the service on the secondary before starting it on the primary.

cliffw

> Thanks,
> Steve
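Since the whole scheme hinges on fencing, here is a rough sketch of how a STONITH device is typically declared in a Heartbeat 1.x ha.cf. The directive and the ssh plugin name are recalled from the linux-ha documentation, so treat the exact syntax as an assumption and check `stonith -L` and the docs for your version; the ssh plugin is only suitable for testing, and a real power switch should be used in production.

# /etc/ha.d/ha.cf (fragment, sketch)
# Each node declares a STONITH method it can use to power-cycle its
# peer before taking over the shared storage.
stonith_host oss1 ssh oss2
stonith_host oss2 ssh oss1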
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
I am working on setting this up now. Just need to trudge through it.

I will share my experiences when done.

Quick question: for failover to work, I need to have IPs that float
between the devices. Won't this require me to restart the lustre
service? (I am on RHEL 4.) So a "service lustre restart" should work,
right?

Steve
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> I am working on setting this up now. Just need to trudge through it.
>
> I will share my experiences when done.
>
> Quick question: for failover to work, I need to have IPs that float
> between the devices. Won't this require me to restart the lustre
> service? (I am on RHEL 4.) So a "service lustre restart" should work,
> right?

Unfortunately, we do not support IP takeover at this time. What we do is
this:
The servers are configured with a specific IP.
The clients know about both IPs, and will attempt to connect in a
round-robin fashion until they succeed.

Here's a typical configuration for OST failover (servers orlando and
oscar):

--add ost --node orlando --ost ost1-home --failover --group orlando \
--lov lov-home --dev /dev/ost1
--add ost --node orlando --ost ost2-home --failover \
--lov lov-home --dev /dev/ost2

--add ost --node oscar --ost ost2-home --failover --group oscar \
--lov lov-home --dev /dev/ost2
--add ost --node oscar --ost ost1-home --failover \
--lov lov-home --dev /dev/ost1

cliffw
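For readers new to the 1.4 tooling, the --add lines above (and in Steve's MDS script earlier) are arguments to lmc, which builds the XML configuration that lconf later consumes on every node. A rough sketch of that step, assuming the file name config.xml and the -o (create) / -m (modify) pattern from the Lustre 1.4 manual - treat the exact flags as an assumption to verify against your release:

# Build the config once, then copy config.xml to every server and client.
lmc -o config.xml --add node --node orlando
lmc -m config.xml --add node --node oscar
lmc -m config.xml --add net --node orlando --nid orlando --nettype tcp
lmc -m config.xml --add net --node oscar --nid oscar --nettype tcp
# MDS and LOV entries (as in the MDS example earlier in the thread)
# must be added before the OSTs that reference lov-home.
lmc -m config.xml --add ost --node orlando --ost ost1-home --failover \
    --group orlando --lov lov-home --dev /dev/ost1
lmc -m config.xml --add ost --node oscar --ost ost1-home --failover \
    --lov lov-home --dev /dev/ost1
# ...and likewise for ost2-home, with --group oscar on the oscar node.

Each node then brings up its own piece of this file with lconf, as in the other examples in this thread.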
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
So for both the OSS and MDS I don't need to float any IPs between the
boxes, and I don't need to restart the services. The only thing I then
need to do via heartbeat is detect that the other side is down and
"stonith" it? Then things will be good?

Steve
>=20 >>I know how to configure heartbeat and to use STONITH to make sure the >>secondary device will not write to the shared storage device at the >>same time as the primary device. >=20 > That''s the key - the shared storage must never be touched by twoservers> at once. >=20 >>My main questions lie in what resources to stop/start on failver since >>both (for example) OSS''s are active for one OST and failver for the >>other OST. >=20 > You should never have to stop the active primary OST when failing over> the other OST to the secondary. Failover is generally transparent tothe>=20 > clients, their applications may block for a moment during failover,but=20> should continue on with the new server. Failback of course requiresyou=20> to stop the service on the secondary before starting it on theprimary.> cliffw >=20 >=20 >>Thanks, >>Steve >=20 >=20
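For concreteness, here is a minimal sketch of the takeover step this
configuration implies, written as the commands the surviving OSS (oscar)
would run once heartbeat has fenced orlando. The config path is an
assumption (it is not given in this thread), and the --group/--select form
is borrowed from the lconf examples that appear later in the thread; check
it against your Lustre version.

# Hedged sketch only - config path and exact option semantics are assumptions.
CONFIG=/etc/lustre/config.xml

# 1. Fence (STONITH) the failed node first so it can no longer write to
#    the shared storage; heartbeat's STONITH plugin handles this step.

# 2. On oscar, start the OST group that normally runs on orlando.
lconf --group orlando --select orlando=oscar $CONFIG

# Clients that were connected to orlando block briefly, then retry both
# configured NIDs and continue against oscar; no IP takeover is involved.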
Ragnar Kjørstad
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
On Thu, Sep 01, 2005 at 01:03:11PM -0400, Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes, and
> I don't need to restart the services. The only thing I then need to do via
> heartbeat is detect that the other side is down and "stonith" it? Then
> things will be good?

Disclaimer: I've just started looking at this too.

I believe you also need to activate the OST on the secondary server by
running the appropriate lconf command? (Starting another resource, in
linux-ha speak.)

Basically an OCF Resource Agent is required - a shell script that heartbeat
can use to start and stop the service as required (not the whole lustre
service, but the specific OST). This can be done either by writing a new
shell script, or by extending the lustre init script to double as an OCF
Resource Agent (it needs to be able to start/stop specific groups of OSTs
rather than the whole service).

-- 
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter
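To make Ragnar's suggestion concrete, a minimal sketch of the kind of
per-OST resource script heartbeat could manage follows. The group name,
config path, and health test are assumptions for illustration (not taken
from this thread); the lconf forms are the --group/--select and
--failover --cleanup invocations used elsewhere in the thread, and should
be verified against your Lustre version.

#!/bin/sh
# Sketch of an LSB-style resource script for one OST group (hypothetical
# group name "ost1-home", hypothetical config /etc/lustre/config.xml).
CONFIG=/etc/lustre/config.xml
GROUP=ost1-home
LCONF=/usr/sbin/lconf
NODE=`hostname`

case "$1" in
start)
    # Start only this OST group on the local node (not the whole service).
    $LCONF --group $GROUP --select $GROUP=$NODE $CONFIG
    ;;
stop)
    # Quick shutdown so the peer (or the primary, on failback) can take over.
    $LCONF --group $GROUP --select $GROUP=$NODE --failover --cleanup $CONFIG
    ;;
status)
    # Healthy only if the kernel health file says so (see the scripts below).
    if [ -f /proc/fs/lustre/health_check ] && \
       grep -q healthy /proc/fs/lustre/health_check; then
        echo running
        exit 0
    fi
    echo stopped
    exit 3
    ;;
*)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit 0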
Adam Cassar
2006-May-19 07:36 UTC
[Lustre-discuss] setting up failover in lustre... any recommendations?
Hi,

We do failover for lustre using clumanager; I'm sure you could adapt the
scripts to a heartbeat setup. I've attached the scripts that we use for
health checking; they are based on the lustre init scripts.

One thing that we have found is that lustre fails over quicker if you are
not sharing services on a node.

On Thu, 2005-09-01 at 11:33 -0400, Nielsen, Steve wrote:
> Has anyone else used heartbeat to do failover? Are there docs that I can
> be pointed to on this specific type of setup?

-- 
Adam Cassar
ICT Manager
NetRegistry Pty Ltd
______________________________________________
http://www.netregistry.com.au
Tel: 02 9699 6099 Fax: 02 9699 6088
PO Box 270 Broadway NSW 2007
Domains |Business Email|Web Hosting|E-Commerce
Trusted by 10,000s of businesses since 1997
______________________________________________

# lustre_functions.sh
: ${LUSTRE_CFG:=/etc/lustre/lustre.cfg}
[ -f ${LUSTRE_CFG} ] && . ${LUSTRE_CFG}

: ${LUSTRE_CONFIG_XML:=/etc/lustre/config.xml}
: ${LCONF:=/usr/sbin/lconf}
: ${LCONF_START_ARGS:="${LUSTRE_CONFIG_XML}"}
: ${LCONF_FORCE_STOP_ARGS:="--force --cleanup ${LUSTRE_CONFIG_XML}"}
: ${LCONF_STOP_ARGS:="--failover --cleanup ${LUSTRE_CONFIG_XML}"}
: ${LCTL:=/usr/sbin/lctl}

PRIMARY_SERVICE=ost1
STATUS=/var/lustre/status
HEALTHCHECK=/proc/fs/lustre/health_check
NODE=$HOSTNAME

start() {
    SERVICE=$1
    if [ $SERVICE != $PRIMARY_SERVICE ]; then
        START_ARGS="--group $SERVICE --select $SERVICE=$NODE $LCONF_START_ARGS"
    else
        START_ARGS=$LCONF_START_ARGS
    fi
    echo -n "Starting $SERVICE ${LCONF} ${START_ARGS}: "
    ${LCONF} ${START_ARGS}
    RETVAL=$?
    echo $SERVICE
    if [ $RETVAL -eq 0 ]; then
        echo "online" > $STATUS
    else
        echo "online pending" > $STATUS
    fi
}

stop() {
    SERVICE=$1
    if [ $SERVICE != $PRIMARY_SERVICE ]; then
        STOP_ARGS="--group $SERVICE --select $SERVICE=$NODE $LCONF_STOP_ARGS"
    else
        STOP_ARGS=$LCONF_STOP_ARGS
    fi
    echo -n "Shutting down $SERVICE ${LCONF} ${STOP_ARGS}: "
    ${LCONF} ${STOP_ARGS}
    RETVAL=$?
    echo $SERVICE
    if [ $RETVAL -eq 0 ]; then
        echo "offline" > $STATUS
    else
        echo "offline pending" > $STATUS
    fi
}

status() {
    STATE="stopped"
    egrep -q "libcfs|lvfs|portals" /proc/modules && STATE="loaded"

    # check for any configured devices (may indicate partial startup)
    [ "`cat /proc/fs/lustre/devices 2> /dev/null`" ] && STATE="running" # was partial

    # check for servers in recovery
    MDS="`ls /proc/fs/lustre/mds/*/recovery_status 2> /dev/null`"
    OST="`ls /proc/fs/lustre/ost/*/recovery_status 2> /dev/null`"
    [ "$MDS$OST" ] && grep -q RECOV $MDS $OST && STATE="recovery"

    # check for server disconnections
    DISCON="`grep -v FULL /proc/fs/lustre/*c/*/*server_uuid 2> /dev/null`"
    [ "$DISCON" ] && STATE="disconnected"

    # how do we tell if it is actually serving?
    echo $STATE
}

health() {
    STATUS="not running"
    if [ -f $HEALTHCHECK ]; then
        STATUS=`cat $HEALTHCHECK`
    fi
    echo $STATUS
    if [ "$STATUS" != "healthy" ]; then
        return 1
    fi
    return 0
}

# ost1.sh
#!/bin/dash

MYNAME=$(basename $0)

# The clulog utility uses the normal logging levels as defined in
# sys/syslog.h. Calls to clulog will use the logging level defined
# for the Service Manager (clusvcmgrd).
LOG_EMERG=0     # system is unusable
LOG_ALERT=1     # action must be taken immediately
LOG_CRIT=2      # critical conditions
LOG_ERR=3       # error conditions
LOG_WARNING=4   # warning conditions
LOG_NOTICE=5    # normal but significant condition
LOG_INFO=6      # informational
LOG_DEBUG=7     # debug-level messages

if [ $# -ne 1 ]; then
    echo "Usage: $0 {start, stop, status}"
    exit 1
fi

action=$1    # type of action, i.e. 'start', 'stop' or 'status'

#
# Record all output into a log file in case of error
#
exec >> /var/log/lustre/$MYNAME.$action.log 2>&1

clulog -s $LOG_DEBUG "In $0 with action=$action"

# source lustre scripts
. /usr/local/bin/lustre_functions.sh

PRIMARY_SERVICE=ost1
MY_SERVICE=ost1

case $action in
'start')
    clulog -s $LOG_INFO "Running $MY_SERVICE start script"
    RESULT=`start $MY_SERVICE`
    clulog -s $LOG_INFO "$MY_SERVICE $action script returned: $RESULT"
    ;;
'stop')
    clulog -s $LOG_INFO "Running $MY_SERVICE stop script"
    RESULT=`stop $MY_SERVICE`
    clulog -s $LOG_INFO "$MY_SERVICE $action script returned: $RESULT"
    ;;
'status')
    clulog -s $LOG_INFO "Running $MY_SERVICE status script"
    RESULT=`health`
    RETVAL=$?    # capture health()'s exit status before clulog overwrites $?
    clulog -s $LOG_INFO "$MY_SERVICE $action script returned: $RESULT"
    exit $RETVAL
    ;;
*)
    clulog -s $LOG_ERR "Unknown action '$action' passed to user script"
    exit 1    # return failure
esac

exit 0    # return success
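As Adam notes, these clumanager scripts could be adapted to heartbeat. One
plausible way to wire them in, sketched below with hypothetical node names,
is to place the per-service script in /etc/ha.d/resource.d/ and list it in
haresources so heartbeat calls it with start/stop/status; the node names
oss1/oss2 and the pairing are assumptions, not part of Adam's setup.

# /etc/ha.d/haresources - hedged sketch, hypothetical node names.
# heartbeat runs /etc/ha.d/resource.d/ost1.sh start|stop|status on
# whichever node currently owns the resource; oss1 is the preferred owner.
oss1 ost1.sh
oss2 ost2.sh

A STONITH device would still be configured in ha.cf, as discussed earlier
in the thread, and it may be preferable to leave automatic failback off so
failback can be done by hand with lconf --failover.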
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes, and
> I don't need to restart the services. The only thing I then need to do via
> heartbeat is detect that the other side is down and "stonith" it? Then
> things will be good?

I believe so - look at /etc/init.d/lustre for the current health check;
you should be able to run '/etc/init.d/lustre status' and use that
response for service health. With current Lustre, we also support SNMP
traps for health checking.
cliffw
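Cliff's suggestion is to treat the output of '/etc/init.d/lustre status'
as the health signal. A small probe along those lines, suitable for a
heartbeat or cron check, might look like the sketch below; the expected
state words ("running", "recovery", "healthy") are taken from the status()
and health() functions in Adam's scripts above, and are an assumption for
other Lustre versions.

#!/bin/sh
# Hedged health-probe sketch: exit 0 only when Lustre looks healthy.

STATE=`/etc/init.d/lustre status`
case "$STATE" in
    *running*|*recovery*) ;;                        # up (possibly recovering)
    *) echo "lustre not running: $STATE"; exit 1 ;;
esac

# Double-check the kernel health file if it exists.
if [ -f /proc/fs/lustre/health_check ] && \
   ! grep -q healthy /proc/fs/lustre/health_check; then
    echo "health_check reports a problem"
    exit 1
fi

echo healthy
exit 0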
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Thanks.
>
> I will set this up and test it.
>
> BTW, I am using lustre 1.4.5 and regular old ethernet.
>
> I assume for the OSTs the same command lines as for the MDSs would apply
> as well (with the correct --group config and symlinking lustre)?
>
> On the --failover for failback, does the lconf command wait till it is
> complete? Or should I sleep in the script after issuing lconf --failover?

One correction - with Lustre 1.4.5, you no longer need the '--group'
parameter. You should remove that from your setup script. For lconf, only
the --service parameter is needed.
cliffw

> Steve
>
> -----Original Message-----
> From: cliff white [mailto:cliffw@clusterfs.com]
> Sent: Thursday, September 08, 2005 2:12 PM
> To: Nielsen, Steve
> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
> Subject: Re: setting up failover in lustre... any recommendations?
>
> Nielsen, Steve wrote:
>
>> Cliff,
>>
>> I am a little lost then as to what the command lines would be to
>> start/stop the MDS within heartbeat. That is where my confusion lies. If
>> you could help with the command lines that would be great.
>
> No problem.
> First, what version of Lustre are you using, and what type of network?
> (Ethernet, IB, Elan, etc.)
>
>> Here is a guess at what I need to do:
>>
>> - mds01 and mds02 are my MDS servers with a shared storage device for
>>   the metadata that they are both connected to.
>>
>> - using heartbeat to control failover between the two MDS servers
>>
>> - both servers are configured to NOT bring up the lustre service via the
>>   initscripts (heartbeat will control this)
>>
>> - configure mds01 in heartbeat as the "preferred" server to run the
>>   active MDS (initially)
>
> This is all correct - you need to make one change to your configuration.
> You have:
>
> --add mds --node mds01 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30
>
> --add mds --node mds02 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31
>
> Remove the '--group' entry for the secondary server:
>
> --add mds --node mds02 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover || exit 31
>
>> What command line would I run to bring up the MDS then? The init script
>> that comes with lustre or a "raw" lconf command?
>
> If you are running Lustre > 1.4.4 you can symlink /etc/init.d/lustre
> to <path to service scripts>/mds01 (this is the --group name, so it
> would be 'mds01' on both nodes) and use the symlink to start/stop the
> mds pair; this is our preferred failover method.
> Second choice would be hand-running '/etc/init.d/lustre start'.
> Last, you could run lconf.
>
>> On failover, heartbeat on mds02 needs to execute a series of commands to
>> take over the MDS from mds01:
>>   - stonith mds01 (that way it reboots, comes up, sees mds02 is
>>     the master and does nothing)
>>   - start lustre locally on mds02. What is the command line for
>>     this? I suspect it's an lconf command?
>
> The symlink method should work for starting either node; it works from
> the '--group' parameter. That is why you need to remove the duplicate
> --group. You can also use lconf - 'lconf --group mds01 --select
> mds01=mds02 <config file>'
>
>> On failing back to mds01 what commands do I need to run?
>>   - stonith mds02
>>   - start lustre locally on mds01. What is the command line for
>>     this? I suspect it's an lconf command?
>
> lconf with --failover will do a quick shutdown of mds02; once
> that shutdown is complete, you would start mds01 with the service
> symlink, /etc/init.d/lustre or lconf. That would be transparent for the
> clients.
>
> Personally, I would avoid the stonith on failback, but that would work,
> and should also be transparent for the clients.
>
>> As a side note:
>> In trying different things to get the MDS to fail over I get the error
>> message below. Do you know what it means? I am running the same version
>> of lustre everywhere (kernel, modules, ..)
>>
>> 2005-09-08 10:23:05 -0500,svr01,kern,err,kernel: LustreError: Protocol
>> error connecting to host 10.10.1.2 on port 988: Is it running a
>> compatible version of Lustre?
>
> This is a known issue; port 988 is within the range of ports handed out
> by the portmapper. It is possible that an RPC service is starting before
> Lustre and grabbing that port. Typically, disabling nfs and nfslock
> avoids the problem. It's an issue with other services that collide with
> the portmapper range; see
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=103401
> for more explanation of the issue.
> cliffw
>
>> Thanks,
>> Steve
>>
>> -----Original Message-----
>> From: cliff white [mailto:cliffw@clusterfs.com]
>> Sent: Thursday, September 08, 2005 11:08 AM
>> To: Nielsen, Steve
>> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
>> Subject: Re: setting up failover in lustre... any recommendations?
>>
>> Nielsen, Steve wrote:
>>
>>> Cliff,
>>>
>>> I changed my clients to use the lconf call as you recommend below.
>>>
>>> I am able to mount the file system without issues on the client using
>>> this method.
>>>
>>> However, when I shut down lustre on mds01 (my main mds server) as a
>>> test, mds02 does not take over.
>>>
>>> Are there commands I need to run on my standby mds02 server to enable
>>> it to take over?
>>>
>>> Also, when things fail back to mds01 is there something I need to run
>>> to enable that?
>>
>> I think I may need some more detail on your setup, please
>> expand if necessary.
>>
>> First, only one MDS should be running at a time. This is
>> very important - you should _never_ have both servers in the
>> failover pair active at the same time. Bad Things can happen
>> to your metadata.
>>
>> For failover, you will have to start the second mds server
>> after the first mds is down. Normally, this is done with an
>> HA package (Heartbeat, CluManager, etc). When the secondary
>> server starts, it should access the shared storage and go.
>>
>> If that's not happening, we need to see the logs (syslog, dmesg,
>> anything on the console); there should be errors. Check logs on the
>> MDS and OST.
>>
>> For failback, you should stop mds02 with the --failover option.
>> This will do a quick shutdown - then start mds01.
>> cliffw
>>
>>> Thanks,
>>> Steve
>>>
>>> -----Original Message-----
>>> From: cliff white [mailto:cliffw@clusterfs.com]
>>> Sent: Wed 9/7/2005 5:55 PM
>>> To: Nielsen, Steve
>>> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
>>> Subject: Re: setting up failover in lustre... any recommendations?
>>>
>>> Nielsen, Steve wrote:
>>>
>>>> Cliff,
>>>>
>>>> I am not clear on how to set up MDS active/passive failover.
>>>>
>>>> Then on my client in /etc/fstab I have:
>>>> mds01:/mds1/client  /mnt/lustre  lustre  rw 0 0
>>>>
>>>> When I take down mds01 and want mds02 to take over (for an upgrade
>>>> or something), how do my clients know to contact mds02 instead of
>>>> mds01? Wouldn't a floating IP address make sense in this case?
>>>>
>>>> Help here is appreciated.
>>>
>>> Steve -
>>> Two answers, future and current.
>>>
>>> Our new mountconfig will allow you to specify
>>> multiple MDSs as part of the mount command. Unfortunately, the new
>>> mountconfig hasn't been released yet; it will be soon. For now, you
>>> will not be able to specify the client mount in /etc/fstab.
>>> Instead, you will have to use lconf to mount the clients.
>>>
>>> In your setup script, be sure you specify a client
>>> (the single line will cover any number of actual clients):
>>>
>>> --add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1
>>>
>>> Then on the client node:
>>> 'lconf --node client <your xml file>'
>>>
>>> will mount the filesystem - failover of the mds will be transparent to
>>> the clients; they will know to try the secondary mds if the primary is
>>> unavailable.
>>> For now, you may wish to put the lconf command in a script, and have
>>> the script called as part of your normal startup.
>>>
>>> cliffw
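Pulling cliff's answers together, the failback sequence and the port 988
workaround might look like the sketch below. The config path is an
assumption; the lconf forms are the ones quoted above (with 1.4.5, cliff
notes --group gives way to --service, so adjust accordingly), and
chkconfig is one way to keep nfs/nfslock from grabbing the port on RHEL.

# Hedged sketch of manual failback for the mds01/mds02 pair discussed above.
CONFIG=/etc/lustre/config.xml    # assumption - path not given in the thread

# 1. On mds02 (the secondary currently serving the MDS): quick shutdown so
#    clients reconnect cleanly instead of seeing a hard stop.
lconf --group mds01 --select mds01=mds02 --failover --cleanup $CONFIG

# 2. Only after that shutdown completes, start the service on mds01,
#    via the service symlink, /etc/init.d/lustre, or lconf:
lconf --group mds01 $CONFIG

# Port 988 collision workaround mentioned above: stop RPC services from
# starting before Lustre and taking the port (RHEL).
chkconfig nfs off
chkconfig nfslock off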