Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] setting up failover in lustre... any recommendations?
Hi,

I have a couple of questions about setting up failover with Lustre. Any help is appreciated.

Definitions:
============
ssd - shared storage device
oss - object storage server
mds - metadata server

Here is my OSS setup:
=====================
 oss1 #----(failover)----# ssd2
   #                        #
   |                        |
   |(primary)               |(primary)
   |                        |
   #                        #
 ssd1 #----(failover)----# oss2

So:
 oss1 is primary for ssd1
 oss1 is failover for ssd2
 oss2 is primary for ssd2
 oss2 is failover for ssd1

Here is my MDS setup:
=====================
   +-----# ssd3 #------+
   |                   |
   | (primary)         | (failover)
   |                   |
   #                   #
  mds1                mds2

So:
 mds1 is primary for ssd3
 mds2 is failover for ssd3

I am now at the stage where I want to implement failover for the OSSs and the MDS setup.

I would prefer to use Heartbeat from linux-ha.org for the following reasons:
 * it's actively maintained
 * we use it in house extensively
 * I am very familiar with it
 * it's straightforward to use (start/stop resources on failover)

Has anyone else used Heartbeat to do failover? Are there docs I can be pointed to on this specific type of setup?

I know how to configure Heartbeat and to use STONITH to make sure the secondary node will not write to the shared storage device at the same time as the primary node.

My main question is what resources to stop/start on failover, since both OSSs (for example) are active for one OST and failover for the other OST.

Thanks,
Steve
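Since the question is specifically about what Heartbeat should manage, here is one way the active/active OSS pair could be expressed in a Heartbeat v1 configuration. This is only a sketch, not something from the thread: the resource script names lustre-ssd1/lustre-ssd2 are hypothetical placeholders for whatever start/stop wrapper ends up being used (the rest of the thread settles on the Lustre init script or lconf), and the STONITH line is left as a placeholder for your actual fencing hardware.

# /etc/ha.d/haresources (identical on oss1 and oss2); format: <preferred node> <resource>
oss1 lustre-ssd1   # oss1 normally serves ssd1; Heartbeat moves it to oss2 on failure
oss2 lustre-ssd2   # oss2 normally serves ssd2; Heartbeat moves it to oss1 on failure

# /etc/ha.d/ha.cf (fragment)
keepalive 2
deadtime 30
bcast eth0
node oss1
node oss2
auto_failback on
# stonith_host * <stonith_type> <parameters>   # fence the peer before its storage is taken over

Because each node is primary for one resource and standby for the other, a failure of either OSS only moves one resource; nothing has to be stopped on the surviving node, which matches Cliff's later comment that the active OST never needs to be stopped while the other OST fails over.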
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Cliff,

I am not clear on how to set up MDS active/passive failover.

Here is my MDS setup:
=====================
--add node --node mds01 || exit 10
--add node --node mds02 || exit 11

--add net --node mds01 --nid mds01 --nettype tcp || exit 20
--add net --node mds02 --nid mds02 --nettype tcp || exit 21

--add mds --node mds01 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30

--add mds --node mds02 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31

--add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0 || exit 32

Then on my client in /etc/fstab I have:
=======================================
mds01:/mds1/client /mnt/lustre lustre rw 0 0

When I take down mds01 and want mds02 to take over (for an upgrade or something), how do my clients know to contact mds02 instead of mds01? Wouldn't a floating IP address make sense in this case?

Any help here is appreciated.

Thanks,
Steve

-----Original Message-----
From: cliff white [mailto:cliffw@clusterfs.com]
Sent: Tuesday, September 06, 2005 12:31 PM
To: Nielsen, Steve
Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
Subject: Re: setting up failover in lustre... any recommendations?

Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes and
> I don't need to restart the services. The only thing I then need to do via
> heartbeat is detect the other side down and "stonith" it? Then things
> will be good?

Steve -
Just wanted to check back and see how you were doing with this.
One thing I didn't mention: when failing back the service from secondary
to primary, you should stop the service with 'lconf --failover', which
will be quicker.
cliffw

> Steve
>
> -----Original Message-----
> From: cliff white [mailto:cliffw@clusterfs.com]
> Sent: Thursday, September 01, 2005 11:59 AM
> To: Nielsen, Steve
> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
> Subject: Re: setting up failover in lustre... any recommendations?
>
> Nielsen, Steve wrote:
>> I am working on setting this up now. Just need to trudge through it.
>>
>> I will share my experiences when done.
>>
>> Quick question: for failover to work I need to have IPs that float
>> between the devices. Won't this require me to restart the lustre
>> service? (I am on RHEL 4.) So a "service lustre restart" should work,
>> right?
>
> Unfortunately, we do not support IP takeover at this time. What we do is
> this:
> The servers are configured with a specific IP.
> The clients know about both IPs, and will attempt to connect in a
> round-robin fashion until they succeed.
>
> Here's a typical configuration for OST failover (servers orlando and oscar):
>
> --add ost --node orlando --ost ost1-home --failover --group orlando \
>   --lov lov-home --dev /dev/ost1
> --add ost --node orlando --ost ost2-home --failover \
>   --lov lov-home --dev /dev/ost2
>
> --add ost --node oscar --ost ost2-home --failover --group oscar \
>   --lov lov-home --dev /dev/ost2
> --add ost --node oscar --ost ost1-home --failover \
>   --lov lov-home --dev /dev/ost1
>
> cliffw
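For comparison with the MDS config above, the quoted orlando/oscar pattern applied to the oss1/oss2 and ssd1/ssd2 layout from the first message would look roughly like this. It is only a sketch: the OST names, the lov name (lov1) and the device paths (/dev/ssd1, /dev/ssd2) are assumptions, and as in the quoted example the --group option appears only on the entry for the node that primarily serves each device.

--add ost --node oss1 --ost ost-ssd1 --failover --group oss1 \
  --lov lov1 --dev /dev/ssd1
--add ost --node oss1 --ost ost-ssd2 --failover \
  --lov lov1 --dev /dev/ssd2

--add ost --node oss2 --ost ost-ssd2 --failover --group oss2 \
  --lov lov1 --dev /dev/ssd2
--add ost --node oss2 --ost ost-ssd1 --failover \
  --lov lov1 --dev /dev/ssd1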
Ragnar Kjørstad
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
On Wed, Sep 07, 2005 at 02:55:10PM -0700, cliff white wrote:
> > When I take down mds01 and want mds02 to take over (for an upgrade
> > or something), how do my clients know to contact mds02 instead of mds01?
> > Wouldn't a floating IP address make sense in this case?
>
> Steve -
> Two answers, future and current.
>
> Our new mountconfig will allow you to specify
> multiple MDSs as part of the mount command. Unfortunately, the new
> mountconfig hasn't been released yet; it will be soon. For now, you
> will not be able to specify the client mount in /etc/fstab.
> Instead, you will have to use lconf to mount the clients.

So one can't use a floating IP for this?

--
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Ragnar Kjørstad wrote:
> On Wed, Sep 07, 2005 at 02:55:10PM -0700, cliff white wrote:
>> Our new mountconfig will allow you to specify
>> multiple MDSs as part of the mount command. Unfortunately, the new
>> mountconfig hasn't been released yet; it will be soon. For now, you
>> will not be able to specify the client mount in /etc/fstab.
>> Instead, you will have to use lconf to mount the clients.
>
> So one can't use a floating IP for this?

We do not support IP takeover at all. Portals cannot handle it.

cliffw
Mc Carthy, Fergal
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
However, be warned that when lconf --failover completes, that doesn't necessarily mean the device has fully finished on the current node; the shutdown happens in the background and can sometimes take a few seconds. This can lead to lconf --failover complaining about failing to unload modules. With the latest versions, however, lconf --failover should change the stopped devices to read-only, so the local node can't make any more changes to them and it is safe to start them running on the alternate server.

Fergal.

--
Fergal.McCarthy@HP.com

(The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated, you should consider this message and attachments as "HP CONFIDENTIAL".)
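Taking Fergal's caveat together with Cliff's earlier advice, a failback sequence driven from a script might look like the sketch below. The exact lconf invocation (the --cleanup/--failover/--group/--select combination), the config path, and the settle time are assumptions pieced together from the thread, not a verbatim recipe; check them against your Lustre version.

# On mds02 (the secondary currently running the service), stop it the quick way:
lconf --cleanup --failover --group mds01 --select mds01=mds02 /etc/lustre/config.xml

# Per Fergal's note, the teardown can lag in the background, so allow a short
# settle time before the primary starts the service; the figure is arbitrary,
# and newer lconf versions mark the stopped devices read-only anyway.
sleep 5

# Then on mds01 (the primary), start the service again, e.g. with the
# group symlink or '/etc/init.d/lustre start' as described later in the thread.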
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Cliff,
>
> I am a little lost then as to what the command lines would be to
> start/stop the MDS within heartbeat. That is where my confusion lies. If
> you could help with the command lines that would be great.

No problem.
First, what version of Lustre are you using, and what type of network?
(Ethernet, IB, Elan, etc.)

> Here is a guess at what I need to do:
>
> - mds01 and mds02 are my MDS servers with a shared storage device for
>   the metadata that they are both connected to.
>
> - use heartbeat to control failover between the two MDS servers
>
> - both servers are configured NOT to bring up the lustre service via the
>   initscripts (heartbeat will control this)
>
> - configure mds01 in heartbeat as the "preferred" server to run the
>   active MDS (initially)

This is all correct - you need to make one change to your configuration.
You have:

--add mds --node mds01 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30

--add mds --node mds02 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31

Remove the '--group' entry for the secondary server:

--add mds --node mds02 --mds mds1 --fstype ldiskfs --dev /dev/vg_mds1/lv_mds1 --failover || exit 31

> What command line would I run to bring up the MDS then? The init script that
> comes with lustre or a "raw" lconf command?

If you are running Lustre > 1.4.4 you can symlink /etc/init.d/lustre
to <path to service scripts>/mds01 (this is the --group name, so it
would be 'mds01' on both nodes) and use the symlink to start/stop the
mds pair; this is our preferred failover method.
Second choice would be hand-running '/etc/init.d/lustre start'.
Last, you could run lconf.

> On failover, heartbeat on mds02 needs to execute a series of commands to
> take over the MDS from mds01:
>   - stonith mds01 (that way it reboots, comes up, sees mds02 is
>     the master and does nothing)
>   - start lustre locally on mds02. What is the command line for
>     this? I suspect it's an lconf command?

The symlink method should work for starting either node; it works from
the '--group' parameter. That is why you need to remove the duplicate
--group. You can also use lconf - 'lconf --group mds01 --select
mds01=mds02 <config file>'

> On failing back to mds01, what commands do I need to run?
>   - stonith mds02
>   - start lustre locally on mds01. What is the command line for
>     this? I suspect it's an lconf command?

lconf with --failover will do a quick shutdown of mds02; once
that shutdown is complete, you would start mds01 with the service
symlink, /etc/init.d/lustre or lconf. That would be transparent for the
clients.

Personally, I would avoid the stonith on failback, but that would work,
and should also be transparent for the clients.

> As a side note:
> In trying different things to get the MDS to fail over, I get the error
> message below. Do you know what it means? I am running the same version
> of lustre everywhere (kernel, modules, ...)
>
> 2005-09-08 10:23:05 -0500,svr01,kern,err,kernel: LustreError: Protocol
> error connecting to host 10.10.1.2 on port 988: Is it running a
> compatible version of Lustre?

This is a known issue: port 988 is within the range of ports handed out
by the portmapper. It is possible that an RPC service is starting before
Lustre and grabbing that port.
Typically, disabling nfs and nfslock avoids the problem; it's an issue
with other services that collide with the portmapper range. See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=103401
for more explanation of the issue.
cliffw

> Thanks,
> Steve
>
> -----Original Message-----
> From: cliff white [mailto:cliffw@clusterfs.com]
> Sent: Thursday, September 08, 2005 11:08 AM
> To: Nielsen, Steve
> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
> Subject: Re: setting up failover in lustre... any recommendations?
>
> Nielsen, Steve wrote:
>> Cliff,
>>
>> I changed my clients to use the lconf call as you recommended below.
>>
>> I am able to mount the file system without issues on the client using this method.
>>
>> However, when I shut down lustre on mds01 (my main mds server) as a test, mds02 does not take over.
>>
>> Are there commands I need to run on my standby mds02 server to enable it to take over?
>>
>> Also, when things fail back to mds01, is there something I need to run to enable that?
>
> I think I may need some more detail on your setup, please
> expand if necessary.
>
> First, only one MDS should be running at a time. This is
> very important - you should _never_ have both servers in the
> failover pair active at the same time. Bad Things can happen
> to your metadata.
>
> For failover, you will have to start the second mds server
> after the first mds is down. Normally, this is done with an
> HA package (Heartbeat, CluManager, etc). When the secondary
> server starts, it should access the shared storage and go.
>
> If that's not happening, we need to see the logs (syslog, dmesg,
> anything on the console); there should be errors. Check logs on the
> MDS and OST.
>
> For failback, you should stop mds02 with the --failover option.
> This will do a quick shutdown - then start mds01.
> cliffw
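Cliff's symlink method could be wired into Heartbeat roughly as below. This is a sketch built on assumptions: the /etc/ha.d/resource.d location, the idea that the init script derives the group from the name it is invoked as, and the config file path are inferred from his description rather than quoted from documentation.

# On both mds01 and mds02, create a service script named after the --group:
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/mds01

# /etc/ha.d/haresources then treats it like any other resource,
# with node mds01 as the preferred home:
#   mds01 mds01

# What Heartbeat ends up running on whichever node owns the service:
/etc/ha.d/resource.d/mds01 start   # bring up the MDS for group mds01 on this node
/etc/ha.d/resource.d/mds01 stop    # shut it down again

# Raw lconf alternative for starting the group on the secondary, as Cliff shows:
lconf --group mds01 --select mds01=mds02 /etc/lustre/config.xml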
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Thanks. I will set this up and test it.

BTW, I am using lustre 1.4.5 and regular old ethernet.

I assume for the OSTs the same command lines as for the MDSs would apply as well (with the correct --group config and symlinking lustre)?

On the --failover for failback, does the lconf command wait until it's complete? Or should I sleep in the script after issuing lconf --failover?

Steve

-----Original Message-----
From: cliff white [mailto:cliffw@clusterfs.com]
Sent: Thursday, September 08, 2005 2:12 PM
To: Nielsen, Steve
Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
Subject: Re: setting up failover in lustre... any recommendations?
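Assuming Cliff's answer in the next message (that the same approach applies to the OSTs), the OSS side would simply repeat the symlink trick once per --group name. A sketch using the group names from the orlando/oscar example quoted earlier; substitute whatever groups your own lmc script defines:

# On both OSS nodes, so either one can run either group after a failure:
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/orlando
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/oscar

# /etc/ha.d/haresources pins each group to its preferred node:
#   orlando orlando
#   oscar   oscar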
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Thanks.
>
> I will set this up and test it.

Great, let us know how it goes.

> BTW, I am using lustre 1.4.5 and regular old ethernet.

Thanks.

> I assume for the OSTs the same command lines as for the MDSs would apply as
> well (with the correct --group config and symlinking lustre)?

Yes.

> On the --failover for failback, does the lconf command wait until it's
> complete? Or should I sleep in the script after issuing lconf --failover?

Things should be complete when lconf returns; it sets the device
read-only, so you should be okay starting the failback without a sleep.
cliffw
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Cliff,
>
> I am not clear on how to set up MDS active/passive failover.
>
> Here is my MDS setup:
> =====================
> --add node --node mds01 || exit 10
> --add node --node mds02 || exit 11
>
> --add net --node mds01 --nid mds01 --nettype tcp || exit 20
> --add net --node mds02 --nid mds02 --nettype tcp || exit 21
>
> --add mds --node mds01 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30
>
> --add mds --node mds02 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31
>
> --add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1
> --stripe_pattern 0 || exit 32
>
> Then on my client in /etc/fstab I have:
> =======================================
> mds01:/mds1/client  /mnt/lustre  lustre  rw  0  0
>
> When I take down mds01, say for an upgrade, and want mds02 to take over,
> how do my clients know to contact mds02 instead of mds01?
> Wouldn't a floating IP address make sense in this case?
>
> Any help here is appreciated.

Steve -
Two answers, future and current.

Our new mountconfig will allow you to specify multiple MDSs as part of
the mount command. Unfortunately, the new mountconfig hasn't been
released yet; it will be soon. For now, you will not be able to specify
the client mount in /etc/fstab. Instead, you will have to use lconf to
mount the clients.

In your setup script, be sure you specify a client
(the single line will cover any number of actual clients):

--add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1

Then on the client node:

'lconf --node client <your xml file>'

will mount the filesystem - failover of the MDS will be transparent to
the clients; they will know to try the secondary MDS if the primary is
unavailable. For now, you may wish to put the lconf command in a script,
and have the script called as part of your normal startup.

cliffw
>> >> >>Unfortunately, we do not support IP takeover at this time. What we do > > is > >>this: >>The servers are configured with a specific IP. >>The clients know about both IPs, and will attempt to connect in a >>round-robin fashion until they succeed. >> >>Here''s a typically configuration for OST failover: (servers orlando > > and > >>oscar) >> >>--add ost --node orlando --ost ost1-home --failover --group orlando \ >>--lov lov-home --dev /dev/ost1 >>--add ost --node orlando --ost ost2-home --failover \ >>--lov lov-home --dev /dev/ost2 >> >>--add ost --node oscar --ost ost2-home --failover --group oscar \ >>--lov lov-home --dev /dev/ost2 >>--add ost --node oscar --ost ost1-home --failover \ >>--lov lov-home --dev /dev/ost1 >> >>cliffw >> >> >>>Steve >>> >>>-----Original Message----- >>>From: cliff white [mailto:cliffw@clusterfs.com] >>>Sent: Thursday, September 01, 2005 11:43 AM >>>To: Nielsen, Steve >>>Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; >>>support@clusterfs.com >>>Subject: Re: setting up failover in lustre... any recommendations? >>> >>>Nielsen, Steve wrote: >>> >>> >>> >>>>Hi, >>>> >>>>I have a couple questions about setting up failver with Lustre. Any >>> >>>help >>> >>> >>> >>>>is appreciated. >>>> >>>>Definitions: >>>>===========>>>>ssd - shared storage device >>>>oss - object storage server >>>>mds - meta data server >>>> >>>>Here is my OSS setup: >>>>====================>>>> oss1 #----(failover)----# ssd2 >>>> # # >>>> | | >>>> | | >>>> |(primary) | (primary) >>>> | | >>>> | | >>>> # # >>>> ssd1 #----(failover)----# oss2 >>>> >>>>So: >>>> oss1 is primary for ssd1 >>>> oss1 is failover for ssd2 >>>> oss2 is primary for ssd2 >>>> oss2 is failover for ssd1 >>>> >>>>Here is my MDS setup: >>>>====================>>>> +-----# ssd3 #------+ >>>> | | >>>> | (primary) | (failover) >>>> | | >>>> | | >>>> # # >>>> mds1 mds2 >>>> >>>>So: >>>> mds1 is primary for ssd3 >>>> mds2 is failover for ssd3 >>>> >>>>I am now at the stage where I want to implement failover for the > > OSS''s > >>>>and MDS seutp. >>>> >>>>I would prefer to use heartbeat from linux-ha.org for the following >>>>reasons: >>>> * it''s actively maintained >>>> * we use it in house extensively >>>> * i am very familiar with it >>>> * its straight forward to use (start/stop resources on failover) >>>> >>>>Has anyone else used heartbeat to do failover? Are there docs that I >>> >>>can >>> >>> >>> >>>>be pointed on this specific type of setup? >>> >>> >>>Heartbeat should work fine. We have customers using Red Hat''s >>>CluManager, which is similar. We are currently writing the docs, I am >>>very interested in incorporating your experiences, especially since >> >>you >> >> >>>have Heartbeat familiarity. >>> >>> >>> >>>>I know how to configure heartbeat and to use STONITH to make sure the >>>>secondary device will not write to the shared storage device at the >>>>same time as the primary device. >>> >>>That''s the key - the shared storage must never be touched by two >> >>servers >> >> >>>at once. >>> >>> >>> >>>>My main questions lie in what resources to stop/start on failver > > since > >>>>both (for example) OSS''s are active for one OST and failver for the >>>>other OST. >>> >>>You should never have to stop the active primary OST when failing over >> >> >>>the other OST to the secondary. Failover is generally transparent to >> >>the >> >> >>>clients, their applications may block for a moment during failover, >> >>but >> >> >>>should continue on with the new server. 
Failback of course requires >> >>you >> >> >>>to stop the service on the secondary before starting it on the >> >>primary. >> >> >>>cliffw >>> >>> >>> >>> >>>>Thanks, >>>>Steve >>> >>> >
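Cliff's suggestion to wrap the client-side lconf call in a startup script might look like the sketch below. The script name and the config path /etc/lustre/config.xml are assumptions, not something from the thread; --cleanup is lconf's normal teardown option.

#!/bin/sh
# /etc/init.d/lustre-client (sketch) - mount/unmount the Lustre client
# with lconf, since /etc/fstab cannot be used until mountconfig ships.
CONFIG=/etc/lustre/config.xml    # assumed location of the lmc-generated XML

case "$1" in
  start)
    # "client" is the generic node name created with --add mtpt above
    lconf --node client $CONFIG
    ;;
  stop)
    lconf --node client --cleanup $CONFIG
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac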
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Cliff,

I changed my clients to use the lconf call as you recommended.

I am able to mount the file system without issues on the client using
this method.

However, when I shut down Lustre on mds01 (my main MDS server) as a
test, mds02 does not take over.

Are there commands I need to run on my standby mds02 server to enable
it to take over?

Also, when things fail back to mds01, is there something I need to run
to enable that?

Thanks,
Steve
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Cliff,
>
> I changed my clients to use the lconf call as you recommended.
>
> I am able to mount the file system without issues on the client using this method.
>
> However, when I shut down Lustre on mds01 (my main MDS server) as a test, mds02 does not take over.
>
> Are there commands I need to run on my standby mds02 server to enable it to take over?
>
> Also, when things fail back to mds01, is there something I need to run to enable that?

I think I may need some more detail on your setup; please expand if
necessary. First, only one MDS should be running at a time. This is
very important - you should _never_ have both servers in the failover
pair active at the same time. Bad Things can happen to your metadata.

For failover, you will have to start the second MDS server after the
first MDS is down. Normally, this is done with an HA package
(Heartbeat, CluManager, etc). When the secondary server starts, it
should access the shared storage and go. If that's not happening, we
need to see the logs (syslog, dmesg, anything on the console); there
should be errors. Check the logs on the MDS and the OSTs.

For failback, you should stop mds02 with the --failover option. This
will do a quick shutdown - then start mds01.

cliffw
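To make Cliff's manual procedure concrete, a rough sketch of the failover and failback sequences is shown below. The file name config.xml is an assumption, and the exact --cleanup/--failover combination is inferred from the options mentioned in this thread, so verify it against your lconf version.

# --- failover: mds01 is down (crashed, powered off, or STONITHed) ---
# On mds02, start the mds1 service from the shared config; it mounts
# the shared /dev/vg_mds1/lv_mds1 and lets clients recover.
mds02# lconf --node mds02 config.xml

# If clients do not reconnect, look for errors on the MDS and the OSTs:
mds02# dmesg | grep -i lustre
mds02# tail -n 100 /var/log/messages

# --- failback: return the service to mds01 ---
# Stop the service on the secondary first (quick shutdown), then
# start it on the primary.
mds02# lconf --node mds02 --cleanup --failover config.xml
mds01# lconf --node mds01 config.xml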
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Cliff,

I am a little lost, then, as to what the command lines would be to
start/stop the MDS within heartbeat. That is where my confusion lies.
If you could help with the command lines, that would be great.

Here is a guess at what I need to do:

- mds01 and mds02 are my MDS servers, both connected to a shared
storage device that holds the metadata.
- heartbeat controls failover between the two MDS servers.
- both servers are configured to NOT bring up the lustre service via
the init scripts (heartbeat will control this).
- mds01 is configured in heartbeat as the "preferred" server to run
the active MDS (initially).

What command line would I run to bring up the MDS then? The init script
that comes with Lustre, or a "raw" lconf command?

On failover, heartbeat on mds02 needs to execute a series of commands
to take over the MDS from mds01:

- stonith mds01 (that way it reboots, comes up, sees mds02 is the
master and does nothing)
- start Lustre locally on mds02. What is the command line for this? I
suspect it's an lconf command?

On failing back to mds01, what commands do I need to run?

- stonith mds02
- start Lustre locally on mds01. What is the command line for this? I
suspect it's an lconf command?

As a side note: in trying different things to get the MDS to fail over,
I get the error message below. Do you know what it means? I am running
the same version of Lustre everywhere (kernel, modules, ...).

2005-09-08 10:23:05 -0500,svr01,kern,err,kernel: LustreError: Protocol
error connecting to host 10.10.1.2 on port 988: Is it running a
compatible version of Lustre?

Thanks,
Steve
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes and
> I don't need to restart the services. The only thing I then need to do
> via heartbeat is detect that the other side is down and "stonith" it?
> Then things will be good?

Steve -
Just wanted to check back and see how you were doing with this.
One thing I didn't mention: when failing back the service from secondary
to primary, you should stop the service with 'lconf --failover', which
will be quicker.

cliffw
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
Ragnar Kjørstad wrote:
> On Thu, Sep 01, 2005 at 01:03:11PM -0400, Nielsen, Steve wrote:
>
>> So for both OSS/MDS i don't need to float any ips between the boxes and
>> i don't need to restart the services. the only thing then via heartbeat
>> I need to do is detect the other side down and "stonith" it? Then things
>> will be good ?
>
> Disclaimer: I've just started looking at this too.
>
> I believe you also need to activate the OST on the secondary server by
> running the appropriate lconf command?
> (Starting another resource, in linux-ha speak.)
>
> Basically, an OCF Resource Agent is required - a shell script that
> heartbeat can use to start and stop the service as required (not the
> whole lustre service, but the specific OST).
>
> This can be done either by writing a new shell script, or by extending
> the lustre init script to double as an OCF Resource Agent. (It needs to
> be able to start/stop specific groups of OSTs rather than the whole
> service.)

If you are using current Lustre, you can symlink /etc/init.d/lustre to
the service name, and that script (the symlink) should work as a
Resource Agent script. This has not been tested extensively, so feedback
is appreciated.

cliffw
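As a sketch of how that symlink could be wired into a Heartbeat 1.x (haresources-style) setup for the MDS pair discussed in this thread: the resource name, the paths, and the assumption that the script manages only the mds1 service are illustrative, not taken from the Lustre documentation.

# On both mds01 and mds02: let the init script double as a resource
# script named after the service it manages.
ln -s /etc/init.d/lustre /etc/ha.d/resource.d/mds1

# /etc/ha.d/haresources (must be identical on both nodes):
# mds01 is the preferred node; heartbeat runs "mds1 start" there and
# moves the resource to mds02 only after mds01 is declared dead
# (and fenced via STONITH).
mds01 mds1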
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Hi,
>
> I have a couple of questions about setting up failover with Lustre. Any
> help is appreciated.
>
> Definitions:
> ============
> ssd - shared storage device
> oss - object storage server
> mds - metadata server
>
> Here is my OSS setup:
> =====================
>  oss1 #----(failover)----# ssd2
>   #                        #
>   |                        |
>   |(primary)               | (primary)
>   |                        |
>   #                        #
>  ssd1 #----(failover)----# oss2
>
> So:
>  oss1 is primary for ssd1
>  oss1 is failover for ssd2
>  oss2 is primary for ssd2
>  oss2 is failover for ssd1
>
> Here is my MDS setup:
> =====================
>   +-----# ssd3 #------+
>   |                   |
>   | (primary)         | (failover)
>   |                   |
>   #                   #
>  mds1                mds2
>
> So:
>  mds1 is primary for ssd3
>  mds2 is failover for ssd3
>
> I am now at the stage where I want to implement failover for the OSS
> and MDS setup.
>
> I would prefer to use heartbeat from linux-ha.org for the following
> reasons:
>  * it's actively maintained
>  * we use it in house extensively
>  * i am very familiar with it
>  * it's straightforward to use (start/stop resources on failover)
>
> Has anyone else used heartbeat to do failover? Are there docs that I can
> be pointed to on this specific type of setup?

Heartbeat should work fine. We have customers using Red Hat's
CluManager, which is similar. We are currently writing the docs; I am
very interested in incorporating your experiences, especially since you
have Heartbeat familiarity.

> I know how to configure heartbeat and to use STONITH to make sure the
> secondary device will not write to the shared storage device at the
> same time as the primary device.

That's the key - the shared storage must never be touched by two servers
at once.

> My main questions lie in what resources to stop/start on failover, since
> both (for example) OSSes are active for one OST and failover for the
> other OST.

You should never have to stop the active primary OST when failing over
the other OST to the secondary. Failover is generally transparent to the
clients; their applications may block for a moment during failover, but
should continue on with the new server. Failback of course requires you
to stop the service on the secondary before starting it on the primary.

cliffw

> Thanks,
> Steve
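Since the whole scheme hinges on fencing, here is a rough sketch of how a STONITH device is typically declared in a Heartbeat 1.x ha.cf. The directive and the ssh plugin name are recalled from the linux-ha documentation, so treat the exact syntax as an assumption and check `stonith -L` and the docs for your version; the ssh plugin is only suitable for testing, and a real power switch should be used in production.

# /etc/ha.d/ha.cf (fragment, sketch)
# Each node declares a STONITH method it can use to power-cycle its
# peer before taking over the shared storage.
stonith_host oss1 ssh oss2
stonith_host oss2 ssh oss1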
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
I am working on setting this up now. Just need to trudge through it.

I will share my experiences when done.

Quick question: for failover to work, I need to have IPs that float
between the devices. Won't this require me to restart the lustre
service? (I am on RHEL 4.) So a "service lustre restart" should work,
right?

Steve
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> I am working on setting this up now. Just need to trudge through it.
>
> I will share my experiences when done.
>
> Quick question: for failover to work, I need to have IPs that float
> between the devices. Won't this require me to restart the lustre
> service? (I am on RHEL 4.) So a "service lustre restart" should work,
> right?

Unfortunately, we do not support IP takeover at this time. What we do is
this:
The servers are configured with a specific IP.
The clients know about both IPs, and will attempt to connect in a
round-robin fashion until they succeed.

Here's a typical configuration for OST failover (servers orlando and
oscar):

--add ost --node orlando --ost ost1-home --failover --group orlando \
--lov lov-home --dev /dev/ost1
--add ost --node orlando --ost ost2-home --failover \
--lov lov-home --dev /dev/ost2

--add ost --node oscar --ost ost2-home --failover --group oscar \
--lov lov-home --dev /dev/ost2
--add ost --node oscar --ost ost1-home --failover \
--lov lov-home --dev /dev/ost1

cliffw
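For readers new to the 1.4 tooling, the --add lines above (and in Steve's MDS script earlier) are arguments to lmc, which builds the XML configuration that lconf later consumes on every node. A rough sketch of that step, assuming the file name config.xml and the -o (create) / -m (modify) pattern from the Lustre 1.4 manual - treat the exact flags as an assumption to verify against your release:

# Build the config once, then copy config.xml to every server and client.
lmc -o config.xml --add node --node orlando
lmc -m config.xml --add node --node oscar
lmc -m config.xml --add net --node orlando --nid orlando --nettype tcp
lmc -m config.xml --add net --node oscar --nid oscar --nettype tcp
# MDS and LOV entries (as in the MDS example earlier in the thread)
# must be added before the OSTs that reference lov-home.
lmc -m config.xml --add ost --node orlando --ost ost1-home --failover \
    --group orlando --lov lov-home --dev /dev/ost1
lmc -m config.xml --add ost --node oscar --ost ost1-home --failover \
    --lov lov-home --dev /dev/ost1
# ...and likewise for ost2-home, with --group oscar on the oscar node.

Each node then brings up its own piece of this file with lconf, as in the other examples in this thread.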
Nielsen, Steve
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
So for both the OSS and MDS I don't need to float any IPs between the
boxes, and I don't need to restart the services. The only thing I then
need to do via heartbeat is detect that the other side is down and
"stonith" it? Then things will be good?

Steve
>=20 >>I know how to configure heartbeat and to use STONITH to make sure the >>secondary device will not write to the shared storage device at the >>same time as the primary device. >=20 > That''s the key - the shared storage must never be touched by twoservers> at once. >=20 >>My main questions lie in what resources to stop/start on failver since >>both (for example) OSS''s are active for one OST and failver for the >>other OST. >=20 > You should never have to stop the active primary OST when failing over> the other OST to the secondary. Failover is generally transparent tothe>=20 > clients, their applications may block for a moment during failover,but=20> should continue on with the new server. Failback of course requiresyou=20> to stop the service on the secondary before starting it on theprimary.> cliffw >=20 >=20 >>Thanks, >>Steve >=20 >=20
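For concreteness, here is a minimal sketch of the takeover step this
configuration implies, written as the commands the surviving OSS (oscar)
would run once heartbeat has fenced orlando. The config path is an
assumption (it is not given in this thread), and the --group/--select form
is borrowed from the lconf examples that appear later in the thread; check
it against your Lustre version.

# Hedged sketch only - config path and exact option semantics are assumptions.
CONFIG=/etc/lustre/config.xml

# 1. Fence (STONITH) the failed node first so it can no longer write to
#    the shared storage; heartbeat's STONITH plugin handles this step.

# 2. On oscar, start the OST group that normally runs on orlando.
lconf --group orlando --select orlando=oscar $CONFIG

# Clients that were connected to orlando block briefly, then retry both
# configured NIDs and continue against oscar; no IP takeover is involved.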
Ragnar Kjørstad
2006-May-19 07:36 UTC
[Lustre-discuss] RE: setting up failover in lustre... any recommendations?
On Thu, Sep 01, 2005 at 01:03:11PM -0400, Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes, and
> I don't need to restart the services. The only thing I then need to do via
> heartbeat is detect that the other side is down and "stonith" it? Then
> things will be good?

Disclaimer: I've just started looking at this too.

I believe you also need to activate the OST on the secondary server by
running the appropriate lconf command? (Starting another resource, in
linux-ha speak.)

Basically an OCF Resource Agent is required - a shell script that heartbeat
can use to start and stop the service as required (not the whole lustre
service, but the specific OST). This can be done either by writing a new
shell script, or by extending the lustre init script to double as an OCF
Resource Agent (it needs to be able to start/stop specific groups of OSTs
rather than the whole service).

-- 
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter
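To make Ragnar's suggestion concrete, a minimal sketch of the kind of
per-OST resource script heartbeat could manage follows. The group name,
config path, and health test are assumptions for illustration (not taken
from this thread); the lconf forms are the --group/--select and
--failover --cleanup invocations used elsewhere in the thread, and should
be verified against your Lustre version.

#!/bin/sh
# Sketch of an LSB-style resource script for one OST group (hypothetical
# group name "ost1-home", hypothetical config /etc/lustre/config.xml).
CONFIG=/etc/lustre/config.xml
GROUP=ost1-home
LCONF=/usr/sbin/lconf
NODE=`hostname`

case "$1" in
start)
    # Start only this OST group on the local node (not the whole service).
    $LCONF --group $GROUP --select $GROUP=$NODE $CONFIG
    ;;
stop)
    # Quick shutdown so the peer (or the primary, on failback) can take over.
    $LCONF --group $GROUP --select $GROUP=$NODE --failover --cleanup $CONFIG
    ;;
status)
    # Healthy only if the kernel health file says so (see the scripts below).
    if [ -f /proc/fs/lustre/health_check ] && \
       grep -q healthy /proc/fs/lustre/health_check; then
        echo running
        exit 0
    fi
    echo stopped
    exit 3
    ;;
*)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit 0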
Adam Cassar
2006-May-19 07:36 UTC
[Lustre-discuss] setting up failover in lustre... any recommendations?
Hi,

We do failover for lustre using clumanager; I'm sure you could adapt the
scripts to a heartbeat setup. I've attached the scripts that we use for
health checking; they are based on the lustre init scripts.

One thing that we have found is that lustre fails over quicker if you are
not sharing services on a node.

On Thu, 2005-09-01 at 11:33 -0400, Nielsen, Steve wrote:
> Has anyone else used heartbeat to do failover? Are there docs that I can
> be pointed to on this specific type of setup?

-- 
Adam Cassar
ICT Manager
NetRegistry Pty Ltd
______________________________________________
http://www.netregistry.com.au
Tel: 02 9699 6099 Fax: 02 9699 6088
PO Box 270 Broadway NSW 2007
Domains |Business Email|Web Hosting|E-Commerce
Trusted by 10,000s of businesses since 1997
______________________________________________

# lustre_functions.sh
: ${LUSTRE_CFG:=/etc/lustre/lustre.cfg}
[ -f ${LUSTRE_CFG} ] && . ${LUSTRE_CFG}

: ${LUSTRE_CONFIG_XML:=/etc/lustre/config.xml}
: ${LCONF:=/usr/sbin/lconf}
: ${LCONF_START_ARGS:="${LUSTRE_CONFIG_XML}"}
: ${LCONF_FORCE_STOP_ARGS:="--force --cleanup ${LUSTRE_CONFIG_XML}"}
: ${LCONF_STOP_ARGS:="--failover --cleanup ${LUSTRE_CONFIG_XML}"}
: ${LCTL:=/usr/sbin/lctl}

PRIMARY_SERVICE=ost1
STATUS=/var/lustre/status
HEALTHCHECK=/proc/fs/lustre/health_check
NODE=$HOSTNAME

start() {
    SERVICE=$1
    if [ $SERVICE != $PRIMARY_SERVICE ]; then
        START_ARGS="--group $SERVICE --select $SERVICE=$NODE $LCONF_START_ARGS"
    else
        START_ARGS=$LCONF_START_ARGS
    fi
    echo -n "Starting $SERVICE ${LCONF} ${START_ARGS}: "
    ${LCONF} ${START_ARGS}
    RETVAL=$?
    echo $SERVICE
    if [ $RETVAL -eq 0 ]; then
        echo "online" > $STATUS
    else
        echo "online pending" > $STATUS
    fi
}

stop() {
    SERVICE=$1
    if [ $SERVICE != $PRIMARY_SERVICE ]; then
        STOP_ARGS="--group $SERVICE --select $SERVICE=$NODE $LCONF_STOP_ARGS"
    else
        STOP_ARGS=$LCONF_STOP_ARGS
    fi
    echo -n "Shutting down $SERVICE ${LCONF} ${STOP_ARGS}: "
    ${LCONF} ${STOP_ARGS}
    RETVAL=$?
    echo $SERVICE
    if [ $RETVAL -eq 0 ]; then
        echo "offline" > $STATUS
    else
        echo "offline pending" > $STATUS
    fi
}

status() {
    STATE="stopped"
    egrep -q "libcfs|lvfs|portals" /proc/modules && STATE="loaded"

    # check for any configured devices (may indicate partial startup)
    [ "`cat /proc/fs/lustre/devices 2> /dev/null`" ] && STATE="running" # was partial

    # check for servers in recovery
    MDS="`ls /proc/fs/lustre/mds/*/recovery_status 2> /dev/null`"
    OST="`ls /proc/fs/lustre/ost/*/recovery_status 2> /dev/null`"
    [ "$MDS$OST" ] && grep -q RECOV $MDS $OST && STATE="recovery"

    # check for server disconnections
    DISCON="`grep -v FULL /proc/fs/lustre/*c/*/*server_uuid 2> /dev/null`"
    [ "$DISCON" ] && STATE="disconnected"

    # how do we tell if it is actually serving?
    echo $STATE
}

health() {
    STATUS="not running"
    if [ -f $HEALTHCHECK ]; then
        STATUS=`cat $HEALTHCHECK`
    fi
    echo $STATUS
    if [ "$STATUS" != "healthy" ]; then
        return 1
    fi
    return 0
}

# ost1.sh
#!/bin/dash

MYNAME=$(basename $0)

# The clulog utility uses the normal logging levels as defined in
# sys/syslog.h. Calls to clulog will use the logging level defined
# for the Service Manager (clusvcmgrd).
LOG_EMERG=0     # system is unusable
LOG_ALERT=1     # action must be taken immediately
LOG_CRIT=2      # critical conditions
LOG_ERR=3       # error conditions
LOG_WARNING=4   # warning conditions
LOG_NOTICE=5    # normal but significant condition
LOG_INFO=6      # informational
LOG_DEBUG=7     # debug-level messages

if [ $# -ne 1 ]; then
    echo "Usage: $0 {start, stop, status}"
    exit 1
fi

action=$1    # type of action, i.e. 'start', 'stop' or 'status'

#
# Record all output into a log file in case of error
#
exec >> /var/log/lustre/$MYNAME.$action.log 2>&1

clulog -s $LOG_DEBUG "In $0 with action=$action"

# source lustre scripts
. /usr/local/bin/lustre_functions.sh

PRIMARY_SERVICE=ost1
MY_SERVICE=ost1

case $action in
'start')
    clulog -s $LOG_INFO "Running $MY_SERVICE start script"
    RESULT=`start $MY_SERVICE`
    clulog -s $LOG_INFO "$MY_SERVICE $action script returned: $RESULT"
    ;;
'stop')
    clulog -s $LOG_INFO "Running $MY_SERVICE stop script"
    RESULT=`stop $MY_SERVICE`
    clulog -s $LOG_INFO "$MY_SERVICE $action script returned: $RESULT"
    ;;
'status')
    clulog -s $LOG_INFO "Running $MY_SERVICE status script"
    RESULT=`health`
    RETVAL=$?    # capture health()'s exit status before clulog overwrites $?
    clulog -s $LOG_INFO "$MY_SERVICE $action script returned: $RESULT"
    exit $RETVAL
    ;;
*)
    clulog -s $LOG_ERR "Unknown action '$action' passed to user script"
    exit 1    # return failure
esac

exit 0    # return success
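As Adam notes, these clumanager scripts could be adapted to heartbeat. One
plausible way to wire them in, sketched below with hypothetical node names,
is to place the per-service script in /etc/ha.d/resource.d/ and list it in
haresources so heartbeat calls it with start/stop/status; the node names
oss1/oss2 and the pairing are assumptions, not part of Adam's setup.

# /etc/ha.d/haresources - hedged sketch, hypothetical node names.
# heartbeat runs /etc/ha.d/resource.d/ost1.sh start|stop|status on
# whichever node currently owns the resource; oss1 is the preferred owner.
oss1 ost1.sh
oss2 ost2.sh

A STONITH device would still be configured in ha.cf, as discussed earlier
in the thread, and it may be preferable to leave automatic failback off so
failback can be done by hand with lconf --failover.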
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> So for both OSS/MDS I don't need to float any IPs between the boxes, and
> I don't need to restart the services. The only thing I then need to do via
> heartbeat is detect that the other side is down and "stonith" it? Then
> things will be good?

I believe so - look at /etc/init.d/lustre for the current health check;
you should be able to run '/etc/init.d/lustre status' and use that
response for service health. With current Lustre, we also support SNMP
traps for health checking.
cliffw
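Cliff's suggestion is to treat the output of '/etc/init.d/lustre status'
as the health signal. A small probe along those lines, suitable for a
heartbeat or cron check, might look like the sketch below; the expected
state words ("running", "recovery", "healthy") are taken from the status()
and health() functions in Adam's scripts above, and are an assumption for
other Lustre versions.

#!/bin/sh
# Hedged health-probe sketch: exit 0 only when Lustre looks healthy.

STATE=`/etc/init.d/lustre status`
case "$STATE" in
    *running*|*recovery*) ;;                        # up (possibly recovering)
    *) echo "lustre not running: $STATE"; exit 1 ;;
esac

# Double-check the kernel health file if it exists.
if [ -f /proc/fs/lustre/health_check ] && \
   ! grep -q healthy /proc/fs/lustre/health_check; then
    echo "health_check reports a problem"
    exit 1
fi

echo healthy
exit 0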
cliff white
2006-May-19 07:36 UTC
[Lustre-discuss] Re: setting up failover in lustre... any recommendations?
Nielsen, Steve wrote:
> Thanks.
>
> I will set this up and test it.
>
> BTW, I am using lustre 1.4.5 and regular old ethernet.
>
> I assume for the OSTs the same command lines as for the MDSs would apply
> as well (with the correct --group config and symlinking lustre)?
>
> On the --failover for failback, does the lconf command wait till it is
> complete? Or should I sleep in the script after issuing lconf --failover?

One correction - with Lustre 1.4.5, you no longer need the '--group'
parameter. You should remove that from your setup script. For lconf, only
the --service parameter is needed.
cliffw

> Steve
>
> -----Original Message-----
> From: cliff white [mailto:cliffw@clusterfs.com]
> Sent: Thursday, September 08, 2005 2:12 PM
> To: Nielsen, Steve
> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
> Subject: Re: setting up failover in lustre... any recommendations?
>
> Nielsen, Steve wrote:
>
>> Cliff,
>>
>> I am a little lost then as to what the command lines would be to
>> start/stop the MDS within heartbeat. That is where my confusion lies. If
>> you could help with the command lines that would be great.
>
> No problem.
> First, what version of Lustre are you using, and what type of network?
> (Ethernet, IB, Elan, etc.)
>
>> Here is a guess at what I need to do:
>>
>> - mds01 and mds02 are my MDS servers with a shared storage device for
>>   the metadata that they are both connected to.
>>
>> - using heartbeat to control failover between the two MDS servers
>>
>> - both servers are configured to NOT bring up the lustre service via the
>>   initscripts (heartbeat will control this)
>>
>> - configure mds01 in heartbeat as the "preferred" server to run the
>>   active MDS (initially)
>
> This is all correct - you need to make one change to your configuration.
> You have:
>
> --add mds --node mds01 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 --size 10000 || exit 30
>
> --add mds --node mds02 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover --group mds01 || exit 31
>
> Remove the '--group' entry for the secondary server:
>
> --add mds --node mds02 --mds mds1 --fstype ldiskfs --dev
> /dev/vg_mds1/lv_mds1 --failover || exit 31
>
>> What command line would I run to bring up the MDS then? The init script
>> that comes with lustre or a "raw" lconf command?
>
> If you are running Lustre > 1.4.4 you can symlink /etc/init.d/lustre
> to <path to service scripts>/mds01 (this is the --group name, so it
> would be 'mds01' on both nodes) and use the symlink to start/stop the
> mds pair; this is our preferred failover method.
> Second choice would be hand-running '/etc/init.d/lustre start'.
> Last, you could run lconf.
>
>> On failover, heartbeat on mds02 needs to execute a series of commands to
>> take over the MDS from mds01:
>>   - stonith mds01 (that way it reboots, comes up, sees mds02 is
>>     the master and does nothing)
>>   - start lustre locally on mds02. What is the command line for
>>     this? I suspect it's an lconf command?
>
> The symlink method should work for starting either node; it works from
> the '--group' parameter. That is why you need to remove the duplicate
> --group. You can also use lconf - 'lconf --group mds01 --select
> mds01=mds02 <config file>'
>
>> On failing back to mds01 what commands do I need to run?
>>   - stonith mds02
>>   - start lustre locally on mds01. What is the command line for
>>     this? I suspect it's an lconf command?
>
> lconf with --failover will do a quick shutdown of mds02; once
> that shutdown is complete, you would start mds01 with the service
> symlink, /etc/init.d/lustre or lconf. That would be transparent for the
> clients.
>
> Personally, I would avoid the stonith on failback, but that would work,
> and should also be transparent for the clients.
>
>> As a side note:
>> In trying different things to get the MDS to fail over I get the error
>> message below. Do you know what it means? I am running the same version
>> of lustre everywhere (kernel, modules, ..)
>>
>> 2005-09-08 10:23:05 -0500,svr01,kern,err,kernel: LustreError: Protocol
>> error connecting to host 10.10.1.2 on port 988: Is it running a
>> compatible version of Lustre?
>
> This is a known issue; port 988 is within the range of ports handed out
> by the portmapper. It is possible that an RPC service is starting before
> Lustre and grabbing that port. Typically, disabling nfs and nfslock
> avoids the problem. It's an issue with other services that collide with
> the portmapper range; see
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=103401
> for more explanation of the issue.
> cliffw
>
>> Thanks,
>> Steve
>>
>> -----Original Message-----
>> From: cliff white [mailto:cliffw@clusterfs.com]
>> Sent: Thursday, September 08, 2005 11:08 AM
>> To: Nielsen, Steve
>> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
>> Subject: Re: setting up failover in lustre... any recommendations?
>>
>> Nielsen, Steve wrote:
>>
>>> Cliff,
>>>
>>> I changed my clients to use the lconf call as you recommend below.
>>>
>>> I am able to mount the file system without issues on the client using
>>> this method.
>>>
>>> However, when I shut down lustre on mds01 (my main mds server) as a
>>> test, mds02 does not take over.
>>>
>>> Are there commands I need to run on my standby mds02 server to enable
>>> it to take over?
>>>
>>> Also, when things fail back to mds01 is there something I need to run
>>> to enable that?
>>
>> I think I may need some more detail on your setup, please
>> expand if necessary.
>>
>> First, only one MDS should be running at a time. This is
>> very important - you should _never_ have both servers in the
>> failover pair active at the same time. Bad Things can happen
>> to your metadata.
>>
>> For failover, you will have to start the second mds server
>> after the first mds is down. Normally, this is done with an
>> HA package (Heartbeat, CluManager, etc). When the secondary
>> server starts, it should access the shared storage and go.
>>
>> If that's not happening, we need to see the logs (syslog, dmesg,
>> anything on the console); there should be errors. Check logs on the
>> MDS and OST.
>>
>> For failback, you should stop mds02 with the --failover option.
>> This will do a quick shutdown - then start mds01.
>> cliffw
>>
>>> Thanks,
>>> Steve
>>>
>>> -----Original Message-----
>>> From: cliff white [mailto:cliffw@clusterfs.com]
>>> Sent: Wed 9/7/2005 5:55 PM
>>> To: Nielsen, Steve
>>> Cc: lustre-discuss@lists.clusterfs.com; Jeffrey Denworth; support@clusterfs.com
>>> Subject: Re: setting up failover in lustre... any recommendations?
>>>
>>> Nielsen, Steve wrote:
>>>
>>>> Cliff,
>>>>
>>>> I am not clear on how to set up MDS active/passive failover.
>>>>
>>>> Then on my client in /etc/fstab I have:
>>>> mds01:/mds1/client  /mnt/lustre  lustre  rw 0 0
>>>>
>>>> When I take down mds01 and want mds02 to take over (for an upgrade
>>>> or something), how do my clients know to contact mds02 instead of
>>>> mds01? Wouldn't a floating IP address make sense in this case?
>>>>
>>>> Help here is appreciated.
>>>
>>> Steve -
>>> Two answers, future and current.
>>>
>>> Our new mountconfig will allow you to specify
>>> multiple MDSs as part of the mount command. Unfortunately, the new
>>> mountconfig hasn't been released yet; it will be soon. For now, you
>>> will not be able to specify the client mount in /etc/fstab.
>>> Instead, you will have to use lconf to mount the clients.
>>>
>>> In your setup script, be sure you specify a client
>>> (the single line will cover any number of actual clients):
>>>
>>> --add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1
>>>
>>> Then on the client node:
>>> 'lconf --node client <your xml file>'
>>>
>>> will mount the filesystem - failover of the mds will be transparent to
>>> the clients; they will know to try the secondary mds if the primary is
>>> unavailable.
>>> For now, you may wish to put the lconf command in a script, and have
>>> the script called as part of your normal startup.
>>>
>>> cliffw
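Pulling cliff's answers together, the failback sequence and the port 988
workaround might look like the sketch below. The config path is an
assumption; the lconf forms are the ones quoted above (with 1.4.5, cliff
notes --group gives way to --service, so adjust accordingly), and
chkconfig is one way to keep nfs/nfslock from grabbing the port on RHEL.

# Hedged sketch of manual failback for the mds01/mds02 pair discussed above.
CONFIG=/etc/lustre/config.xml    # assumption - path not given in the thread

# 1. On mds02 (the secondary currently serving the MDS): quick shutdown so
#    clients reconnect cleanly instead of seeing a hard stop.
lconf --group mds01 --select mds01=mds02 --failover --cleanup $CONFIG

# 2. Only after that shutdown completes, start the service on mds01,
#    via the service symlink, /etc/init.d/lustre, or lconf:
lconf --group mds01 $CONFIG

# Port 988 collision workaround mentioned above: stop RPC services from
# starting before Lustre and taking the port (RHEL).
chkconfig nfs off
chkconfig nfslock off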