Hello everybody,

I am a novice Lustre user and wanted some help. I am using Lustre 1.4.5.1 on RHEL4 update 2 with kernel 2.6.9-22, and I am facing some problems.

Case 1:
I tried OST failover. Following is the script used to generate the XML config file:

rm -f failover_2node.xml
./lmc -m failover_2node.xml --add net --node node-mds --nid sm01 --nettype tcp
./lmc -m failover_2node.xml --add net --node node-ost1 --nid sm02 --nettype tcp
./lmc -m failover_2node.xml --add net --node node-ost2 --nid sm06 --nettype tcp
./lmc -m failover_2node.xml --add net --node client --nid '*' --nettype tcp

# Configure MDS
./lmc -m failover_2node.xml --add mds --node node-mds --mds mds_test --fstype ldiskfs --dev /dev/sdb5

# Configure LOV
./lmc -m failover_2node.xml --add lov --lov lov_test --mds mds_test --stripe_sz 1048576 --stripe_cnt 2 --stripe_pattern 0

# Configure OSTs
./lmc -m failover_2node.xml --add ost --node node-ost1 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb1
./lmc -m failover_2node.xml --add ost --node node-ost2 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb7

# Configure client (this is a 'generic' client used for all client mounts)
./lmc -m failover_2node.xml --add mtpt --node client --path /mnt/lustre --mds mds_test --lov lov_test

Following were my lconf commands:

1. lconf --reformat --node node-ost1 failover_2node.xml             ... on sm02
2. lconf --reformat --node node-ost2 --service=ost1 failover_2node.xml   ... on sm06
3. lconf --reformat --node node-mds failover_2node.xml              ... on sm01
4. lconf --node client failover_2node.xml                           ... on sm02 and sm06

My intention is to keep a failover OST node in case one fails; the MDS is on a separate node. I tried different scenarios where one OST goes down and data can still be retrieved from the other. New files can be created and old ones deleted on the failover OST. Data was available most of the time.

So my question is whether Linux-HA is required to configure such a failover scenario.

Case 2:
I tried the same sort of formula as shown above for a failover MDS. But when the main MDS fails, it doesn't switch to the new MDS. Also, when the main MDS comes up again, the file system doesn't recover. I brought the client down and up again; then it was working.

So is Linux-HA or a similar program necessary for configuring failover?

Dhruv
Hopefully this isn't a stupid question ... but have you considered using Lustre 1.6 instead, if it's an option? It's much, much easier to work with.

As for failover itself ... Lustre provides redundancy in the design, i.e. you can have a secondary MDS that comes online when the primary has failed, and OSSes can assume control of one another's OSTs in case of host failure, but it does not implement any of this functionality itself. You'll have to use something like Linux-HA to get your server nodes to gracefully resume service for failed nodes.

Again, I'd strongly recommend using 1.6, as the latest version of Linux-HA has native support for Lustre, which can use standard Linux 'mount' commands. If you go with 1.4, you'll have to noodle with XML and scripts (which will still work), which is more of a hassle, while 1.6 "just works".

cheers,
Klaus

On 6/25/08 11:09 PM, "Dhruv" <DhruvDesaai at gmail.com> did etch on stone tablets:

> [...]
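For a rough idea of what the 1.6 workflow looks like, here is a minimal sketch (the fsname, the standby NID sm05, and the mount points are illustrative assumptions, not taken from the config above):

-- cut --
# format the combined MGS/MDT, naming the standby MDS as a failover NID
mkfs.lustre --fsname=testfs --mgs --mdt --failnode=sm05@tcp /dev/sdb5

# on whichever MDS node currently owns the shared device,
# starting the service is just a mount
mount -t lustre /dev/sdb5 /mnt/mdt

# clients list both MGS NIDs, colon-separated, and fail over between them
mount -t lustre sm01@tcp:sm05@tcp:/testfs /mnt/lustre
-- cut --

Either way, the failover pair still needs shared access to the backing device; Linux-HA only decides which node mounts it at any given time.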
Actually, with my kernel of 2.6.9-22, Lustre 1.4.5.1 fits, and I am not in a position to change the OS itself.

I tried failover of the OSTs without Linux-HA. It worked fairly well; I am now testing it rigorously to see whether I am correct. But failover of the MDS without HA didn't work at all.

Can it be done without HA?

On Jun 27, 12:10 am, Klaus Steden <klaus.ste... at thomson.net> wrote:

> [...]
On 6/26/08 9:16 PM, "Dhruv" <DhruvDesaai at gmail.com> did etch on stone tablets:

> I tried failover of the OSTs without Linux-HA. It worked fairly well;
> I am now testing it rigorously to see whether I am correct. But
> failover of the MDS without HA didn't work at all.
>
> Can it be done without HA?

No. As Brian pointed out, Lustre supports failover at the server level, but detection, fencing, etc. have to be handled by another process external to Lustre. Most people use Linux-HA, including myself, and I find it to be robust and fairly straightforward to implement. However, because you're using 1.4, you might have to resort to some "script-fu" to get the remounting operation to work properly.

Here is a paste of my /etc/ha.d/haresources file, which for Lustre 1.6 can be used with the Linux 'mount' command, meaning I can treat my Lustre MDT as a regular disk, which HA supports very well. If you use lconf, you'll have to make some sort of script-based call-out to have the secondary MDS start when it detects failure on the primary.

-- cut --
[root at mds-0-0 ~]# cat /etc/ha.d/haresources
mds-0-0.local 172.16.2.252 Filesystem::-Llustre-MDT0000::/mnt/lustremdt::lustre
-- cut --

(that's all one line)

cheers,
Klaus
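For the 1.4 case, such a call-out could be a small start/stop wrapper around lconf that heartbeat runs as a resource script. A minimal, untested sketch, assuming the config from earlier in the thread (the standby node name node-mds2 and the script and XML paths are made up):

-- cut --
#!/bin/sh
# /etc/ha.d/resource.d/lustre-mds -- hypothetical lconf wrapper for
# heartbeat; the service and XML names follow the earlier example,
# the standby node name is an assumption
XML=/etc/lustre/failover_2node.xml

case "$1" in
  start)
    # bring the MDS service up on this (standby) node
    lconf --node node-mds2 --service=mds_test $XML
    ;;
  stop)
    # shut the service down cleanly so the primary can take it back
    lconf --cleanup --node node-mds2 --service=mds_test $XML
    ;;
  *)
    echo "usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac
-- cut --

That script would then take the place of the Filesystem resource on the haresources line.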
I think Linux-HA or something equivalent is mandatory; the Lustre 1.6 manual says as much. And since 1.4.5.1 is old, its manual is no longer available.

Klaus Steden wrote:

> [...]
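For completeness, the haresources line Klaus posted pairs with a heartbeat config along these lines -- again only a sketch, with the second node name, interface, and timeouts as assumptions:

-- cut --
# /etc/ha.d/ha.cf (minimal heartbeat v1 sketch)
# heartbeat interval, and how long before a silent peer is declared dead
keepalive 2
deadtime 30
# link used for heartbeat traffic
bcast eth1
# don't bounce the service back automatically when the primary returns
auto_failback off
node mds-0-0.local
node mds-0-1.local
-- cut --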