Hello everybody,

I am a novice Lustre user and wanted some help. I am using Lustre 1.4.5.1 on RHEL4 update 2 with kernel 2.6.9-22, and I am facing some problems.

Case 1:
I tried OST failover. Following is the script used to generate the XML config file:

rm -f failover_2node.xml
./lmc -m failover_2node.xml --add net --node node-mds --nid sm01 --nettype tcp
./lmc -m failover_2node.xml --add net --node node-ost1 --nid sm02 --nettype tcp
./lmc -m failover_2node.xml --add net --node node-ost2 --nid sm06 --nettype tcp
./lmc -m failover_2node.xml --add net --node client --nid '*' --nettype tcp

# Configure MDS
./lmc -m failover_2node.xml --add mds --node node-mds --mds mds_test --fstype ldiskfs --dev /dev/sdb5

# Configure LOV
./lmc -m failover_2node.xml --add lov --lov lov_test --mds mds_test --stripe_sz 1048576 --stripe_cnt 2 --stripe_pattern 0

# Configure OSTs
./lmc -m failover_2node.xml --add ost --node node-ost1 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb1
./lmc -m failover_2node.xml --add ost --node node-ost2 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb7

# Configure client (this is a 'generic' client used for all client mounts)
./lmc -m failover_2node.xml --add mtpt --node client --path /mnt/lustre --mds mds_test --lov lov_test

Following were my lconf commands:

1. lconf --reformat --node node-ost1 failover_2node.xml             ... on sm02
2. lconf --reformat --node node-ost2 --service=ost1 failover_2node.xml   ... on sm06
3. lconf --reformat --node node-mds failover_2node.xml              ... on sm01
4. lconf --node client failover_2node.xml                           ... on sm02 and sm06

My intention is to keep a failover OST node in case one fails; the MDS is on a separate node. I tried different scenarios where one OST goes down and data can still be retrieved from the other. New files can be created and old ones deleted on the failover OST. Data was available most of the time.

So my question is whether Linux-HA is required to configure such a failover scenario.

Case 2:
I tried the same sort of formula as shown above for a failover MDS. But when the main MDS fails, it doesn't switch to the new MDS. Also, when the main MDS comes up again, the file system doesn't recover. I brought the client down and up again; then it was working.

So is Linux-HA or a similar program necessary for configuring failover?

Dhruv
Hopefully this isn't a stupid question ... but have you considered using Lustre 1.6 instead, if it's an option? It's much, much easier to work with.

As for failover itself ... Lustre provides redundancy in the design, i.e. you can have a secondary MDS that comes online when the primary has failed, and OSSes can assume control of one another's OSTs in case of host failure, but it does not implement any of this functionality itself. You'll have to use something like Linux-HA to get your server nodes to gracefully resume service for failed nodes.

Again, I'd strongly recommend using 1.6, as the latest version of Linux-HA has native support for Lustre, which can use standard Linux 'mount' commands. If you go with 1.4, you'll have to noodle with XML and scripts (which will still work), which is more of a hassle, while 1.6 "just works".

cheers,
Klaus

On 6/25/08 11:09 PM, "Dhruv" <DhruvDesaai at gmail.com> did etch on stone tablets:

> [...]
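For a rough idea of what the 1.6 workflow looks like, here is a minimal sketch (the fsname, the standby NID sm05, and the mount points are illustrative assumptions, not taken from the config above):

-- cut --
# format the combined MGS/MDT, naming the standby MDS as a failover NID
mkfs.lustre --fsname=testfs --mgs --mdt --failnode=sm05@tcp /dev/sdb5

# on whichever MDS node currently owns the shared device,
# starting the service is just a mount
mount -t lustre /dev/sdb5 /mnt/mdt

# clients list both MGS NIDs, colon-separated, and fail over between them
mount -t lustre sm01@tcp:sm05@tcp:/testfs /mnt/lustre
-- cut --

Either way, the failover pair still needs shared access to the backing device; Linux-HA only decides which node mounts it at any given time.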
Actually, with my kernel of 2.6.9-22, Lustre 1.4.5.1 fits, and I am not in a position to change the OS itself.

I tried failover of the OSTs without Linux-HA. It worked fairly well; I am now testing it rigorously to see whether I am correct. But failover of the MDS without HA didn't work at all.

Can it be done without HA?

On Jun 27, 12:10 am, Klaus Steden <klaus.ste... at thomson.net> wrote:

> [...]
On 6/26/08 9:16 PM, "Dhruv" <DhruvDesaai at gmail.com> did etch on stone tablets:

> I tried failover of the OSTs without Linux-HA. It worked fairly well;
> I am now testing it rigorously to see whether I am correct. But
> failover of the MDS without HA didn't work at all.
>
> Can it be done without HA?

No. As Brian pointed out, Lustre supports failover at the server level, but detection, fencing, etc. have to be handled by another process external to Lustre. Most people use Linux-HA, including myself, and I find it to be robust and fairly straightforward to implement. However, because you're using 1.4, you might have to resort to some "script-fu" to get the remounting operation to work properly.

Here is a paste of my /etc/ha.d/haresources file, which for Lustre 1.6 can be used with the Linux 'mount' command, meaning I can treat my Lustre MDT as a regular disk, which HA supports very well. If you use lconf, you'll have to make some sort of script-based call-out to have the secondary MDS start when it detects failure on the primary.

-- cut --
[root at mds-0-0 ~]# cat /etc/ha.d/haresources
mds-0-0.local 172.16.2.252 Filesystem::-Llustre-MDT0000::/mnt/lustremdt::lustre
-- cut --

(that's all one line)

cheers,
Klaus
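For the 1.4 case, such a call-out could be a small start/stop wrapper around lconf that heartbeat runs as a resource script. A minimal, untested sketch, assuming the config from earlier in the thread (the standby node name node-mds2 and the script and XML paths are made up):

-- cut --
#!/bin/sh
# /etc/ha.d/resource.d/lustre-mds -- hypothetical lconf wrapper for
# heartbeat; the service and XML names follow the earlier example,
# the standby node name is an assumption
XML=/etc/lustre/failover_2node.xml

case "$1" in
  start)
    # bring the MDS service up on this (standby) node
    lconf --node node-mds2 --service=mds_test $XML
    ;;
  stop)
    # shut the service down cleanly so the primary can take it back
    lconf --cleanup --node node-mds2 --service=mds_test $XML
    ;;
  *)
    echo "usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac
-- cut --

That script would then take the place of the Filesystem resource on the haresources line.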
I think Linux-HA or something equivalent is mandatory; the Lustre 1.6 manual says as much. And since 1.4.5.1 is old, its manual is no longer available.

Klaus Steden wrote:

> [...]
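For completeness, the haresources line Klaus posted pairs with a heartbeat config along these lines -- again only a sketch, with the second node name, interface, and timeouts as assumptions:

-- cut --
# /etc/ha.d/ha.cf (minimal heartbeat v1 sketch)
# heartbeat interval, and how long before a silent peer is declared dead
keepalive 2
deadtime 30
# link used for heartbeat traffic
bcast eth1
# don't bounce the service back automatically when the primary returns
auto_failback off
node mds-0-0.local
node mds-0-1.local
-- cut --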