xgl at xgl.pereslavl.ru
2010-Apr-20 12:42 UTC
[Lustre-discuss] Active-Active failover configuration
Greetings!

Sorry to trouble you with such a question, but I cannot find an example in the documentation and have run into some problems.

I want to use an active/active failover configuration. (Where can I find some examples?)

I have 2 nodes, s1 and s2, used as OSSes. I also have 3 block devices, all available from both OSSes:
(MDS)  1Tb, used as the MDS/MDT
(OST0) 8Tb, used as OST0
(OST1) 8Tb, used as OST1

In the normal state, OST0 is mounted on s1; the MDS and OST1 are mounted on s2.

How can I configure the system so that if one of the OSSes fails (s2, for example), the second OSS (s1) takes control of all its resources?

I have heartbeat installed and configured:

[root@s2 ~]# cat /etc/ha.d/haresources
s1 Filesystem::/dev/disk/b801::/mnt/ost0::lustre
s2 Filesystem::/dev/disk/b800::/mnt/mdt::lustre Filesystem::/dev/disk/8800::/mnt/ost1::lustre

I configured the system as follows. On s2 I format and mount the MDT and OST1:

mkfs.lustre --reformat --fsname=lustre --mgs --mdt /dev/disk/by-id/b800
mount -t lustre /dev/disk/b800 /mnt/mdt/
mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/8800
mount -t lustre /dev/disk/8800 /mnt/ost1

On s1 I format and mount OST0:

mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/b801
mount -t lustre /dev/disk/b801 /mnt/ost0

The heartbeat service is up and running on both nodes.

Where do I add parameters so that Lustre stays up and running if s2 goes down? Or where can I find some examples? How can s1 take over the MDS (/mnt/mdt) and OST1 (/mnt/ost1) that are usually mounted on s2?

Thanks,
Katya
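[For context: the parameter the commands above are missing is the failover NID of each target's partner node, which the Lustre 1.8 manual supplies at format time with --failnode, together with both candidate MGS NIDs via repeated --mgsnode options. A minimal sketch follows; s1's NID of 192.168.11.11@o2ib is a hypothetical assumption, since only s2's NID (192.168.11.12@o2ib) appears in this thread.]

# On s2: combined MGS/MDT, with s1 registered as its failover partner.
# 192.168.11.11@o2ib is an assumed NID for s1.
mkfs.lustre --reformat --fsname=lustre --mgs --mdt \
    --failnode=192.168.11.11@o2ib /dev/disk/by-id/b800

# On s2: OST1; list both possible MGS locations, and s1 as failover partner.
mkfs.lustre --reformat --ost --fsname=lustre \
    --mgsnode=192.168.11.12@o2ib --mgsnode=192.168.11.11@o2ib \
    --failnode=192.168.11.11@o2ib /dev/disk/8800

# On s1: OST0; same MGS list, with s2 as its failover partner.
mkfs.lustre --reformat --ost --fsname=lustre \
    --mgsnode=192.168.11.12@o2ib --mgsnode=192.168.11.11@o2ib \
    --failnode=192.168.11.12@o2ib /dev/disk/b801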
sheila.barthel at oracle.com
2010-Apr-20 15:29 UTC
[Lustre-discuss] Active-Active failover configuration
Section 8.3.2.2 (Configuring Heartbeat) includes a worked example to configure OST failover (active/active):

http://wiki.lustre.org/manual/LustreManual18_HTML/Failover.html#50598002_pgfId-1295199

On 4/20/2010 6:42 AM, xgl at xgl.pereslavl.ru wrote:
> [...]

--
Sheila Barthel | Documentation Lead
Oracle Lustre Group
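[For readers following along, the Heartbeat v1 side of that worked example looks roughly like this; node names and device paths are taken from the thread, while the interface name and all timing values are illustrative assumptions.]

# /etc/ha.d/ha.cf, identical on s1 and s2 (illustrative values)
keepalive 2          # heartbeat interval, seconds
deadtime 30          # declare a peer dead after this many seconds
bcast eth0           # heartbeat link; interface name is an assumption
node s1
node s2
auto_failback on     # return resources to the preferred node when it recovers

# /etc/ha.d/haresources, identical on both nodes: each line names the
# preferred owner; the survivor mounts the other's targets on failover.
s1 Filesystem::/dev/disk/b801::/mnt/ost0::lustre
s2 Filesystem::/dev/disk/b800::/mnt/mdt::lustre Filesystem::/dev/disk/8800::/mnt/ost1::lustre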
xgl at xgl.pereslavl.ru
2010-Apr-21 10:36 UTC
[Lustre-discuss] Active-Active failover configuration
Thank you for your answer! Unfortunately, I have read this manual and still have some problems.

I've configured heartbeat, defined the resources controlled by Heartbeat, and haven't found any errors in the HA logs. I've got 3 shared resources controlled by heartbeat: 2 OSTs and 1 MDS (described in the previous message).

When I use the "hb_takeover all" utility on OSS1 (s1 in the previous message) to take over the OSS2 resources, OST1 and the MDS (OST1 and the MDT are mounted on OSS2 (s2) in the standard configuration), it takes control and I see all the resources mounted on OSS1. But I cannot use the Lustre filesystem; I can't mount it on a client. In dmesg on the active OSS1 I can see that all the OSTs are trying to connect to the MDS using its old (standard) address on OSS2, but the MDS has moved to OSS1.

What am I missing? Maybe I have to specify some options when formatting the MDS/OSTs to let them work correctly when resources are switched to another OSS node? How do I do that?

__________
Thanks,
Katya

> Section 8.3.2.2 (Configuring Heartbeat) includes a worked example to configure OST failover (active/active):
> http://wiki.lustre.org/manual/LustreManual18_HTML/Failover.html#50598002_pgfId-1295199
> [...]
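[The symptom described, everything mounted on one node but OSTs and clients still trying the MDS at its old address, is what happens when targets were formatted without failover NIDs. The usual remedy, sketched here with the same hypothetical NID for s1 (192.168.11.11@o2ib) as above, is to add the partner NID with tunefs.lustre on the unmounted targets and to give clients both candidate MGS NIDs at mount time.]

# With the targets unmounted: register the failover partner's NID.
# 192.168.11.11@o2ib (s1) is an assumed NID; only s2's is in the thread.
tunefs.lustre --failnode=192.168.11.11@o2ib /dev/disk/b800   # MGS/MDT on s2
tunefs.lustre --failnode=192.168.11.11@o2ib /dev/disk/8800   # OST1 on s2
tunefs.lustre --failnode=192.168.11.12@o2ib /dev/disk/b801   # OST0 on s1

# On a client: list both nodes that may be serving the MGS,
# separated by a colon, so the mount survives an MGS failover.
mount -t lustre 192.168.11.12@o2ib:192.168.11.11@o2ib:/lustre /mnt/lustre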
Katya Tutlyaeva
2010-Apr-22 03:56 UTC
[Lustre-discuss] Active-Active failover configuration
Thank you, I have found the answers.

___________
Thanks,
Katya
Hi all,

I'm trying to test Lustre using loadgen, but got a segmentation fault error. I have successfully loaded obdecho.ko on both OSSes beforehand.

[lustre]# loadgen
loadgen> dev lustre-OST0000-osc 192.168.11.12@o2ib
Added uuid OSS_UUID: 192.168.11.12@o2ib
Target OST name is 'lustre-OST0000-osc'
loadgen> st 3
start 0 to 3
loadgen: running thread #1
Segmentation fault

I meet the same error on both OSSes and on the client, using any number of clients. What's wrong?

_____________
Thanks,
Katya
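[An aside, not from the thread: since loadgen depends on the echo client provided by obdecho, one quick sanity check before running it is to confirm the module really is loaded on the node in question.]

# Verify the obdecho module is present on this node
lsmod | grep obdecho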
On 2010-04-22, at 05:23, Katya Tutlyaeva wrote:
> I'm trying to test Lustre using loadgen, but got a segmentation fault error.
> I have successfully loaded obdecho.ko on both OSSes beforehand.
>
> [lustre]# loadgen
> loadgen> dev lustre-OST0000-osc 192.168.11.12@o2ib
> Added uuid OSS_UUID: 192.168.11.12@o2ib
> Target OST name is 'lustre-OST0000-osc'
> loadgen> st 3
> start 0 to 3
> loadgen: running thread #1
> Segmentation fault
>
> I meet the same error on both OSSes and on the client, using any number of clients.

I believe there is a fix for loadgen in bugzilla.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.