xgl at xgl.pereslavl.ru
2010-Apr-20 12:42 UTC
[Lustre-discuss] Active-Active failover configuration
Greetings!

Sorry to trouble you with such a question, but I cannot find an example in the documentation and have run into some problems.

I want to use an active/active failover configuration. (Where can I find some examples?)

I have 2 nodes, s1 and s2, used as OSSes. I also have 3 block devices, all available from both OSSes:
(MDS)  1Tb, used as the MDS/MDT
(OST0) 8Tb, used as OST0
(OST1) 8Tb, used as OST1

In the normal state, OST0 is mounted on s1; the MDS and OST1 are mounted on s2.

How can I configure the system so that if one of the OSSes fails (s2, for example), the second OSS (s1) takes control of all its resources?

I have heartbeat installed and configured:

[root@s2 ~]# cat /etc/ha.d/haresources
s1 Filesystem::/dev/disk/b801::/mnt/ost0::lustre
s2 Filesystem::/dev/disk/b800::/mnt/mdt::lustre Filesystem::/dev/disk/8800::/mnt/ost1::lustre

I configured the system as follows. On s2 I format and mount the MDT and OST1:

mkfs.lustre --reformat --fsname=lustre --mgs --mdt /dev/disk/by-id/b800
mount -t lustre /dev/disk/b800 /mnt/mdt/
mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/8800
mount -t lustre /dev/disk/8800 /mnt/ost1

On s1 I format and mount OST0:

mkfs.lustre --reformat --ost --fsname=lustre --mgsnode=192.168.11.12@o2ib /dev/disk/b801
mount -t lustre /dev/disk/b801 /mnt/ost0

The heartbeat service is up and running on both nodes.

Where do I add parameters so that Lustre stays up and running if s2 goes down? Or where can I find some examples? How can s1 take over the MDS (/mnt/mdt) and OST1 (/mnt/ost1) that are usually mounted on s2?

Thanks,
Katya
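[For context: the parameter the commands above are missing is the failover NID of each target's partner node, which the Lustre 1.8 manual supplies at format time with --failnode, together with both candidate MGS NIDs via repeated --mgsnode options. A minimal sketch follows; s1's NID of 192.168.11.11@o2ib is a hypothetical assumption, since only s2's NID (192.168.11.12@o2ib) appears in this thread.]

# On s2: combined MGS/MDT, with s1 registered as its failover partner.
# 192.168.11.11@o2ib is an assumed NID for s1.
mkfs.lustre --reformat --fsname=lustre --mgs --mdt \
    --failnode=192.168.11.11@o2ib /dev/disk/by-id/b800

# On s2: OST1; list both possible MGS locations, and s1 as failover partner.
mkfs.lustre --reformat --ost --fsname=lustre \
    --mgsnode=192.168.11.12@o2ib --mgsnode=192.168.11.11@o2ib \
    --failnode=192.168.11.11@o2ib /dev/disk/8800

# On s1: OST0; same MGS list, with s2 as its failover partner.
mkfs.lustre --reformat --ost --fsname=lustre \
    --mgsnode=192.168.11.12@o2ib --mgsnode=192.168.11.11@o2ib \
    --failnode=192.168.11.12@o2ib /dev/disk/b801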
sheila.barthel at oracle.com
2010-Apr-20 15:29 UTC
[Lustre-discuss] Active-Active failover configuration
Section 8.3.2.2 (Configuring Heartbeat) includes a worked example to configure OST failover (active/active):

http://wiki.lustre.org/manual/LustreManual18_HTML/Failover.html#50598002_pgfId-1295199

On 4/20/2010 6:42 AM, xgl at xgl.pereslavl.ru wrote:
> [...]

--
Sheila Barthel | Documentation Lead
Oracle Lustre Group
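[For readers following along, the Heartbeat v1 side of that worked example looks roughly like this; node names and device paths are taken from the thread, while the interface name and all timing values are illustrative assumptions.]

# /etc/ha.d/ha.cf, identical on s1 and s2 (illustrative values)
keepalive 2          # heartbeat interval, seconds
deadtime 30          # declare a peer dead after this many seconds
bcast eth0           # heartbeat link; interface name is an assumption
node s1
node s2
auto_failback on     # return resources to the preferred node when it recovers

# /etc/ha.d/haresources, identical on both nodes: each line names the
# preferred owner; the survivor mounts the other's targets on failover.
s1 Filesystem::/dev/disk/b801::/mnt/ost0::lustre
s2 Filesystem::/dev/disk/b800::/mnt/mdt::lustre Filesystem::/dev/disk/8800::/mnt/ost1::lustre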
xgl at xgl.pereslavl.ru
2010-Apr-21 10:36 UTC
[Lustre-discuss] Active-Active failover configuration
Thank you for your answer! Unfortunately, I have read this manual and still have some problems.

I've configured heartbeat, defined the resources controlled by Heartbeat, and haven't found any errors in the HA logs. I've got 3 shared resources controlled by heartbeat: 2 OSTs and 1 MDS (described in the previous message).

When I use the "hb_takeover all" utility on OSS1 (s1 in the previous message) to take over the OSS2 resources, OST1 and the MDS (OST1 and the MDT are mounted on OSS2 (s2) in the standard configuration), it takes control and I see all the resources mounted on OSS1. But I cannot use the Lustre filesystem; I can't mount it on a client. In dmesg on the active OSS1 I can see that all the OSTs are trying to connect to the MDS using its old (standard) address on OSS2, but the MDS has moved to OSS1.

What am I missing? Maybe I have to specify some options when formatting the MDS/OSTs to let them work correctly when resources are switched to another OSS node? How do I do that?

__________
Thanks,
Katya

> Section 8.3.2.2 (Configuring Heartbeat) includes a worked example to configure OST failover (active/active):
> http://wiki.lustre.org/manual/LustreManual18_HTML/Failover.html#50598002_pgfId-1295199
> [...]
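[The symptom described, everything mounted on one node but OSTs and clients still trying the MDS at its old address, is what happens when targets were formatted without failover NIDs. The usual remedy, sketched here with the same hypothetical NID for s1 (192.168.11.11@o2ib) as above, is to add the partner NID with tunefs.lustre on the unmounted targets and to give clients both candidate MGS NIDs at mount time.]

# With the targets unmounted: register the failover partner's NID.
# 192.168.11.11@o2ib (s1) is an assumed NID; only s2's is in the thread.
tunefs.lustre --failnode=192.168.11.11@o2ib /dev/disk/b800   # MGS/MDT on s2
tunefs.lustre --failnode=192.168.11.11@o2ib /dev/disk/8800   # OST1 on s2
tunefs.lustre --failnode=192.168.11.12@o2ib /dev/disk/b801   # OST0 on s1

# On a client: list both nodes that may be serving the MGS,
# separated by a colon, so the mount survives an MGS failover.
mount -t lustre 192.168.11.12@o2ib:192.168.11.11@o2ib:/lustre /mnt/lustre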
Katya Tutlyaeva
2010-Apr-22 03:56 UTC
[Lustre-discuss] Active-Active failover configuration
Thank you, I have found the answers.

___________
Thanks,
Katya
Hi all,

I'm trying to test Lustre using loadgen, but got a segmentation fault error. I have successfully loaded obdecho.ko on both OSSes beforehand.

[lustre]# loadgen
loadgen> dev lustre-OST0000-osc 192.168.11.12@o2ib
Added uuid OSS_UUID: 192.168.11.12@o2ib
Target OST name is 'lustre-OST0000-osc'
loadgen> st 3
start 0 to 3
loadgen: running thread #1
Segmentation fault

I meet the same error on both OSSes and on the client, using any number of clients. What's wrong?

_____________
Thanks,
Katya
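[An aside, not from the thread: since loadgen depends on the echo client provided by obdecho, one quick sanity check before running it is to confirm the module really is loaded on the node in question.]

# Verify the obdecho module is present on this node
lsmod | grep obdecho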
On 2010-04-22, at 05:23, Katya Tutlyaeva wrote:
> I'm trying to test Lustre using loadgen, but got a segmentation fault error.
> I have successfully loaded obdecho.ko on both OSSes beforehand.
>
> [lustre]# loadgen
> loadgen> dev lustre-OST0000-osc 192.168.11.12@o2ib
> Added uuid OSS_UUID: 192.168.11.12@o2ib
> Target OST name is 'lustre-OST0000-osc'
> loadgen> st 3
> start 0 to 3
> loadgen: running thread #1
> Segmentation fault
>
> I meet the same error on both OSSes and on the client, using any number of clients.

I believe there is a fix for loadgen in bugzilla.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.