The Lustre topology is simple; each OST is the same size, 1 GB.

                       /- ost1(sdb1)
client - mds(mgs,mdt) --- ost2(sdb1)
                       \- ost3(sdb1)

Output of "lctl dl" on the mds:
[root@mds ~]# lctl dl
  1 UP mgc MGC192.168.1.200@tcp c603fed3-3f71-89a3-ab80-84563a5190f5 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov pogo-mdtlov pogo-mdtlov_UUID 4
  4 UP mds pogo-MDT0000 pogo-MDT0000_UUID 5
  5 UP osc pogo-OST0000-osc pogo-mdtlov_UUID 5
  6 UP osc pogo-OST0001-osc pogo-mdtlov_UUID 5
  7 UP osc pogo-OST0002-osc pogo-mdtlov_UUID 5

Output of "df -h /test" on the client:
[root@client ~]# df -h /test
Filesystem               Size  Used Avail Use% Mounted on
192.168.1.200@tcp:/pogo  2.8G  153M  2.5G   6% /test

Now I add a failover pair of sdb2(ost1) and sdb1(ost2):
ost1: mkfs.lustre --fsname=pogo --ost --failnode=ost2 --mgsnode=mds@tcp0 /dev/sdb2

After I add the failover pair, output of "lctl dl" on the mds:
[root@mds ~]# lctl dl
  1 UP mgc MGC192.168.1.200@tcp c603fed3-3f71-89a3-ab80-84563a5190f5 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov pogo-mdtlov pogo-mdtlov_UUID 4
  4 UP mds pogo-MDT0000 pogo-MDT0000_UUID 5
  5 UP osc pogo-OST0000-osc pogo-mdtlov_UUID 5
  6 UP osc pogo-OST0001-osc pogo-mdtlov_UUID 5
  7 UP osc pogo-OST0002-osc pogo-mdtlov_UUID 5
  8 UP osc pogo-OST0003-osc pogo-mdtlov_UUID 5

Output of "df -h /test" on the client:
[root@client ~]# df -h /test
Filesystem               Size  Used Avail Use% Mounted on
192.168.1.200@tcp:/pogo  3.7G  170M  3.3G   5% /test

So the question: does the LOV count the size of one OST of a failover pair, or of both?

Then I umount sdb1 on ost2, and the /test directory on the client just hangs. The following messages appear on ost2:

Lustre: 3417:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-192.168.1.210@tcp portal 7 match 667 offset 0 length 128: 2
Lustre: 3416:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-192.168.1.200@tcp portal 28 match 670 offset 0 length 128: 2

I just followed the mountconf doc. Please let me know which step is wrong; thanks in advance.
Nathaniel Rutman
2007-Feb-08 15:50 UTC
[Lustre-discuss] about ost failover in lustre 1.6-beta7
aries wrote:
> The Lustre topology is simple; each OST is the same size, 1 GB.
> [...]
> Now I add a failover pair of sdb2(ost1) and sdb1(ost2):
> ost1: mkfs.lustre --fsname=pogo --ost --failnode=ost2 --mgsnode=mds@tcp0 /dev/sdb2
> [...]
> So the question: does the LOV count the size of one OST of a failover pair, or of both?

A failover pair is a pair of separate nodes that both have access to a
shared disk. In your case, sdb2 on ost1 and sdb1 on ost2 need to map to
the same physical device. So a failover pair counts as one OST in all
cases; it just happens to have a failover address through which the data
can be reached.

> Then I umount sdb1 on ost2, and the /test directory on the client just hangs.

You must then mount it on ost1. Note that you should not have it mounted
on both ost1 and ost2 at the same time.

> The following messages appear on ost2:
> Lustre: 3417:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
> 12345-192.168.1.210@tcp portal 7 match 667 offset 0 length 128: 2
> Lustre: 3416:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
> 12345-192.168.1.200@tcp portal 28 match 670 offset 0 length 128: 2

This just says that 1.210 and 1.200 are still looking for something on
ost2 that is no longer handled by anything (i.e. OST0003, which you
unmounted).
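To make the shared-disk requirement concrete, here is a minimal sketch of
the active/passive workflow described above, assuming Lustre 1.6 mountconf
commands and a device that really is shared between the two nodes. The
device name /dev/sdc, the mount point /mnt/pogo-ost, and the NID
192.168.1.211@tcp0 are illustrative placeholders, not values from the
thread.

# Format the shared device ONCE, recording the backup node's NID with
# --failnode. /dev/sdc is assumed to be the same physical disk as seen
# from both ost1 and ost2.
[root@ost1 ~]# mkfs.lustre --fsname=pogo --ost \
                 --failnode=192.168.1.211@tcp0 \
                 --mgsnode=mds@tcp0 /dev/sdc

# Normal operation: the target is mounted on the primary node only.
[root@ost1 ~]# mount -t lustre /dev/sdc /mnt/pogo-ost

# On failover: stop the target on the primary, then mount the SAME shared
# device on the backup. It must never be mounted on both nodes at once.
[root@ost1 ~]# umount /mnt/pogo-ost
[root@ost2 ~]# mount -t lustre /dev/sdc /mnt/pogo-ost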
Nathaniel Rutman wrote:
> aries wrote:
>> [...]
>> So the question: does the LOV count the size of one OST of a failover pair, or of both?
>
> A failover pair is a pair of separate nodes that both have access to a
> shared disk. In your case, sdb2 on ost1 and sdb1 on ost2 need to map to
> the same physical device. So a failover pair counts as one OST in all
> cases; it just happens to have a failover address through which the data
> can be reached.

So in my opinion there is only OSS failover (like multipathing on a SAN)
in Lustre, not OST failover (like RAID 1). Am I right?

>> Then I umount sdb1 on ost2, and the /test directory on the client just hangs.
>
> You must then mount it on ost1. Note that you should not have it mounted
> on both ost1 and ost2 at the same time.

Does this mean I can't mount the failover pair on both nodes at the same
time?
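For completeness, a quick way to check what failover information was
recorded at format time is tunefs.lustre's read-only print mode. This is
a sketch: /dev/sdc is again a placeholder, the output is abbreviated, and
the exact format may differ between Lustre versions.

# Print the configuration stored on the target; --print makes no changes.
[root@ost1 ~]# tunefs.lustre --print /dev/sdc
   Target:     pogo-OST0003
   ...
   Parameters: mgsnode=192.168.1.200@tcp failover.node=192.168.1.211@tcp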