Hi,

I have one server for the MGS/MDS function and 4 servers for OSS. All
machines are identical. The MDS is connected to back end storage that is
serving two data LUNs. The OSSs are connected to back end storage that is
serving 24 data LUNs. Each server has two network interfaces, configured
as follows:

OSS1(hostname=storage07) 10.143.245.7@tcp0
OSS1(hostname=storage07) 10.142.10.7@tcp1
OSS2(hostname=storage08) 10.143.245.8@tcp0
OSS2(hostname=storage08) 10.142.10.8@tcp1
OSS3(hostname=storage09) 10.143.245.9@tcp0
OSS3(hostname=storage09) 10.142.10.9@tcp1
OSS4(hostname=storage10) 10.143.245.10@tcp0
OSS4(hostname=storage10) 10.142.10.10@tcp1
MDS1(hostname=mds01) 10.143.245.201@tcp0
MDS1(hostname=mds01) 10.142.10.201@tcp1
MDS2(hostname=mds02) 10.143.245.202@tcp0
MDS2(hostname=mds02) 10.142.10.202@tcp1

tcp0 is 10GbE
tcp1 is 1GbE

I would like to configure Lustre in such a way that if the tcp0 interface
fails on an OSS or MDS, Lustre will be able to use the secondary network to
keep communication alive, so that at least some of the clients can keep
working. The primary network should be 10GbE and the secondary network 1GbE.

I have prepared the following mkfs.lustre lines:

storage07
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-0
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-1
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-2
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-3
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-4
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-5

storage08
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-6
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-7
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-8
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-9
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-10
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-11

storage09
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-12
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-13
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-14
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-15
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-16
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-17

storage10
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-18
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-19
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-20
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-21
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-22
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-23

MDS
mkfs.lustre --reformat --fsname=ddn-home --mgs --mdt \
    --failnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-0
mkfs.lustre --reformat --fsname=ddn-data --mdt \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --failnode=10.143.245.201@tcp0:10.142.10.201@tcp1 /dev/dm-1

Will this work as I suppose?

Thanks
Wojciech Turek
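P.S. For completeness, this is roughly how I expect the clients to mount the
two filesystems against the MGS pair (only a sketch, not tested yet; the
mount points under /mnt are just examples):

    mount -t lustre 10.143.245.201@tcp0:10.143.245.202@tcp0:/ddn-data /mnt/ddn-data
    mount -t lustre 10.143.245.201@tcp0:10.143.245.202@tcp0:/ddn-home /mnt/ddn-home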
On Oct 01, 2007 12:30 +0100, Wojciech Turek wrote:
> I have one server for the MGS/MDS function and 4 servers for OSS. All
> machines are identical. The MDS is connected to back end storage that is
> serving two data LUNs. The OSSs are connected to back end storage that is
> serving 24 data LUNs. Each server has two network interfaces, configured
> as follows:
> OSS1(hostname=storage07) 10.143.245.7@tcp0
> OSS1(hostname=storage07) 10.142.10.7@tcp1
>
> tcp0 is 10GbE
> tcp1 is 1GbE
>
> I would like to configure Lustre in such a way that if the tcp0 interface
> fails on an OSS or MDS, Lustre will be able to use the secondary network
> to keep communication alive, so that at least some of the clients can
> keep working. The primary network should be 10GbE and the secondary
> network 1GbE.

This will work as you want if tcp0 is listed first in modprobe.conf.
LNET will only use tcp0 unless that fails, at which point it will use
tcp1.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
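For illustration, the modprobe.conf entry this implies on the servers and
clients might look like the following (the interface names eth2 for the
10GbE port and eth0 for the 1GbE port are assumptions; substitute the real
device names on each machine):

    # tcp0 (10GbE) listed first so LNET prefers it; tcp1 (1GbE) is the fallback
    options lnet networks="tcp0(eth2),tcp1(eth0)"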
We have a 4-way SMP server (dual Opteron 275s) configured as a combined
MGS/MDS and OSS server, as follows...

/dev/sda                205G  1.8G  192G   1%  /lustre/mri/mdt0
/dev/f2c0l0/lv-f2c0l0   3.4T  2.3T  1.2T  67%  /lustre/mri/ost0
/dev/f2c0l1/lv-f2c0l1   3.4T  3.0T  460G  87%  /lustre/mri/ost1
/dev/f2c1l0/lv-f2c1l0   3.4T  3.0T  399G  89%  /lustre/mri/ost2
/dev/f2c1l1/lv-f2c1l1   3.4T  3.0T  418G  88%  /lustre/mri/ost3
/dev/f3c0l0/lv-f3c0l0   3.4T  3.0T  430G  88%  /lustre/mri/ost4
/dev/f3c0l1/lv-f3c0l1   3.4T  3.0T  431G  88%  /lustre/mri/ost5
/dev/f3c1l0/lv-f3c1l0   3.4T  3.0T  378G  90%  /lustre/mri/ost6
/dev/f3c1l1/lv-f3c1l1   3.4T  3.0T  417G  88%  /lustre/mri/ost7

Under heavy load our server has gone down several times (we think due
to bug 13438). Although we have successfully run e2fsck locally on the
MDS and each OSS AND run lfsck according to the documentation, we still
seem to be missing about 9TB of our storage. That is to say that
"du -s -h *" finds about 14TB but "df -h" says that the file system is
practically full.

[root@submit mri]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/mri/scratch           27T   23T  4.0T  86% /scratch/mri

Fortunately, we are in a position to wipe it out and reinitialize the
FS, but still, this is a bit disconcerting. Also, we've incorporated the
patch suggested in bug report 13438 into our source and rebuilt, but we
don't yet know if this will resolve the crashes.

Is anyone else having stability and corruption issues with 1.6.2 on
CentOS 4.5 (2.6.9-55.ELsmp) with the tcp and o2ib (OFED 1.2) lnet
modules? I suppose the next thing to try (if the patch does not work)
would be to upgrade to the CentOS 4.5 update corresponding to the RPMs
(2.6.9-55.0.2), but since we had no problems building from source
against our patched kernel, I'm skeptical about that making much
difference.

Thanks,

Charlie Taylor
UF HPC Center

On Oct 3, 2007, at 6:17 PM, Andreas Dilger wrote:

> On Oct 01, 2007 12:30 +0100, Wojciech Turek wrote:
>> I have one server for the MGS/MDS function and 4 servers for OSS. All
>> machines are identical. The MDS is connected to back end storage that
>> is serving two data LUNs. The OSSs are connected to back end storage
>> that is serving 24 data LUNs. Each server has two network interfaces,
>> configured as follows:
>> OSS1(hostname=storage07) 10.143.245.7@tcp0
>> OSS1(hostname=storage07) 10.142.10.7@tcp1
>>
>> tcp0 is 10GbE
>> tcp1 is 1GbE
>>
>> I would like to configure Lustre in such a way that if the tcp0
>> interface fails on an OSS or MDS, Lustre will be able to use the
>> secondary network to keep communication alive, so that at least some
>> of the clients can keep working. The primary network should be 10GbE
>> and the secondary network 1GbE.
>
> This will work as you want if tcp0 is listed first in modprobe.conf.
> LNET will only use tcp0 unless that fails, at which point it will use
> tcp1.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
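In case it helps with the diagnosis, this is the sort of comparison we have
been running from a client (a sketch; it assumes the lfs userspace tools are
installed there):

    # per-OST usage as the servers report it
    lfs df -h /scratch/mri
    # space the namespace actually accounts for
    du -s -h /scratch/mri/*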
On Oct 03, 2007 21:54 -0400, Charles Taylor wrote:
> We have a 4-way SMP server (dual Opteron 275s) configured as a combined
> MGS/MDS and OSS server, as follows...
>
> /dev/sda                205G  1.8G  192G   1%  /lustre/mri/mdt0
> /dev/f2c0l0/lv-f2c0l0   3.4T  2.3T  1.2T  67%  /lustre/mri/ost0
> /dev/f2c0l1/lv-f2c0l1   3.4T  3.0T  460G  87%  /lustre/mri/ost1
> /dev/f2c1l0/lv-f2c1l0   3.4T  3.0T  399G  89%  /lustre/mri/ost2
> /dev/f2c1l1/lv-f2c1l1   3.4T  3.0T  418G  88%  /lustre/mri/ost3
> /dev/f3c0l0/lv-f3c0l0   3.4T  3.0T  430G  88%  /lustre/mri/ost4
> /dev/f3c0l1/lv-f3c0l1   3.4T  3.0T  431G  88%  /lustre/mri/ost5
> /dev/f3c1l0/lv-f3c1l0   3.4T  3.0T  378G  90%  /lustre/mri/ost6
> /dev/f3c1l1/lv-f3c1l1   3.4T  3.0T  417G  88%  /lustre/mri/ost7
>
> Under heavy load our server has gone down several times (we think due
> to bug 13438). Although we have successfully run e2fsck locally on the
> MDS and each OSS AND run lfsck according to the documentation, we still
> seem to be missing about 9TB of our storage. That is to say that
> "du -s -h *" finds about 14TB but "df -h" says that the file system is
> practically full.

Presumably this is still true after stopping the clients and servers,
and restarting? In some cases file space can still be in use if, e.g.,
you have open-unlinked files being held by some clients. Also, files can
be held by the MDS from crashed clients in case they return after
recovery, and that may not be reclaimed in some cases until after an MDS
or OST restart.

The other issues to be aware of are described in the KB articles:
https://bugzilla.lustre.org/show_bug.cgi?id=2381
https://bugzilla.lustre.org/show_bug.cgi?id=2378

However, that still doesn't explain where 7.5TB of space went.
Presumably lfsck didn't report any space leakage? In case you weren't
aware, lfsck doesn't take any action by default, and you need to ask it
to delete or link orphan objects on the OSTs.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
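For reference, that orphan-handling pass looks roughly like the following
(a sketch based on the 1.6-era lfsck documentation rather than a tested
command line; the database paths and the client mount point are
placeholders, and the databases must already have been produced by the
e2fsck --mdsdb / --ostdb runs -- check lfsck --help on your version):

    # dry run: only report orphaned OST objects, change nothing
    lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost0.db /tmp/ost1.db /scratch/mri
    # rerun with -l to link orphans into lost+found (or -d to delete them);
    # this is the step that actually returns leaked space
    lfsck -l -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost0.db /tmp/ost1.db /scratch/mri

(one --ostdb database argument per OST; with eight OSTs there would be
eight of them)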