Hi,

I have one server for the MGS/MDS function and 4 servers for OSS. All
machines are identical. The MDS is connected to back end storage that is
serving two data LUNs. The OSSs are connected to back end storage that is
serving 24 data LUNs. Each server has two network interfaces, configured
as follows:

OSS1(hostname=storage07) 10.143.245.7@tcp0
OSS1(hostname=storage07) 10.142.10.7@tcp1
OSS2(hostname=storage08) 10.143.245.8@tcp0
OSS2(hostname=storage08) 10.142.10.8@tcp1
OSS3(hostname=storage09) 10.143.245.9@tcp0
OSS3(hostname=storage09) 10.142.10.9@tcp1
OSS4(hostname=storage10) 10.143.245.10@tcp0
OSS4(hostname=storage10) 10.142.10.10@tcp1
MDS1(hostname=mds01) 10.143.245.201@tcp0
MDS1(hostname=mds01) 10.142.10.201@tcp1
MDS2(hostname=mds02) 10.143.245.202@tcp0
MDS2(hostname=mds02) 10.142.10.202@tcp1

tcp0 is 10GbE
tcp1 is 1GbE

I would like to configure Lustre in such a way that if the tcp0 interface
fails on an OSS or MDS, Lustre will be able to use the secondary network to
keep communication alive, so that at least some of the clients can keep
working. The primary network should be 10GbE and the secondary network 1GbE.

I have prepared the following mkfs.lustre lines:

storage07
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-0
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-1
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-2
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-3
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-4
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.8@tcp0:10.142.10.8@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-5

storage08
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-6
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-7
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-8
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-9
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-10
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.7@tcp0:10.142.10.7@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-11

storage09
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-12
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-13
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-14
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-15
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-16
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.10@tcp0:10.142.10.10@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-17

storage10
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-18
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-19
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-20
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-21
mkfs.lustre --reformat --fsname=ddn-data --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-22
mkfs.lustre --reformat --fsname=ddn-home --failnode=10.143.245.9@tcp0:10.142.10.9@tcp1 --ost \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --mgsnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-23

MDS
mkfs.lustre --reformat --fsname=ddn-home --mgs --mdt \
    --failnode=10.143.245.202@tcp0:10.142.10.202@tcp1 /dev/dm-0
mkfs.lustre --reformat --fsname=ddn-data --mdt \
    --mgsnode=10.143.245.201@tcp0:10.142.10.201@tcp1 --failnode=10.143.245.201@tcp0:10.142.10.201@tcp1 /dev/dm-1

Will this work as I suppose?

Thanks
Wojciech Turek
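P.S. For completeness, this is roughly how I expect the clients to mount the
two filesystems against the MGS pair (only a sketch, not tested yet; the
mount points under /mnt are just examples):

    mount -t lustre 10.143.245.201@tcp0:10.143.245.202@tcp0:/ddn-data /mnt/ddn-data
    mount -t lustre 10.143.245.201@tcp0:10.143.245.202@tcp0:/ddn-home /mnt/ddn-home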
On Oct 01, 2007 12:30 +0100, Wojciech Turek wrote:
> I have one server for the MGS/MDS function and 4 servers for OSS. All
> machines are identical. The MDS is connected to back end storage that is
> serving two data LUNs. The OSSs are connected to back end storage that is
> serving 24 data LUNs. Each server has two network interfaces, configured
> as follows:
> OSS1(hostname=storage07) 10.143.245.7@tcp0
> OSS1(hostname=storage07) 10.142.10.7@tcp1
>
> tcp0 is 10GbE
> tcp1 is 1GbE
>
> I would like to configure Lustre in such a way that if the tcp0 interface
> fails on an OSS or MDS, Lustre will be able to use the secondary network
> to keep communication alive, so that at least some of the clients can
> keep working. The primary network should be 10GbE and the secondary
> network 1GbE.

This will work as you want if tcp0 is listed first in modprobe.conf.
LNET will only use tcp0 unless that fails, at which point it will use
tcp1.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
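For illustration, the modprobe.conf entry this implies on the servers and
clients might look like the following (the interface names eth2 for the
10GbE port and eth0 for the 1GbE port are assumptions; substitute the real
device names on each machine):

    # tcp0 (10GbE) listed first so LNET prefers it; tcp1 (1GbE) is the fallback
    options lnet networks="tcp0(eth2),tcp1(eth0)"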
We have a 4-way SMP server (dual Opteron 275s) configured as a combined
MGS/MDS and OSS server, as follows...

/dev/sda                205G  1.8G  192G   1%  /lustre/mri/mdt0
/dev/f2c0l0/lv-f2c0l0   3.4T  2.3T  1.2T  67%  /lustre/mri/ost0
/dev/f2c0l1/lv-f2c0l1   3.4T  3.0T  460G  87%  /lustre/mri/ost1
/dev/f2c1l0/lv-f2c1l0   3.4T  3.0T  399G  89%  /lustre/mri/ost2
/dev/f2c1l1/lv-f2c1l1   3.4T  3.0T  418G  88%  /lustre/mri/ost3
/dev/f3c0l0/lv-f3c0l0   3.4T  3.0T  430G  88%  /lustre/mri/ost4
/dev/f3c0l1/lv-f3c0l1   3.4T  3.0T  431G  88%  /lustre/mri/ost5
/dev/f3c1l0/lv-f3c1l0   3.4T  3.0T  378G  90%  /lustre/mri/ost6
/dev/f3c1l1/lv-f3c1l1   3.4T  3.0T  417G  88%  /lustre/mri/ost7

Under heavy load our server has gone down several times (we think due
to bug 13438). Although we have successfully run e2fsck locally on the
MDS and each OSS AND run lfsck according to the documentation, we still
seem to be missing about 9TB of our storage. That is to say that
"du -s -h *" finds about 14TB but "df -h" says that the file system is
practically full.

[root@submit mri]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/mri/scratch           27T   23T  4.0T  86% /scratch/mri

Fortunately, we are in a position to wipe it out and reinitialize the
FS, but still, this is a bit disconcerting. Also, we've incorporated the
patch suggested in bug report 13438 into our source and rebuilt, but we
don't yet know if this will resolve the crashes.

Is anyone else having stability and corruption issues with 1.6.2 on
CentOS 4.5 (2.6.9-55.ELsmp) with the tcp and o2ib (OFED 1.2) lnet
modules? I suppose the next thing to try (if the patch does not work)
would be to upgrade to the CentOS 4.5 update corresponding to the RPMs
(2.6.9-55.0.2), but since we had no problems building from source
against our patched kernel, I'm skeptical about that making much
difference.

Thanks,

Charlie Taylor
UF HPC Center

On Oct 3, 2007, at 6:17 PM, Andreas Dilger wrote:

> On Oct 01, 2007 12:30 +0100, Wojciech Turek wrote:
>> I have one server for the MGS/MDS function and 4 servers for OSS. All
>> machines are identical. The MDS is connected to back end storage that
>> is serving two data LUNs. The OSSs are connected to back end storage
>> that is serving 24 data LUNs. Each server has two network interfaces,
>> configured as follows:
>> OSS1(hostname=storage07) 10.143.245.7@tcp0
>> OSS1(hostname=storage07) 10.142.10.7@tcp1
>>
>> tcp0 is 10GbE
>> tcp1 is 1GbE
>>
>> I would like to configure Lustre in such a way that if the tcp0
>> interface fails on an OSS or MDS, Lustre will be able to use the
>> secondary network to keep communication alive, so that at least some
>> of the clients can keep working. The primary network should be 10GbE
>> and the secondary network 1GbE.
>
> This will work as you want if tcp0 is listed first in modprobe.conf.
> LNET will only use tcp0 unless that fails, at which point it will use
> tcp1.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
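In case it helps with the diagnosis, this is the sort of comparison we have
been running from a client (a sketch; it assumes the lfs userspace tools are
installed there):

    # per-OST usage as the servers report it
    lfs df -h /scratch/mri
    # space the namespace actually accounts for
    du -s -h /scratch/mri/*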
On Oct 03, 2007 21:54 -0400, Charles Taylor wrote:
> We have a 4-way SMP server (dual Opteron 275s) configured as a combined
> MGS/MDS and OSS server, as follows...
>
> /dev/sda                205G  1.8G  192G   1%  /lustre/mri/mdt0
> /dev/f2c0l0/lv-f2c0l0   3.4T  2.3T  1.2T  67%  /lustre/mri/ost0
> /dev/f2c0l1/lv-f2c0l1   3.4T  3.0T  460G  87%  /lustre/mri/ost1
> /dev/f2c1l0/lv-f2c1l0   3.4T  3.0T  399G  89%  /lustre/mri/ost2
> /dev/f2c1l1/lv-f2c1l1   3.4T  3.0T  418G  88%  /lustre/mri/ost3
> /dev/f3c0l0/lv-f3c0l0   3.4T  3.0T  430G  88%  /lustre/mri/ost4
> /dev/f3c0l1/lv-f3c0l1   3.4T  3.0T  431G  88%  /lustre/mri/ost5
> /dev/f3c1l0/lv-f3c1l0   3.4T  3.0T  378G  90%  /lustre/mri/ost6
> /dev/f3c1l1/lv-f3c1l1   3.4T  3.0T  417G  88%  /lustre/mri/ost7
>
> Under heavy load our server has gone down several times (we think due
> to bug 13438). Although we have successfully run e2fsck locally on the
> MDS and each OSS AND run lfsck according to the documentation, we still
> seem to be missing about 9TB of our storage. That is to say that
> "du -s -h *" finds about 14TB but "df -h" says that the file system is
> practically full.

Presumably this is still true after stopping the clients and servers,
and restarting? In some cases file space can still be in use if, e.g.,
you have open-unlinked files being held by some clients. Also, files can
be held by the MDS from crashed clients in case they return after
recovery, and that may not be reclaimed in some cases until after an MDS
or OST restart.

The other issues to be aware of are described in the KB articles:
https://bugzilla.lustre.org/show_bug.cgi?id=2381
https://bugzilla.lustre.org/show_bug.cgi?id=2378

However, that still doesn't explain where 7.5TB of space went.
Presumably lfsck didn't report any space leakage? In case you weren't
aware, lfsck doesn't take any action by default, and you need to ask it
to delete or link orphan objects on the OSTs.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
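For reference, that orphan-handling pass looks roughly like the following
(a sketch based on the 1.6-era lfsck documentation rather than a tested
command line; the database paths and the client mount point are
placeholders, and the databases must already have been produced by the
e2fsck --mdsdb / --ostdb runs -- check lfsck --help on your version):

    # dry run: only report orphaned OST objects, change nothing
    lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost0.db /tmp/ost1.db /scratch/mri
    # rerun with -l to link orphans into lost+found (or -d to delete them);
    # this is the step that actually returns leaked space
    lfsck -l -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost0.db /tmp/ost1.db /scratch/mri

(one --ostdb database argument per OST; with eight OSTs there would be
eight of them)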