Joan J. Piles
2010-Sep-08 15:16 UTC
[Lustre-discuss] Lustre requirements and tuning tricks
Hello all!

We are planning an upgrade to our current storage infrastructure, and we intend to deploy Lustre to serve some 150 clients. We intend to use 5 OSSs with the following configuration:

- 2 x Intel 5520 (quad-core) processors (or equivalent).
- 24 GB RAM.
- 20 x 2 TB SAS2 (7,200 rpm) disks.
- 1 x 16-port and 1 x 8-port Adaptec RAID controllers, with 20 SAS and 4 SSD drives.
- 2 x 10 Gb Ethernet ports.

And then 2 MDSs like these:

- 2 x Intel 5520 (quad-core) processors (or equivalent).
- 36 GB RAM.
- 2 x 64 GB SSD disks.
- 2 x 10 Gb Ethernet ports.

After having read the documentation, this seems to be a sensible configuration, especially regarding the OSSs. However, we are not so sure about the MDS. We have seen recommendations to reserve 5% of the total file system space for the MDS. Is this true, and should we then go for 2 x 2 TB SAS disks for the MDS? Is SSD really worth it there?

We have also read about having separate storage for the OSTs' journals. Is it really useful to get a pair of extra small (16 GB) SSD disks for each OST to keep the journals and bitmaps?

Finally, we have also read that it's important to have different OSTs on different physical drives to avoid bottlenecks. Is that so even if we make a big RAID volume and then several logical volumes (done with the hardware RAID card, so the operating system would just see different block devices)?

Thank you very much in advance,

Joan

--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Systems Analyst
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 976 76 10 00 (ext. 5454)
http://i3a.unizar.es -- jpiles at unizar.es
--------------------------------------------------------------------------
Joan J. Piles wrote:
> And then 2 MDSs like these:
>
> - 2 x Intel 5520 (quad-core) processors (or equivalent).
> - 36 GB RAM.
> - 2 x 64 GB SSD disks.
> - 2 x 10 Gb Ethernet ports.

Hmmm ....

> After having read the documentation, this seems to be a sensible
> configuration, especially regarding the OSSs. However, we are not so sure
> about the MDS. We have seen recommendations to reserve 5% of the total
> file system space for the MDS. Is this true, and should we then go for
> 2 x 2 TB SAS disks for the MDS? Is SSD really worth it there?

There is a nice formula for approximating your MDS needs on the wiki. Basically it is something to the effect of:

    number-of-inodes-planned * 1 kB = storage space required

So, for 10 million inodes, you need ~10 GB of space. I am not sure if this helps, but you might be able to estimate your likely usage scenario. Updating MDSes isn't easy (e.g. you have to pre-plan).

> And we have also read about having separate storage for the OSTs'
> journals. Is it really useful to get a pair of extra small (16 GB) SSD
> disks for each OST to keep the journals and bitmaps?
>
> Finally, we have also read that it's important to have different OSTs on
> different physical drives to avoid bottlenecks. Is that so even if we make a
> big RAID volume and then several logical volumes (done with the hardware
> RAID card, so the operating system would just see different block devices)?

Yes, though this will be suboptimal in performance. You want traffic to different LUNs not sharing the same physical disks. Build smaller RAID containers, and single LUNs atop those.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
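[Editor's note: for anyone who wants to plug in their own numbers, the back-of-the-envelope estimate above is easy to script. The Python sketch below is a minimal illustration only: the inode counts are hypothetical, and the per-inode figure is left as a parameter because the follow-up reply below puts it at 4 KB rather than 1 KB.]

    # Rough MDT capacity estimate: planned inodes x bytes reserved per inode.
    # The inode counts below are hypothetical examples; 1 KB/inode is the
    # figure quoted above, 4 KB/inode is the figure from the follow-up reply.

    def mdt_size_gb(planned_inodes, bytes_per_inode):
        """Approximate MDT space required, in GB."""
        return planned_inodes * bytes_per_inode / 1e9

    for inodes in (10_000_000, 100_000_000, 500_000_000):
        print(f"{inodes:>12,} inodes: "
              f"{mdt_size_gb(inodes, 1024):7.1f} GB at 1 KB/inode, "
              f"{mdt_size_gb(inodes, 4096):7.1f} GB at 4 KB/inode")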
Kevin Van Maren
2010-Sep-08 16:00 UTC
[Lustre-discuss] Lustre requirements and tuning tricks
On Sep 8, 2010, at 8:25 AM, Joe Landman <landman at scalableinformatics.com> wrote:

> Joan J. Piles wrote:
>
>> And then 2 MDSs like these:
>>
>> - 2 x Intel 5520 (quad-core) processors (or equivalent).
>> - 36 GB RAM.
>> - 2 x 64 GB SSD disks.
>> - 2 x 10 Gb Ethernet ports.
>
> Hmmm ....

In general there is not much gain from using SSDs for the MDT, and depending on the SSD, it could do much _worse_ than spinning rust. Many SSD controllers degrade horribly under a small random write workload. (SSDs are best for sequential writes and random reads.) Journals may see some benefit, as the sequential write pattern works much better for SSDs, although SSDs are not normally needed there.

>> After having read the documentation, this seems to be a sensible
>> configuration, especially regarding the OSSs. However, we are not so sure
>> about the MDS. We have seen recommendations to reserve 5% of the total
>> file system space for the MDS. Is this true, and should we then go for
>> 2 x 2 TB SAS disks for the MDS? Is SSD really worth it there?
>
> There is a nice formula for approximating your MDS needs on the wiki.
> Basically it is something to the effect of:
>
>     number-of-inodes-planned * 1 kB = storage space required
>
> So, for 10 million inodes, you need ~10 GB of space. I am not sure if
> this helps, but you might be able to estimate your likely usage
> scenario. Updating MDSes isn't easy (e.g. you have to pre-plan).

It is 4 KB/inode on the MDT. (It can be set to 2 KB if you need 4 billion files on an 8 TB MDT.) My sizing rule of thumb has been ~one MDT drive in RAID10 for each OST, to ensure you scale IOPS.

>> And we have also read about having separate storage for the OSTs'
>> journals. Is it really useful to get a pair of extra small (16 GB) SSD
>> disks for each OST to keep the journals and bitmaps?

It doesn't have to be SSD, and bitmaps are only applicable to software RAID. But unless you use asynchronous journals, there is normally a big win from external journals -- even with HW RAID having non-volatile storage. The big win is putting journals on RAID 1 rather than RAID 5/6.

>> Finally, we have also read that it's important to have different OSTs on
>> different physical drives to avoid bottlenecks. Is that so even if we
>> make a big RAID volume and then several logical volumes (done with the
>> hardware RAID card, so the operating system would just see different
>> block devices)?
>
> Yes, though this will be suboptimal in performance. You want traffic to
> different LUNs not sharing the same physical disks. Build smaller RAID
> containers, and single LUNs atop those.

You get the best performance with one HW RAID per OST, and that RAID should be optimized for 1 MB I/O (i.e., not 6+P) for best performance without having to muck with a bunch of parameters. If the OSTs are on the same drives, then there will be excessive head contention as the different OST filesystems seek across the same disks, greatly reducing throughput.

> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
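[Editor's note: to make the 1 MB I/O point concrete, here is a small Python sketch. The geometries and chunk sizes are hypothetical examples, not a recommendation for the hardware above; it simply checks whether a layout's full-stripe width divides a 1 MB Lustre write evenly. An 8+2 array with 128 KiB chunks lines up, while a 6-data-disk ("6+p") layout does not, which tends to force partial-stripe (read-modify-write) writes or extra parameter tuning.]

    # Check whether a RAID geometry's full stripe lines up with 1 MB Lustre I/O.
    # Geometries and chunk sizes below are hypothetical examples (sizes in KiB).

    ONE_MB_KIB = 1024

    def full_stripe_kib(data_disks, chunk_kib):
        """Full-stripe width in KiB (data disks only, parity excluded)."""
        return data_disks * chunk_kib

    layouts = [
        ("RAID6 8+2, 128 KiB chunk", 8, 128),
        ("RAID6 8+2,  64 KiB chunk", 8, 64),
        ("RAID5 6+1, 128 KiB chunk", 6, 128),  # the "6+p" shape mentioned above
    ]

    for name, data_disks, chunk_kib in layouts:
        stripe = full_stripe_kib(data_disks, chunk_kib)
        if ONE_MB_KIB % stripe == 0:
            verdict = "1 MB writes cover whole stripes"
        else:
            verdict = "misaligned: partial-stripe (read-modify-write) writes"
        print(f"{name}: full stripe = {stripe:4d} KiB -> {verdict}")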