Hi,

I hope this is the correct list, and the question hasn't been asked before.

I've read through most of the material on the wiki and the website and I
am currently in the process of building a proof-of-concept Lustre cluster.
One thing isn't clear about the aggregate throughput figures in the
FAQ: http://www.clusterfs.com/faq.html

Stated throughput on a 64-bit Linux OSS in the FAQ is:
Dual-NIC gig-e on a 64-bit OSS: 220 MB/s

We're hoping to use these OSSs to provide access to a large collection
of rather small files.
Most of them are rendered images in various formats; typical file size
range is 1 KB - 100 KB.

The total volume is expected to grow well over 10 TB over time; we're
currently at 4 TB.
We would like to achieve the fastest throughput possible.

If I understood everything correctly, parts of a file can / will be
stored on multiple OSSs/OSTs?
Because of this, aggregate throughput for a single file can be higher
than the max throughput per OSS?
What is the smallest element of a file that can be spread over multiple
OSSs/OSTs?

Best Regards,

Ramon van Alteren
Ramon van Alteren wrote:
> Hi,
>
> I hope this is the correct list, and the question hasn't been asked before.
>
> I've read through most of the material on the wiki and the website and I
> am currently in the process of building a proof-of-concept Lustre cluster.
> One thing isn't clear about the aggregate throughput figures in the
> FAQ: http://www.clusterfs.com/faq.html
>
> Stated throughput on a 64-bit Linux OSS in the FAQ is:
> Dual-NIC gig-e on a 64-bit OSS: 220 MB/s
>
> We're hoping to use these OSSs to provide access to a large collection
> of rather small files.
> Most of them are rendered images in various formats; typical file size
> range is 1 KB - 100 KB.
>
> The total volume is expected to grow well over 10 TB over time; we're
> currently at 4 TB.
> We would like to achieve the fastest throughput possible.
>
> If I understood everything correctly, parts of a file can / will be
> stored on multiple OSSs/OSTs?
> Because of this, aggregate throughput for a single file can be higher
> than the max throughput per OSS?
> What is the smallest element of a file that can be spread over multiple
> OSSs/OSTs?
>

You can stripe, but with files this small you'll see no benefit. You
really don't want to stripe unless you have to.

One thing to watch with that many small files is your metadata inodes.
2 TB of disk can store 2 billion inodes. That means a max of 2 billion
files. Since 8 TB is the max disk size supported by most distros, you're
limited to 8 billion files in one Lustre filesystem. Then you have to
start another filesystem.

Daniel

> Best Regards,
>
> Ramon van Alteren
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
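The file-count limit in the reply above is simple arithmetic, and a minimal sketch may make it easier to plug in other sizes. It assumes one MDS inode per file and one inode per 1 KB of MDS disk (the ratio implied by "2 TB can store 2 billion inodes"). The function and constant names are illustrative only and are not part of any Lustre tool.

```python
# Sketch only: assumes one MDS inode per file and a fixed bytes-per-inode
# ratio chosen when the MDS filesystem is formatted. Names are hypothetical.
KIB = 1024
TIB = 1024**4


def max_files(mds_bytes: int, bytes_per_inode: int = KIB) -> int:
    """Upper bound on the number of files for an MDS device of this size."""
    return mds_bytes // bytes_per_inode


for size_tib in (2, 8):
    files = max_files(size_tib * TIB)
    print(f"{size_tib} TiB MDS -> at most {files:,} files")
```

With binary units the results come out slightly above the round "2 billion" and "8 billion" figures quoted in the reply; a larger bytes-per-inode ratio lowers the limit proportionally.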
Hi Daniel,

Daniel Leaberry wrote:
> You can stripe, but with files this small you'll see no benefit. You
> really don't want to stripe unless you have to.
>
> One thing to watch with that many small files is your metadata inodes.
> 2 TB of disk can store 2 billion inodes. That means a max of 2 billion
> files. Since 8 TB is the max disk size supported by most distros, you're
> limited to 8 billion files in one Lustre filesystem. Then you have to
> start another filesystem.

Thanks for the reply, but I'm not sure I understand correctly.
Are you telling me that the size of the total Lustre filesystem is
limited by the size of the MDS storage filesystem?
Meaning that in order to store 8 billion files in a Lustre filesystem I
would need an 8 TB MDS filesystem?

Or am I reading your answer completely the wrong way?

Is this limitation over all versions or tied to a specific Lustre
version (i.e., is this true for 1.6 as well)?

Regards,

Ramon
Ramon van Alteren wrote:
> Hi Daniel,
>
> Daniel Leaberry wrote:
>
>> You can stripe, but with files this small you'll see no benefit. You
>> really don't want to stripe unless you have to.
>>
>> One thing to watch with that many small files is your metadata inodes.
>> 2 TB of disk can store 2 billion inodes. That means a max of 2 billion
>> files. Since 8 TB is the max disk size supported by most distros, you're
>> limited to 8 billion files in one Lustre filesystem. Then you have to
>> start another filesystem.
>>
> Thanks for the reply, but I'm not sure I understand correctly.
> Are you telling me that the size of the total Lustre filesystem is
> limited by the size of the MDS storage filesystem?
> Meaning that in order to store 8 billion files in a Lustre filesystem I
> would need an 8 TB MDS filesystem?
>

Yes. It's limited by inodes. If all you create are 10 MB files, you won't
ever be limited by the MDS filesystem, because you consume 4 KB on the MDS
for every 10 MB on the OSTs. If you store small files, you're very much
limited by the MDS filesystem. The best-case scenario, with no striping,
means you can use 1 KB inodes.

Here's a 950 GB LUN formatted with 1K inodes. As you can see, I have 943
million inodes, which means I can store 943 million files. If all those
files are 4 KB files on the OSTs, then my max filesystem size can be no
greater than 943 million x 4 KB. And because the OSTs by default are
formatted with an inode every 16 KB, you could run short there as well.

[root@lu-mds01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              25G  1.3G   22G   6% /
/dev/sda1            1012M   45M  916M   5% /boot
none                  7.9G     0  7.9G   0% /dev/shm
/dev/sdb              450G  489M  405G   1% /var/mnt/lustre01-mds

[root@lu-mds01 ~]# df -i
Filesystem            Inodes   IUsed     IFree IUse% Mounted on
/dev/sda2            3204992   50072   3154920    2% /
/dev/sda1             131616      44    131572    1% /boot
none                 2060261       1   2060260    1% /dev/shm
/dev/sdb           943652864      24 943652840    1% /var/mnt/lustre01-mds

> Or am I reading your answer completely the wrong way?
>
> Is this limitation over all versions or tied to a specific Lustre
> version (i.e., is this true for 1.6 as well)?
>

This is a limitation of all Lustre versions. Once disjoint clustered
MDSs arrive, you can have as many files as you like in one filesystem.

> Regards,
>
> Ramon
>
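The two bounds from the reply above (943 million MDS inodes x 4 KB files, and one OST inode per 16 KB) can be worked through as a rough sketch. It assumes unstriped files, so each file uses exactly one MDS inode and one OST inode. The function names are hypothetical; the inode count is taken from the df -i output for /dev/sdb.

```python
# Sketch only: assumes unstriped files, i.e. one MDS inode and one OST
# inode (object) per file. Function names are hypothetical.
KIB = 1024
TIB = 1024**4


def max_data_bytes(mds_inodes: int, avg_file_bytes: int) -> int:
    """Most file data the filesystem can hold before MDS inodes run out."""
    return mds_inodes * avg_file_bytes


def ost_usable_fraction(avg_file_bytes: int,
                        bytes_per_inode: int = 16 * KIB) -> float:
    """Share of OST space that can be filled before OST inodes run out."""
    return min(1.0, avg_file_bytes / bytes_per_inode)


mds_inodes = 943_652_864  # from the df -i output for /dev/sdb
limit = max_data_bytes(mds_inodes, 4 * KIB)
print(f"MDS-bound data limit: {limit / TIB:.2f} TiB")
print(f"OST space usable with 4 KB files: {ost_usable_fraction(4 * KIB):.0%}")
```

Under these assumptions the MDS shown would cap the filesystem at roughly 3.5 TiB of 4 KB files, and an OST formatted with the default ratio would run out of inodes with only about a quarter of its space used.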