Hi all,

I plan on having about 100M files totaling about 8.5 TiBytes. To see how ext3 would perform with large numbers of files, I've written a test program which creates a configurable number of files in a configurable number of directories, reads from those files, lists them, and then deletes them. Up to 1M files, ext3 seems to perform well and scale linearly: the time to execute the program on 1M files is about double the time it takes on 0.5M files. But past 1M files it seems to have n^2 scalability. Test details appear below.

Looking at the various options for ext3, nothing jumps out as the obvious one to use to improve performance.

Any recommendations?

Thanks!
Sean

------
Parameter one is the number of files; parameter two is the number of directories to write into.

Dell MD3000 + LVM2. 2x7 10k rpm SAS disks, 128k stripe, RAID-0, for 3.8 TiBytes of total storage. Fedora Core 6 x86_64. 2x quad-core Xeon. Default mount and ext3 options used.

[root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000 1000

real 0m1.054s
user 0m0.128s
sys 0m0.382s

[root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 1000000 1000

real 1m0.938s
user 0m12.203s
sys 0m40.358s

[root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000000 1000

real 13m39.881s
user 2m6.645s
sys 7m26.665s

[root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 20000000 1000

real 44m46.359s
user 4m22.911s
sys 17m2.792s
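
To put a number on how the run time grows, here is a minimal Python sketch that estimates the scaling exponent between successive runs from the wall-clock ("real") times quoted above. The helper name scaling_exponent is made up for illustration, and the estimate assumes the time follows a simple power law between two measurements, which may not hold.

```python
import math

# Wall-clock ("real") times in seconds from the runs above, keyed by file count.
runs = {
    1_000_000: 60.938,              # 1m0.938s
    10_000_000: 13 * 60 + 39.881,   # 13m39.881s
    20_000_000: 44 * 60 + 46.359,   # 44m46.359s
}


def scaling_exponent(n1, t1, n2, t2):
    """Return k such that t2 / t1 == (n2 / n1) ** k (power-law assumption)."""
    return math.log(t2 / t1) / math.log(n2 / n1)


counts = sorted(runs)
for small, large in zip(counts, counts[1:]):
    k = scaling_exponent(small, runs[small], large, runs[large])
    print(f"{small:>10,} -> {large:>10,} files: exponent ~ {k:.2f}")
```

On these three timings the exponent comes out at roughly 1.1 between 1M and 10M files and roughly 1.7 between 10M and 20M, so the growth is clearly superlinear in the last step, although so few data points cannot pin down the exact order.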
> Hi all,
>
> I plan on having about 100M files totaling about 8.5TiBytes. To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them. Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files. But past 1M files it
> seems to have n^2 scalability. Test details appear below.
>
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
>
> Any recommendations?

If you want performance that's not O(n^2), the number of directory levels must go up by one each time the number of files goes up by an order of magnitude. That is, the number of files per directory must stay constant.

Suppose you have a directory of N files. Accessing every file takes N lookups, each of which has to examine N/2 directory entries on average. So the total is O(N*(N/2)), which is O(N^2).

Add another level of directories each time you increase the number of files by a factor of 10.

DS
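
The argument above can be worked through with a few lines of Python. This is only a sketch of the simplified model in the reply (every lookup scans half of an unindexed directory); the function name is hypothetical, and a filesystem with hashed directory indexes would not behave this way.

```python
def linear_scan_comparisons(total_files, directories):
    """Expected entry comparisons to look up every file once, assuming each
    lookup scans half of its (unindexed) directory on average."""
    files_per_directory = total_files / directories
    return total_files * files_per_directory / 2


# Fixed directory count: 10x the files costs about 100x the comparisons.
fixed = linear_scan_comparisons(10_000_000, 1000) / linear_scan_comparisons(1_000_000, 1000)

# Constant files per directory (1000 each): 10x the files costs about 10x.
constant = linear_scan_comparisons(10_000_000, 10_000) / linear_scan_comparisons(1_000_000, 1000)

print(f"fixed directory count:    x{fixed:.0f}")
print(f"constant files/directory: x{constant:.0f}")
```

Under this model, keeping the number of files per directory constant is what turns the quadratic term back into a linear one.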
Searching the directories (to ensure no duplicates, etc.) is going to be order N^2. The size of each directory is likely to be the limiting factor.

Try increasing to 10000 directories (in two layers of 100 each). I'll bet you that the result will be a pretty good increase in speed (getting back to the speeds that you had with 1M files).

On 8/1/07, Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
> Hi all,
>
> I plan on having about 100M files totaling about 8.5TiBytes. To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them. Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files. But past 1M files it
> seems to have n^2 scalability. Test details appear below.
>
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
>
> Any recommendations?
>

--
Stephen Samuel http://www.bcgreen.com 778-861-7641
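
One way to lay out "two layers of 100 each" is to derive both directory names from the file's numeric id. The Python sketch below assumes files are identified by an integer; the function name, the ".dat" suffix, and the "/filestore" root are placeholders for illustration, not details of the original test program.

```python
import os


def sharded_path(root, file_id, fanout=100, levels=2):
    """Map an integer file id to root/<d1>/<d2>/<file_id>.dat, spreading ids
    evenly over fanout**levels leaf directories."""
    width = len(str(fanout - 1))              # zero-pad names, e.g. "07"
    bucket = file_id % (fanout ** levels)     # which leaf directory
    parts = []
    for _ in range(levels):
        bucket, index = divmod(bucket, fanout)
        parts.append(f"{index:0{width}d}")
    return os.path.join(root, *reversed(parts), f"{file_id}.dat")


print(sharded_path("/filestore", 1234567))
```

With the defaults this gives 100 x 100 = 10000 leaf directories, so 20M files would average 2000 files per directory; raising levels to 3 adds another layer without changing the naming scheme.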
How about the file size? If the files are small, another performance killer could be the on-disk inode layout, because the order in which a directory's entries (dentries) are accessed is probably different from the order in which the inodes were allocated in the inode tables. This wastes a lot of time on disk seeks the first time the files are accessed.

Just FYI.

Coly

Sean McCauliff wrote:
> Hi all,
>
> I plan on having about 100M files totaling about 8.5TiBytes. To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them. Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files. But past 1M files it
> seems to have n^2 scalability. Test details appear below.
>
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
>
> Any recommendations?
>
> Thanks!
> Sean
>
> ------
> Parameter one is number of files, parameter two is number of directories
> to write into.
>
> Dell MD3000 + LVM2. 2x7 10k rpm SAS disks 128k stripe RAID-0 for 3.8
> TiBytes of total storage. Fedora Core 6 x86_64. 2xQuad Core Xeon.
> Default mount and ext3 options used.
>
> [root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000 1000
>
> real 0m1.054s
> user 0m0.128s
> sys 0m0.382s
>
> [root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 1000000 1000
>
> real 1m0.938s
> user 0m12.203s
> sys 0m40.358s
>
> [root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 10000000 1000
>
> real 13m39.881s
> user 2m6.645s
> sys 7m26.665s
>
> [root@galaxy filestore]# time /soc/abyss/test/fileSystemTest.pl 20000000 1000
>
> real 44m46.359s
> user 4m22.911s
> sys 17m2.792s
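
If the mismatch between directory order and inode order that Coly describes is the bottleneck, one commonly used workaround is to sort a directory's entries by inode number before touching the files. The Python sketch below illustrates the idea; it is not something proposed in this thread, the function names are made up, and whether it helps depends on how the inodes were actually allocated.

```python
import os


def entries_in_inode_order(directory):
    """Return the directory's entries sorted by inode number instead of the
    order in which the filesystem happens to return them."""
    with os.scandir(directory) as entries:
        return sorted(entries, key=lambda entry: entry.inode())


def read_all(directory):
    """Read every regular file in inode order; return the total bytes read."""
    total = 0
    for entry in entries_in_inode_order(directory):
        if entry.is_file(follow_symlinks=False):
            with open(entry.path, "rb") as handle:
                total += len(handle.read())
    return total
```

Calling read_all on one leaf directory at a time keeps the sort small; the directory listing is held in memory, which is another reason to keep the number of files per directory bounded.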
On Wed, Aug 01, 2007 at 18:55:53 -0700, Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
> Hi all,
>
> I plan on having about 100M files totaling about 8.5TiBytes. To see
> how ext3 would perform with large numbers of files I've written a test
> program which creates a configurable number of files into a configurable
> number of directories, reads from those files, lists them and then
> deletes them. Even up to 1M files ext3 seems to perform well and scale
> linearly; the time to execute the program on 1M files is about double
> the time it takes it to execute on .5M files. But past 1M files it
> seems to have n^2 scalability. Test details appear below.
>
> Looking at the various options for ext3 nothing jumps out as the obvious
> one to use to improve performance.
>
> Any recommendations?

Did you make sure directory indexing (the dir_index feature) is available? I think that is the default now for ext3, but maybe it wasn't turned on for your test.
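
Whether directory indexing is enabled shows up in the "Filesystem features:" line printed by tune2fs -l for the device. Here is a small Python sketch that checks such a listing for dir_index. The function name is hypothetical and the sample text is an illustrative excerpt, not output from the system in this thread.

```python
def has_dir_index(tune2fs_listing):
    """Return True if the 'Filesystem features:' line of a `tune2fs -l`
    listing includes the dir_index feature."""
    for line in tune2fs_listing.splitlines():
        if line.startswith("Filesystem features:"):
            features = line.split(":", 1)[1].split()
            return "dir_index" in features
    return False


# Illustrative excerpt only; a real listing would come from running
# `tune2fs -l <device>` as root against the filesystem in question.
sample = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
"""

print(has_dir_index(sample))
```

If the feature turns out to be missing, it can usually be turned on with tune2fs -O dir_index and existing directories re-indexed with e2fsck -fD on the unmounted filesystem; check the man pages for your e2fsprogs version before running either.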