Alain Spineux
2011-Mar-12 12:10 UTC
[CentOS] which filesystem to store > 250E6 small files in same or hashed dire
Hi,

I need to store about 250.000.000 files. The files are less than 4k each.

On ext4 (Fedora 14) the system crawls at 10.000.000 files in the same
directory.

I tried to create hash directories, two levels of 4096 dirs = 16.000.000
directories, but I had to stop the script that created them after hours,
and "rm -rf" would have taken days! mkfs was my friend.

I then tried two levels, the first of 4096 dirs and the second of 64 dirs.
Creating the hash directories took "only" a few minutes, but copying 10000
files makes my HD scream for 120s! It takes only 10s when working in a
single directory.

The filenames are all 27 chars long, and the first chars can be used to
hash the files.

My question is: which filesystem, and how should I store these files?

Regards

--
Alain Spineux             | aspineux gmail com
Monitor your IT & Backups | http://www.magikmon.com
Free Backup front-end     | http://www.magikmon.com/mksbackup
Your email 100% available | http://www.emailgency.com
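For reference, a hashed layout like the one described above can be populated
lazily, so bucket directories are only created as files arrive rather than
all 16.000.000 up front. A minimal sketch, assuming the first four characters
of each 27-char name are used as the two bucket levels; the source and
destination paths are made-up placeholders:

    #!/bin/sh
    # Sketch only: sort small files into a two-level hashed tree based on
    # the first characters of their names, creating buckets on demand.
    SRC=/data/incoming   # hypothetical flat source directory
    DST=/data/store      # hypothetical hashed destination root

    for f in "$SRC"/*; do
        name=$(basename "$f")
        d1=$(printf '%s' "$name" | cut -c1-2)   # first-level bucket
        d2=$(printf '%s' "$name" | cut -c3-4)   # second-level bucket
        mkdir -p "$DST/$d1/$d2"                 # created only when first needed
        mv "$f" "$DST/$d1/$d2/$name"
    done

The split (two characters per level here) is arbitrary; the point is that
"mkdir -p" avoids pre-creating millions of mostly empty directories.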
Emmanuel Noobadmin
2011-Mar-12 13:35 UTC
[CentOS] which filesystem to store > 250E6 small files in same or hashed dire
I haven't tried it, but could you possibly use a database to hold all
those files instead? At less than 4K per "row", performance from an
indexed database might be faster.

On 3/12/11, Alain Spineux <aspineux at gmail.com> wrote:
> Hi
>
> I need to store about 250.000.000 files. Files are less than 4k.
>
> On a ext4 (fedora 14) the system crawl at 10.000.000 in the same directory.
>
> I tried to create hash directories, two level of 4096 dir = 16.000.000
> but I had to stop the script to create these dir after hours
> and "rm -rf" would have taken days ! mkfs was my friend
>
> I tried two levels, first of 4096 dir, second of 64 dir. The creation
> of the hash dir took "only" few minutes,
> but copying 10000 files make my HD scream for 120s ! I take only 10s
> when working in the same directory.
>
> The filenames are all 27 chars and the first chars can be used to hash
> the files.
>
> My question is : Which filesystem and how to store these files ?
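A minimal sketch of what that suggestion might look like, using SQLite
purely as an example engine; the database path, table name, and column
names are invented for illustration:

    # Create an indexed table of small blobs keyed by the 27-char filename.
    sqlite3 files.db 'CREATE TABLE IF NOT EXISTS blobs (
        name TEXT PRIMARY KEY,   -- filename; PRIMARY KEY provides the index
        data BLOB                -- file contents, each under 4k
    );'

    # Newer sqlite3 shells include a readfile() helper for loading a file
    # into a blob column:
    sqlite3 files.db "INSERT INTO blobs VALUES ('somefile', readfile('somefile'));"

    # A lookup then becomes an index probe instead of a huge directory scan:
    sqlite3 files.db "SELECT length(data) FROM blobs WHERE name = 'somefile';"

Whether this beats a filesystem depends on the access pattern, but it
sidesteps the per-file inode and dentry overhead entirely.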
Simon Matter
2011-Mar-12 14:12 UTC
[CentOS] which filesystem to store > 250E6 small files in same or hashed dire
> Hi
>
> I need to store about 250.000.000 files. Files are less than 4k.
>
> On a ext4 (fedora 14) the system crawl at 10.000.000 in the same
> directory.
>
> I tried to create hash directories, two level of 4096 dir = 16.000.000
> but I had to stop the script to create these dir after hours
> and "rm -rf" would have taken days ! mkfs was my friend
>
> I tried two levels, first of 4096 dir, second of 64 dir. The creation
> of the hash dir took "only" few minutes,
> but copying 10000 files make my HD scream for 120s ! I take only 10s
> when working in the same directory.
>
> The filenames are all 27 chars and the first chars can be used to hash
> the files.
>
> My question is : Which filesystem and how to store these files ?

Did you try XFS? Deletes may be slow, but apart from that it did a nice
job when I last used it. However, we had only around 50.000.000 files at
the time. ext3 also worked quite well after *removing* dir_index.

Also, did you run an x86_64 kernel? We had all kinds of trouble with big
boxes and the i686-PAE kernel, because the dentry and inode caches were
very small.

Simon
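For reference, the commands behind those suggestions might look roughly
like this; the device name is a placeholder, and the ext3 volume must be
unmounted for the e2fsck pass:

    DEV=/dev/sdb1            # placeholder device

    # Option 1: format the volume as XFS, which keeps its own B+tree
    # directory indexes:
    mkfs.xfs "$DEV"

    # Option 2: on an existing ext3 volume, turn the dir_index feature off
    # as suggested, then re-check and re-optimize the directories:
    tune2fs -O '^dir_index' "$DEV"
    e2fsck -fD "$DEV"

    # Check whether the running kernel is x86_64 rather than i686-PAE:
    uname -m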
Dr. Ed Morbius
2011-Mar-14 20:10 UTC
[CentOS] which filesystem to store > 250E6 small files in same or hashed dire
on 13:10 Sat 12 Mar, Alain Spineux (aspineux at gmail.com) wrote:
> Hi
>
> I need to store about 250.000.000 files. Files are less than 4k.
>
> On a ext4 (fedora 14) the system crawl at 10.000.000 in the same directory.
>
> I tried to create hash directories, two level of 4096 dir = 16.000.000
> but I had to stop the script to create these dir after hours
> and "rm -rf" would have taken days ! mkfs was my friend
>
> I tried two levels, first of 4096 dir, second of 64 dir. The creation
> of the hash dir took "only" few minutes,
> but copying 10000 files make my HD scream for 120s ! I take only 10s
> when working in the same directory.
>
> The filenames are all 27 chars and the first chars can be used to hash
> the files.
>
> My question is : Which filesystem and how to store these files ?

I'd also question the architecture and suggest an alternate approach: a
hierarchical directory tree, a database, a "nosql" hashed lookup, or
something else. See squid for an example of using directory trees to
handle very large numbers of objects. In fact, if you wired things up
right, you could probably use squid as a proxy back-end.

In general, I'd say a filesystem is the wrong approach to this problem.

What's the creation/deletion/update lifecycle of these objects? Are they
all created at once? A few at a time? Are they ever updated? Are they
expired and/or deleted?

Otherwise, reiserfs and its hashed directory indexes scale well, though
I've only pushed it to about 125,000 entries in a single node. There is
the usual comment about the viability of a filesystem whose principal
architect is in jail on a murder charge.

It's possible XFS/JFS might also work. I'd suggest you test building and
deleting large directories. Incidentally, for testing, 'make -j' can be
useful for parallelizing processing, which would also test whether or not
locking/contention on the directory entry itself is going to be a
bottleneck (I suspect it may be). You might also find that GNU find's
"-depth" argument is useful for deleting deep/large trees.

--
Dr. Ed Morbius, Chief Scientist /     |
  Robot Wrangler / Staff Psychologist | When you seek unlimited power
Krell Power Systems Unlimited         |             Go to Krell!
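As a rough sketch of the kind of test suggested above — build a tree of
test files, then time a depth-first delete with GNU find — with the path
and tree sizes picked arbitrarily:

    TESTDIR=/scratch/treetest    # placeholder scratch location

    # Build a small two-level tree with a few empty files per leaf.
    for a in $(seq 0 15); do
        for b in $(seq 0 63); do
            mkdir -p "$TESTDIR/$a/$b"
            for f in $(seq 0 9); do
                : > "$TESTDIR/$a/$b/f$f"
            done
        done
    done

    # -delete implies -depth, so each directory's contents are removed
    # before the directory itself; time this to compare layouts.
    time find "$TESTDIR" -delete

Running several copies of the build loop concurrently would give a crude
read on the directory-locking contention mentioned above.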