Bob Hoffman
2009-Jul-09 17:04 UTC
[CentOS] Regarding LARGE number of files in a folder in linux
This goes out to you admins who manage servers with a heavy load of information.

I would like to know what you do about the number of files in a folder, or whether that is even a concern. I believe there is a limit, or at least a slowdown, when a folder gets too big, but what is optimal (if anything)?

Example: running a website that allows users to upload some photos (small ones). Say you get 300,000 users each uploading 10 photos. That's 3 million files.

Storing all of that in one folder seems like it would cause an issue when using that folder. Is that right? If it does, what do you do about it? How do you handle things?

If you have 300,000 clients you could give each of them their own folder, and then each folder would hold only 10 photos, but the parent folder would contain 300,000 subfolders.

So what is best for file management and system resources?

-Thanks
Bob
Alan Sparks
2009-Jul-09 17:14 UTC
[CentOS] Regarding LARGE number of files in a folder in linux
Bob Hoffman wrote:
> So what is best for file management and system resources?

Using the dir_index feature on ext3 (hashed b-tree directory indexes), or a hashing file system, helps. But in many such contexts I've found that a multi-level directory hashing scheme can help: compute some reproducible hash on a file name or user name/ID and use it to index into a directory structure.
-Alan
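[A minimal Python sketch of the multi-level hashing scheme Alan describes. The function name, base directory, and choice of MD5 with two hex-pair levels are illustrative assumptions, not anything from the thread.]

```python
import hashlib
import os

def hashed_path(base_dir: str, filename: str, levels: int = 2) -> str:
    """Map a filename to a nested directory path via a reproducible hash.

    With two levels of 256 buckets each (two hex pairs of the digest),
    3 million files spread over ~65,536 directories, roughly 45 files
    per directory instead of 3 million in one.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Use successive hex pairs of the digest as directory names.
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return os.path.join(base_dir, *parts, filename)

# Same input always yields the same path, so lookups need no database.
print(hashed_path("/var/photos", "fred_photo_01.jpg"))
```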
Stephen Harris
2009-Jul-09 17:14 UTC
[CentOS] Regarding LARGE number of files in a folder in linux
On Thu, Jul 09, 2009 at 01:04:37PM -0400, Bob Hoffman wrote:
> If you have 300,000 clients you could give them their own folder each and
> then the folders would have only 10 photos, but one folder would contain
> 300,000 folders.

No, because that top-level folder would itself be split by first letter, or by first and second letter, e.g.

  "fred"  would be f/r/fred  (or f/r/ed)
  "harry" would be h/a/harry (or h/a/rry)

If you find there's too much clumping (e.g. you have a lot of people whose names begin with "fr"), then you hash the name instead and split on the hash. (A simple hash could just be an incrementing number, a "userid".)

Then you simply program the web server to convert automatically from the friendly name to the split (or split hashed) name. So it _looks_ like everyone has names like "fred" and "harry", but your directory structure is a lot more efficient.

> So what is best for file management and system resources?

Best is subjective. I've just described _one_ method.

--
rgds
Stephen
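[A small Python sketch of the first-and-second-letter split Stephen describes; the function name and base directory are illustrative. Very short names are padded so there are always two levels.]

```python
def split_path(base_dir: str, username: str) -> str:
    """Split a username into nested directories by its first two letters,
    e.g. "fred" -> base_dir/f/r/fred, as in the examples above."""
    name = username.lower()
    # Pad names shorter than two characters so both levels always exist.
    key = (name + "__")[:2]
    return "/".join([base_dir, key[0], key[1], name])

print(split_path("/home/photos", "fred"))   # /home/photos/f/r/fred
print(split_path("/home/photos", "harry"))  # /home/photos/h/a/harry
```

The web server can apply this same function on every request, so users only ever see the friendly name.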
On Thu, 2009-07-09 at 13:04 -0400, Bob Hoffman wrote:
> So what is best for file management and system resources?

Look at case studies from companies that have done this before and how they did it, then follow their solution or improve on it. If a company can give you a POC (proof of concept) for 30-90 days, you just might have something.
john