Thank you for your hint. I really do mean that I am planning to store
millions of files on the file system. May I ask, then, what is the maximum
number of files that can be stored in one directory without affecting the
performance of the web server?

At 2018-11-03 16:03:56, "Walter H." <Walter.H at mathemainzel.info> wrote:
>On 03.11.2018 08:44, yf chu wrote:
>> I have a website with millions of pages.
>>
>does 'millions of pages' also mean 'millions of files on the file system'?
>
>just a hint - it has nothing to do with any file system, as it's universal:
>e.g. when you have 10000 files,
>don't store them in one folder; create 100 folders with 100 files in each;
>
>there is no file system that handles millions of files in one folder,
>or with limited resources (e.g. RAM)
Stephen John Smoogen
2018-Nov-03 14:39 UTC
[CentOS] inquiry about limitation of file system
On Sat, 3 Nov 2018 at 04:17, yf chu <cyflhn at 163.com> wrote:
>
> Thank you for your hint.
> I really mean I am planning to store millions of files on the file system.
> Then may I ask what is the maximum number of files that can be stored in
> one directory without affecting the performance of the web server?
>

There is no simple answer to that. It will depend on everything from the
physical drives used, the hardware that connects the motherboard to the
drives, the size of the cache and the type of CPU on the system, any
low-level filesystem items (software/hardware RAID, type of RAID,
redundancy of the RAID, etc.), the type of the file system, the size of
the files, the layout of the directory structure, and the metadata
connected to those files that needs to be checked.

Any one of those can partially affect the performance of the web server,
and combinations of them can severely affect it. This means a lot of
benchmarking of the hardware and OS is needed to get an idea of whether
any tuning of the number of files per directory will make things better
or not. I have seen many systems where the hardware worked better with a
certain type of RAID, and it didn't matter whether you had 10,000 or 100
files in each directory: the changes in performance were minimal, while
moving from RAID10 to RAID6 or vice versa, adding more cache to the
hardware controller, and so on, sped things up much more.

Assuming you have tuned all of that, the number of files in each
directory comes down to a 'gut' check. I have seen some people use some
power of 2 per directory, but rarely go over 1024. If you use a 3-level
double-hex tree, <[0-f][0-f]>/<[0-f][0-f]>/<[0-f][0-f]>/, and lay the
files out using some sort of file hash method, you can easily fit 256
files in each directory and hold 2^32 files. You will probably end up
with some hot spots depending on the hash method, so it would be good to
test that first.

-- 
Stephen J Smoogen.
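For anyone who wants to try the 3-level double-hex layout described
above, here is a minimal sketch in Python. The function names, base
directory, and the choice of SHA-256 are illustrative assumptions only,
not something the thread prescribes:

    # hashed_path.py -- illustrative sketch of a 3-level double-hex tree.
    import hashlib
    import os

    def hashed_path(base_dir, filename):
        # The first 6 hex characters of a SHA-256 digest give three
        # 2-character levels (256 directories per level), e.g.
        # base/ab/cd/ef/filename.
        digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
        return os.path.join(base_dir, digest[0:2], digest[2:4],
                            digest[4:6], filename)

    def store(base_dir, filename, data):
        path = hashed_path(base_dir, filename)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    # Example (hypothetical paths):
    # store("/var/www/files", "page-1234567.html", b"<html>...</html>")

With 256 directories at each of the three levels and roughly 256 files
per leaf directory, this holds on the order of 2^32 files while keeping
every individual directory small.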
On Nov 3, 2018, at 04:16, yf chu <cyflhn at 163.com> wrote:
>
> Thank you for your hint.
> I really mean I am planning to store millions of files on the file system.
> Then may I ask what is the maximum number of files that can be stored in
> one directory without affecting the performance of the web server?

There are hard limits in each file system. For ext4, there is no
per-directory limit, but there is an upper limit on total files (inodes,
really) per file system: 2^32 - 1 (4,294,967,295). XFS also has no
per-directory limit, and its inode limit is 2^64
(18,446,744,073,709,551,616).

If you are using ext2 or 3, I think the limit per directory is around
10,000, and you start seeing heavy performance issues beyond that. Don't
use them.

Now, filesystem limits aside, software that tries to read those
directories with huge numbers of files is going to have performance
issues. I/O operations, memory limitations, and time are going to be
bottlenecks to web operations. You really need to reconsider how you
want to serve these pages.

-- 
Jonathan Billings
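As a side note, the inode totals and remaining headroom on a given
filesystem can be checked from Python with os.statvfs; the mount point
/var/www below is only an example path:

    # check_inodes.py -- show how many inodes are used on a filesystem.
    import os

    st = os.statvfs("/var/www")          # any path on the filesystem in question
    used = st.f_files - st.f_ffree       # total inodes minus free inodes
    print("inodes: %d used of %d total (%d free)"
          % (used, st.f_files, st.f_ffree))

This reports essentially the same numbers as 'df -i'.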
On 2018-11-03, Jonathan Billings <billings at negate.org> wrote:
>
> Now, filesystem limits aside, software that tries to read those
> directories with huge numbers of files is going to have performance
> issues. I/O operations, memory limitations, and time are going to be
> bottlenecks to web operations.

Just to be pedantic, only the case Jonathan describes, where the
directory itself has to be read, would be a performance problem.
Typically, a web server doesn't need to read the directory in order to
retrieve a file and send it back to a client, so that wouldn't
necessarily be a performance issue. But having too many files in one
directory would impact other operations that might be important, like
backups, finding files, or most other bulk file operations, which would
in turn affect other processes such as the web server. (And if the web
server is generating directory listings on the fly, that would be a huge
performance problem.) And as others have mentioned, this issue isn't
filesystem-specific.

There are ways to work around some of these issues, but in general it's
better to avoid them in the first place. The typical workarounds are
storing the files in a hashed directory tree, and storing the files as
blobs in a database. There are lots of tools to help with either
approach.

--keith

-- 
kkeller at wombat.san-francisco.ca.us
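As a rough illustration of the 'blobs in a database' workaround
mentioned above, here is a minimal sketch using Python's built-in
sqlite3 module; the schema and file names are made up for the example:

    # blobs_in_sqlite.py -- store page contents as blobs instead of files.
    import sqlite3

    conn = sqlite3.connect("pages.db")
    conn.execute("CREATE TABLE IF NOT EXISTS pages "
                 "(name TEXT PRIMARY KEY, body BLOB)")

    def put_page(name, data):
        # 'data' is the raw page content as bytes
        conn.execute("INSERT OR REPLACE INTO pages (name, body) VALUES (?, ?)",
                     (name, data))
        conn.commit()

    def get_page(name):
        row = conn.execute("SELECT body FROM pages WHERE name = ?",
                           (name,)).fetchone()
        return row[0] if row else None

    # Example (hypothetical name):
    # put_page("page-1234567.html", b"<html>...</html>")

SQLite is only the simplest self-contained example here; the same idea
applies to any database that can store blobs.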
Thank you for your advice. I know the issue depends on a lot of factors.
Would you please give me some detailed information about how to tune
these parameters, such as the size of the cache and the type of CPU? I
am not very familiar with these details.

At 2018-11-03 22:39:55, "Stephen John Smoogen" <smooge at gmail.com> wrote:
>On Sat, 3 Nov 2018 at 04:17, yf chu <cyflhn at 163.com> wrote:
>>
>> Then may I ask what is the maximum number of files that can be stored in
>> one directory without affecting the performance of the web server?
>
>There is no simple answer to that. It will depend on everything from
>the physical drives used, the hardware that connects the motherboard
>to the drives, the size of the cache and the type of CPU on the system,
>any low-level filesystem items (software/hardware RAID, type of RAID,
>redundancy of the RAID, etc.), the type of the file system, the size of
>the files, the layout of the directory structure, and the metadata
>connected to those files that needs to be checked.