On 11/3/18 12:44 AM, yf chu wrote:
> I wonder whether the performance will be affected if there are too many
> files and directories on the server.
With XFS on modern CentOS systems, you probably don't need to worry:
https://www.youtube.com/watch?v=FegjLbCnoBw
For older systems, as best I understand it: as the directory tree grows,
the answer to your question depends on how many entries are in the
directories, how deep the directory structure is, and how random the
access pattern is. Ultimately, you want to minimize the number of
individual disk reads required.
Directories with lots of entries are one situation where you may see
performance degrade. Typically, around the time the directory grows
larger than the maximum size of the direct block list [1] (48 KB),
reading the directory starts to take a little longer. Past the maximum
size of the single indirect block list (4 MB), it tends to get slower
again. File names are stored in the directory itself, so average
filename length factors into directory size, as well as the number of
files.
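As a rough illustration, here is how you might estimate when a
directory crosses that 48 KB threshold. The entry layout below assumes
the classic ext2/ext3 on-disk format (8-byte header plus the name
padded to a 4-byte boundary) and 12 direct blocks of 4 KB each; other
filesystems lay directories out differently, so treat the numbers as a
sketch, not a rule:

```python
# Rough estimate of directory size growth, assuming the ext2/ext3
# on-disk directory entry format (other filesystems differ).
DIRENT_HEADER = 8  # inode (4) + rec_len (2) + name_len (1) + file_type (1)

def dirent_size(name: str) -> int:
    """On-disk size of one directory entry: header plus name,
    padded to a 4-byte boundary."""
    return DIRENT_HEADER + ((len(name) + 3) // 4) * 4

def directory_bytes(names) -> int:
    """Total bytes the directory needs to hold all the entries."""
    return sum(dirent_size(n) for n in names)

# With 12 direct blocks of 4 KB, the direct block list covers 48 KB.
DIRECT_LIMIT = 12 * 4096

# Example: 3000 files with 14-character names (24 bytes per entry).
names = [f"file{i:06d}.dat" for i in range(3000)]
size = directory_bytes(names)
print(size, size > DIRECT_LIMIT)  # already past the direct block list
```

With 14-character names, each entry takes 24 bytes, so a directory
passes 48 KB at roughly 2000 files; longer names lower that count.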
A given file lookup needs to read each of the parent directories to
locate the next item in the path. If your path is very deep, then your
directories are likely to be smaller on average, but you're trading a
shorter block list per directory for more parent-directory lookups. It
might make your worst case better, but your best case is probably
worse.
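That trade-off can be sketched with a toy cost model. Everything here
is illustrative: it charges one unit per path component, adds a linear
scan term proportional to entries per directory, and assumes the tree
is perfectly balanced:

```python
def lookup_cost(total_files: int, depth: int, scan_cost: float = 0.001) -> float:
    """Toy model: one unit per path component traversed, plus a linear
    scan term for the entries in each directory (illustrative only)."""
    # Balanced-tree assumption: every directory has the same fan-out.
    entries_per_dir = total_files ** (1 / depth)
    return depth * (1 + scan_cost * entries_per_dir)

# One flat directory is dominated by the scan; very deep trees are
# dominated by the per-component lookups; the sweet spot is in between.
for depth in (1, 2, 3, 6):
    print(depth, round(lookup_cost(1_000_000, depth), 2))
```

Under these made-up constants the cost is huge at depth 1, bottoms out
around depth 2-3, and creeps back up as depth grows, which matches the
intuition above: deeper paths shrink each directory but add lookups.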
The system's cache means that repeatedly accessing a few files in a
large structure is not as expensive as accessing random files in a
large structure. If you have a large structure, but users tend to
access mostly the same files at any given time, then the system won't
be reading the disk for every lookup. If accesses aren't random, then
structure size becomes less important.
A hashed name directory structure has been mentioned, and those can be
useful if you have a very large number of objects to store and they
all share the same permission set. A hashed name structure typically
requires that you store, in a database, a map between the original
names (that users see) and the hashed names. You could hash each name
at lookup time instead, but that doesn't give you a good mechanism for
dealing with collisions. Hashed name directory structures typically
have worse best-case performance because of the extra lookup, but they
offer predictable, even growth in lookup times for every file. Where a
free-form directory structure might have a large difference between
the best-case and worst-case lookup, a hashed name directory structure
should have roughly the same access time for all files.
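A minimal sketch of such a layout follows. The two-level hex-prefix
split, the SHA-256 choice, the `/srv/objects` root, and the in-memory
dict standing in for the database table are all assumptions of the
sketch, not a recommendation:

```python
import hashlib
import os

# Stand-in for the database table mapping user-visible names to hashed
# on-disk names (a real deployment would use a real database so the
# map survives restarts and can record collision resolutions).
name_to_hash = {}

def store_path(root: str, original_name: str) -> str:
    """Record the name mapping and return the on-disk path,
    fanned out by hash prefix."""
    digest = hashlib.sha256(original_name.encode()).hexdigest()
    name_to_hash[original_name] = digest
    # Two levels of 2-hex-char directories -> 65536 leaf directories,
    # so directory sizes grow evenly regardless of the original names.
    return os.path.join(root, digest[:2], digest[2:4], digest)

def lookup_path(root: str, original_name: str) -> str:
    """Resolve a user-visible name through the stored map."""
    digest = name_to_hash[original_name]
    return os.path.join(root, digest[:2], digest[2:4], digest)

p = store_path("/srv/objects", "report-2018.pdf")
assert p == lookup_path("/srv/objects", "report-2018.pdf")
```

Because the hash spreads names uniformly, every leaf directory stays
small and every lookup walks the same number of components, which is
where the "roughly the same access time for all files" property comes
from.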
[1] https://en.wikipedia.org/wiki/Inode_pointer_structure