paul.r.schenk@accenture.com
2002-Jul-23 11:02 UTC
[Samba] Directory with large number of files (follow-up)
Hello all,

This is a follow-up to my post a few weeks ago about poor performance when serving files from a directory with a large number of files (in this case over 600 000 files). I traced this down to two places in the code:

1) The trans2 routine get_lanman2_dir_entry loops through the entire directory looking for possible matches. I can see why this is the best idea for the general case (better than trying all possible variations on the name). However, in the large-number-of-files case, it takes quite a long time to loop through the directory (1/2 million files). I patched this routine to look only for the name exactly as supplied. This breaks backward compatibility and forces case sensitivity, but in this case I have control over which files are being asked for (it's an application we maintain).

2) The routine OpenDir in dir.c creates a Dir structure that contains every directory entry. It does this even if 'dont descend' is set for the directory (this must be a bug). I patched OpenDir to return after retrieving at most 50 entries. Since I don't loop through the list to get my files (see point 1 above), this is not a loss.

So now I can open any file using Samba on my HP D380/2 in the same amount of time it takes a Pentium II thing running NT4 to serve the files.

As an aside, 'dont descend' seems only marginally useful. From my walk through the code, every directory is scanned in its entirety at least twice (by OpenDir) before a decision not to show any files is made. If it takes over 1 minute to scan the directory, you can say good-bye to your CPU pretty quickly. This option prevents browsing, but I've seen some requests for \dirname\* that caused get_lanman2_dir_entry to find the matches for this (I would have expected 'dont descend' to stop this).

So what do I suggest in general? A 'max compatible number' would be a good option. It would work something like this in smb.conf:

    max compatible number = 5000

which would do two things:
1) If OpenDir gets more than the given number of files, it aborts and returns only the partial list.

2) If get_lanman2_dir_entry loops through this many files with no match, it gives up and just tries the name exactly as supplied.

This would let Samba deal with these strange cases of lots of files and still keep all the old clients happy in most cases.

If anybody else thinks this might be of use, I can try to pretty up my hacks to implement this. I've been hacking on 2.2.5, in case it matters.

Thanks,
Paul
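
For illustration only, the two rules proposed above could be sketched in standalone POSIX C roughly as follows. This is not Samba code and not the patch described in this message: the function names count_entries_capped and bounded_lookup and the parameters max_entries and max_scan are hypothetical, and the sketch uses plain opendir/readdir rather than Samba's OpenDir and get_lanman2_dir_entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>

/* Rule 1 (hypothetical helper): read a directory, but stop as soon as
 * max_entries names have been seen, so the caller gets a partial list
 * instead of paying for a full scan. "." and ".." are not counted.
 * Returns the number of entries seen, or -1 if the directory cannot
 * be opened. */
static long count_entries_capped(const char *dirpath, long max_entries)
{
    DIR *d;
    struct dirent *de;
    long n = 0;

    assert(dirpath != NULL);
    d = opendir(dirpath);
    if (d == NULL)
        return -1;
    while (n < max_entries && (de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(d);
    return n;
}

/* Rule 2 (hypothetical helper): look for `name` case-insensitively, but
 * examine at most max_scan directory entries. If no match turns up within
 * that budget, fall back to trying the name exactly as supplied.
 * On success, returns 1 and copies the on-disk name into `found`;
 * otherwise returns 0. */
static int bounded_lookup(const char *dirpath, const char *name,
                          long max_scan, char *found, size_t found_len)
{
    char path[4096];
    struct stat st;
    struct dirent *de;
    DIR *d;
    long scanned = 0;
    int n;

    assert(dirpath != NULL && name != NULL && found != NULL && found_len > 0);
    d = opendir(dirpath);
    if (d == NULL)
        return 0;

    /* Compatible path: bounded case-insensitive scan. */
    while (scanned < max_scan && (de = readdir(d)) != NULL) {
        scanned++;
        if (strcasecmp(de->d_name, name) == 0) {
            snprintf(found, found_len, "%s", de->d_name);
            closedir(d);
            return 1;
        }
    }
    closedir(d);

    /* Fallback: a single stat() on the exact name, no directory scan. */
    n = snprintf(path, sizeof(path), "%s/%s", dirpath, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return 0;
    if (stat(path, &st) != 0)
        return 0;
    snprintf(found, found_len, "%s", name);
    return 1;
}
```

With a large max_scan the lookup behaves like the existing compatible scan. With a small one, a miss costs a bounded number of readdir calls plus one stat, at the price of case sensitivity for names that lie beyond the limit.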