David Collier-Brown
2000-Mar-08 20:47 UTC
why is samba so slow with many files in one directory? [LARGE MESSAGE]
Hubert Grünheidt wrote:
> Maybe it'll help to be more precise:
> We have currently 14 million files separated into 140 Directories, each
> containing 100000 files. The naming-scheme is simple: <id>.<extension>; so
> directory 00000001 contains files 0.<someext> to 99999.<someext>, directory
> 00000002 contains files 100000.<someext> to 199999.<someext> etc.
> The extensions are different, but all files are unique in their
> number, the extensions are only used to indicate the type of the file.

Cool: you can already split these by number-pattern. I'd try to make
the directory
  - small enough for good scan-performance
  - large enough that clients tend to sit in the same directory for
    reasonable periods
The latter assumes that there is some kind of locality of reference
in the use of these files.

> Since the files have an average size of 11kB we wanted to try ReiserFS
> and Samba to deliver the files to Windows NT Clients (An
> NTFS-checkdisk currently lasts 8 hours on our RAID-System with NTFS
> and a Journaling Filesystem like ReiserFS, which is especially fast at
> small files, seemed very attractive to us).

Ok, sounds like a good plan.

> I tried it at home last week (had a little time, while fighting
> influenza) with ext2-filesystem and some thousand files but the
> results were discouraging.
>
> When I start *top* I can see that Linux uses nearly 100% CPU and 9x%
> are from SMBD, so the filesystem seems not to be the problem but the
> SMB daemon.

Yes, creation is going to be a pretty cpu-bound operation: I'll bet
you see high numbers for time spent in wait-io and system state, the
rest in user.

>
> Samba is configured to be case-sensitive, security: per share,
> preferred master and local master, no wins support and allow any
> hosts.
> The share is configured to be read-write, guest OK, case-sensitive,
> default case=lower, mangle case = no and browsable.
> The user from Windows NT is known to Samba and the smbpasswd file.

I was going to watch samba under truss, but I just broke 2.0.7
alpha 1... back to 2.0.6!

Ok, I just ran truss on smbd, while issuing an ls command to
smbclient (on Solaris). It said:

3.7258 open64("./", O_RDONLY|O_NDELAY) = 9
3.7261 fcntl(9, F_SETFD, 0x00000001) = 0
3.7263 fstat64(9, 0xFFBEE698) = 0
3.7266 getdents64(9, 0x00153B50, 1048) = 1040
3.7269 getdents64(9, 0x00153B50, 1048) = 1040
3.7272 getdents64(9, 0x00153B50, 1048) = 1024
3.7275 getdents64(9, 0x00153B50, 1048) = 1048
3.7278 getdents64(9, 0x00153B50, 1048) = 1032
3.7281 getdents64(9, 0x00153B50, 1048) = 912
3.7284 getdents64(9, 0x00153B50, 1048) = 0
3.7286 close(9) = 0

a normal scan, which took .0028 seconds, followed by some log
writing (which took about .0042 seconds per line!)

This was then followed by stats for the ls (which is something of an
ls -l) which took .2799 seconds, ~96 times the getdents time, and a
statvfs to get the disk space free

4.0985 statvfs64(".", 0xFFBEEFD0)

There were 184 entries in the directory I used, 6096 bytes, and that
gives a readdir speed of about 1.4 MB/S or 44 K-entries/S for a small
directory. Big ones get slower as a function of indirect blocks used,
so there will be a step-function in the speed you'll want to stay below.
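For anyone wanting to redo the arithmetic, here is a minimal sketch in
Python that uses only the figures from the truss output above. The
variable names and the helper scan_time_estimate are made up for
illustration, and the linear extrapolation is an assumption, not a
measurement. Taking the .0028 seconds between open64 and close as the
scan time gives somewhat higher rates than the 1.4 MB/S and 44
K-entries/S quoted above, which would correspond to an elapsed time
nearer .0042 seconds; which interval was used for the quoted figures
is not stated.

```python
# Figures copied from the truss output above; nothing is measured here.
getdents_returns = [1040, 1040, 1024, 1048, 1032, 912]  # bytes per getdents64 call
entries = 184                      # directory entries in the test directory
scan_seconds = 3.7286 - 3.7258     # open64 timestamp to close timestamp
stat_seconds = 0.2799              # time spent on the per-file stats

total_bytes = sum(getdents_returns)            # matches the 6096 bytes quoted
bytes_per_entry = total_bytes / entries
entries_per_second = entries / scan_seconds
megabytes_per_second = total_bytes / scan_seconds / 1e6
stat_to_scan_ratio = stat_seconds / scan_seconds


def scan_time_estimate(n_entries, rate=entries_per_second):
    """Naive linear extrapolation (an assumption): seconds to scan n_entries.

    Large directories will be slower than this, since the scan rate
    drops once indirect blocks come into play.
    """
    return n_entries / rate


if __name__ == "__main__":
    print(f"bytes scanned:        {total_bytes}")
    print(f"bytes per entry:      {bytes_per_entry:.1f}")
    print(f"entries per second:   {entries_per_second:.0f}")
    print(f"MB per second:        {megabytes_per_second:.2f}")
    print(f"stat time / scan time: {stat_to_scan_ratio:.0f}")
    print(f"100,000 entries, best case: {scan_time_estimate(100_000):.2f} s")
```

Whichever interval is used, the ratio is the part that matters: the
per-file stats cost roughly a hundred times the raw directory scan, so
the scan rate is only a lower bound on what a client actually waits for.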
The logs said

[2000/03/08 11:04:55, 3] smbd/process.c:process_smb(615)
  Transaction 32 of length 87
[2000/03/08 11:04:55, 3] smbd/process.c:switch_message(448)
  switch message SMBtrans2 (pid 8826)
[2000/03/08 11:04:55, 3] smbd/trans2.c:call_trans2findfirst(669)
  call_trans2findfirst: dirtype = 22, maxentries = 512, close_after_first=0, close_if_end = 1 requires_resume_key = 1 level = 260, max_data_bytes 65535
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
  unix_clean_name [/*]
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
  unix_clean_name [*]
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
  unix_clean_name [./]
[2000/03/08 11:04:55, 3] smbd/dir.c:dptr_create(491)
  creating new dirptr 256 for path ./, expect_close = 1

...which is the message seen in truss, above. This is followed by

[2000/03/08 11:04:55, 3] smbd/process.c:process_smb(615)
  Transaction 33 of length 39
[2000/03/08 11:04:55, 3] smbd/process.c:switch_message(448)
  switch message SMBdskattr (pid 8826)
[2000/03/08 11:04:55, 3] smbd/reply.c:reply_dskattr(1199)
  dskattr dfree=343

which is the disk-space-free request for the ls.

To me, this says the simple directory scan is fairly "light" at the
system level, and most of the cycles get used by the app. Sar says:

SunOS elsbeth 5.8 Generic sun4u    03/08/00

12:56:32    %usr    %sys    %wio   %idle
12:56:33       2       9       9      80
12:56:34       5       3       1      91
12:56:35       0       0       0     100
12:56:36       0       0       0     100

(Yes, I was testing on Solaris 8 at work (;-))

The open and first readdir caused wait-io, the rest grabbed data from
a buffer and the cpu processing jumped up.

Let's try this with a 100,000-file directory, created on my local
disk, that should be slow! (the creation is taking ages, in fact!
I think we'll stop at 85,784 files)

# sar -o foo.raw 1 120

SunOS elsbeth 5.8 Generic sun4u    03/08/00

15:23:33    %usr    %sys    %wio   %idle
15:23:34      23      77       0       0
15:23:35      28      72       0       0
15:23:36      29      71       0       0
15:23:37      34      66       0       0
15:23:38      28      72       0       0

Yes, the user time jumps up, and the system time too, as the data is
transferred to the client.

Looking at it in detail, the cpu was 20% usr for the first 30
seconds, then jumped to 80% as the client, running on the same
machine, started formatting and printing. The system time started at
80%, and dropped to 30% after the transfer completed. This is
attached as a gif file: dir.cpu.gif

The only other interesting graph was logical and physical reads: this
is attached as dir.read.gif, and the physical reads were remarkably
low, as the disk and cache seem to make them "instantaneous". That
tends to imply that the OS is mostly walking buffer pages and
transferring data to the app.

[I'll send Herr Grünheidt a more detailed set of plots]

So we need to do both: minimize samba processing, and organize the
filesystem for fast directory traversal. The latter is a multiplier
on both slow directory scans in Unix and samba's processing, so
reorganizing will give the biggest single payoff.

--dave
-- 
David Collier-Brown,  | Always do right. This will gratify some people
185 Ellerslie Ave.,   | and astonish the rest. -- Mark Twain
Willowdale, Ontario   | //www.oreilly.com/catalog/samba/author.html
Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb@canada.sun.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dir.cpu.gif
Type: image/gif
Size: 4088 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba/attachments/20000308/fd0140e3/dir.cpu.gif
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dir.read.gif
Type: image/gif
Size: 2922 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba/attachments/20000308/fd0140e3/dir.read.gif