Nikos Balkanas
2005-Jan-22 05:54 UTC
[Samba] Samba - CPU and memory usage - Proposed solution(?)
Hello,

Solution developed against Samba 2.2.22. Didn't and do not have the opportunity to test Samba 3.0.0.

At the time I was working as a technical architect for Tellas, the 2nd largest telecom in Greece. We used large billing and CRM systems (Geneva, Siebel). Filesystem and database were hosted on Solaris SF68000 servers (4-6 CPUs/domain). Therefore, we used Samba on the Unix servers.

These systems generate lots of data, and they use the proper interface between database and filesystem. That is, bulk data (bills, contracts) is kept as files, and only the path is stored in the database. This of course (depending on company size and traffic) generates single directories with millions of files in each one.

Samba can handle up to ~20,000 files per directory without significant server or service degradation. At 70,000 files/directory (10 directories), Siebel would take ~20 seconds to display a customer's contracts, making it very difficult for CRM to work. At the same time Geneva, with ~1,000,000 files/directory, would take ~20 minutes to display a particular bill. All this time the Geneva smbd processes were at ~150 MB RAM and 100% CPU. 4 simultaneous such requests by CRM and support could stonewall the domain. 10 simultaneous requests would crash the server (easy to do when a single request lasts ~20 minutes). No browsing or wildcard matching of files was needed, only exact file requests through the database.

Putting Samba through the debugger, I noticed that on every file request it would scan all the files in the large directory, converting to Unix filenames and building up the filename cache until it reached 150 MB. I developed a configurable parameter "many files", which when set disables file browsing (who needs a listing of ~1,000,000 files?) and performs a "stat" to get the file. The improvement was huge and manifold. Response time went down to < 1 second, CPU to ~0.1% and RAM to ~2.5 MB/process.
More importantly, these results are independent of how many files are in a directory (as long as the filesystem doesn't run out of inodes!). Even more, security is better, since CRM agents cannot view, modify or delete files from the mapped filesystem, but instead go only through the application as intended. Since this is a per-directory configurable parameter, other Samba directories with fewer files can have full browsing/listing at the same time.

The solution was tested against Windows XP. Windows XP must use a similar "stat" mechanism, since it went very fast with ~1,000,000 files/directory. Directory listing is slow (as expected), and comes in batches of 200 or so files at a time. However, you cannot disable browsing, and therefore it is an inferior solution, since security is more lax, and each time a bill is about to be saved the full browsing window is opened, with all the side effects on the server. It does, however, use fewer packets than Samba for file requests.

As mentioned, I have no idea about 3.0 and am not able to test it. My apologies if you have already corrected for this. If not, and there is interest in the patch, let me know, but it will be against 2.2.22. The patch has been running successfully in Tellas' production environment for ~1 year without any complaints.

With this patch, Samba can be the top choice for large, serious professional production systems. Otherwise, directories should be kept below 20,000 files.

Cheers,
Nikos Balkanas
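The difference between the two lookup strategies described in this message can be illustrated with a minimal sketch. This is not Samba's code or the "many files" patch itself; the function names, the in-memory name cache, and the file naming scheme are all hypothetical, chosen only to show why one approach scales with directory size and the other does not.

```python
import os
import tempfile


def lookup_by_scan(directory, wanted):
    """Case-insensitive lookup: read every entry and compare folded names.

    The work and the size of the name cache both grow with the number
    of files in the directory. (Hypothetical stand-in for a filename cache.)
    """
    name_cache = {}  # folded name -> on-disk name
    with os.scandir(directory) as entries:
        for entry in entries:
            name_cache[entry.name.lower()] = entry.name
    match = name_cache.get(wanted.lower())
    path = os.path.join(directory, match) if match is not None else None
    return path, len(name_cache)


def lookup_by_stat(directory, wanted):
    """Exact-name lookup: one stat() call, regardless of directory size."""
    path = os.path.join(directory, wanted)
    try:
        os.stat(path)
    except FileNotFoundError:
        return None
    return path


def demo(file_count=1000):
    """Create file_count empty files and look one up both ways."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(file_count):
            open(os.path.join(d, f"bill_{i:07d}.pdf"), "w").close()
        # The scan tolerates a differently-cased request but reads every entry.
        scanned_path, entries_read = lookup_by_scan(d, "BILL_0000042.PDF")
        # The stat needs the exact on-disk name but touches only one file.
        stat_path = lookup_by_stat(d, "bill_0000042.pdf")
        return (os.path.basename(scanned_path), entries_read,
                os.path.basename(stat_path))
```

Calling `demo()` with a larger `file_count` makes the scan read proportionally more entries, while the stat-based lookup still performs a single call. The trade-off is that the stat path only works when the requested name is already known exactly, as it is here because the path comes from the database.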
Andrew Bartlett
2005-Jan-22 06:35 UTC
[Samba] Re: Samba - CPU and memory usage - Proposed solution(?)
On Sat, 2005-01-22 at 07:53 +0200, Nikos Balkanas wrote:
> Hello,
>
> Solution developed against samba 2.2.22. Didn't and do not have the
> opportunity to test samba 3.0.0.

> No browsing or wildcards of files needed, only
> exact file request through the database.
>
> Putting samba through the debugger, I noticed that on every file request, it
> would scan all the files in the large directory, while converting to Unix
> filenames and building up the filename cache until it reaches 150 MB. I
> developed a configurable parameter "many files", which when set, disables
> file browsing (who needs listing of ~1,000,000 files?) and performs a "stat"
> to get the file.

Yes, this is a well-known problem with Samba's case-insensitive filename handling on case-sensitive Unix systems.

If, as in your case, the name is known exactly, such that a stat() will determine the result, then you may set 'case sensitive = yes', and Samba will do exactly that. This works best in 3.0.11pre2 (i.e., the current code; I'm not sure how far back the changes were applied), where jra applied some patches to ensure that the directory listing is not performed.

So, you may wish to advise your former employer that an 'out of the box' solution should now be available.

Finally, I'm pleased to see Samba being used in such big applications. It's a joy to hear about these kinds of installations.

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Student Network Administrator, Hawker College  http://hawkerc.net
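For reference, a minimal smb.conf sketch of the setting suggested above. The share name and path are hypothetical placeholders; only 'case sensitive' comes from this thread. Whether the directory scan is fully avoided depends on the Samba version, as noted above.

```ini
# Hypothetical share holding the large bill directories
[bills]
    path = /export/geneva/bills
    # Look names up exactly as requested, so a stat() can decide
    # the result without a case-insensitive directory scan
    case sensitive = yes
```

Since this is a per-share parameter, other shares with fewer files can keep the default case-insensitive behaviour. Clients of this share must then request files using the exact case stored on disk.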