yonatan pingle
2011-Jul-24 09:29 UTC
[CentOS] lots of small files in a folder on Linux centos
Hello,

I have a rather annoying ongoing issue with one of my CentOS virtual servers. The server hosts a website using Apache and MySQL, and there are three people involved in keeping the site up and running: the site owner, who only knows how to use the CMS and upload new articles to the website; a PHP/SQL coder; and me. I act as root for the site owner because he does not know anything about Linux.

The coder and the site owner have worked together for a long time already; I am their new admin (the last one was a major ISP which failed to host the site properly).

Lately the server has been underperforming and load averages are high. The MySQL service keeps crashing and the server is hitting maximum memory usage (so I added RAM). After looking into the website folders, I found one folder which, from my point of view, is one of the causes of the server load.

(Sorry for piping ls.)

    uploads]# ls | wc -l
    3123

I talked with the site owner, who in turn showed this to the coder. Now the coder throws the ball back, claiming it has nothing to do with server performance.

The folder is full of images, about 40K each, and I have good reason to believe this is the problem, as this is not the first time I have seen a folder containing a large number of files cause a server to underperform.

The coder is not as tech savvy as one might expect, so it is really hard for me to explain the issue of having lots of files in one folder to the site owner or to the coder.

The hardware is a decent machine: dual E5530, 24 GB of RAM, and six hard drives in RAID. The virtual server has 2GB of RAM and its own CPU share (4 cores, 8 threads).

The coder is arguing with the facts, and sadly he has the site owner on "his side".

Long story short, how should I explain, in the simplest way and in plain English, that having that many files in a folder will cause a server to work slower?

What are the pros and cons of having a large number of small files in the same folder on Linux CentOS?
-- Best Regards, Yonatan Pingle RHCT | RHCSA | CCNA1
Eero Volotinen
2011-Jul-24 11:03 UTC
[CentOS] lots of small files in a folder on Linux centos
2011/7/24 yonatan pingle <yonatan.pingle at gmail.com>:
> long story short, how should i explain in the most simple way in plain
> english that having that much files in a folder will cause a server to
> work slower?
>
> pros vs cons of having a large amount of small files in the same
> folder on Linux Centos?

I assume that you are using an ext3 or ext4 filesystem? Both ext3 and ext4 slow down if there are too many files in the same directory. XFS is a solution to this problem.

-- Eero
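Before changing filesystems, it may help to confirm which one the uploads directory actually lives on. What follows is only a minimal sketch, assuming GNU coreutils `stat` is installed; the function name `fs_type_of` is made up for illustration and is not from the thread.

```sh
#!/bin/sh
# fs_type_of PATH: print the type of the filesystem that holds PATH.
# Assumes GNU stat: -f queries the filesystem, -c %T prints its type name.
fs_type_of() {
    stat -f -c %T "$1"
}
```

Calling `fs_type_of` on the uploads directory prints a single type name. Note that GNU stat reports the ext family under one shared label, so this alone may not distinguish ext3 from ext4.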
R P Herrold
2011-Jul-24 14:13 UTC
[CentOS] lots of small files in a folder on Linux centos
On Sun, 24 Jul 2011, yonatan pingle wrote:

> the coder is not tech savvy as one might expect, so it's
> really hard for me to explain the issue of having lots of
> files in one folder to the site owner or to the coder.

I do not expect coders to remain 'not tech savvy'. If the coder is not willing to learn and to test, you are already doomed, and should walk away from the project.

To show the problem, take a pile of pennies, and ask the coder to find one with a given year. The coder will have to do a linear search to even know if the target exists. Then show an egg carton with another pile of pennies sorted and labelled by year in each section, and ask them to repeat the task. In the latter case, it is a 'single seek' to solve the problem.

Obviously, the target year may not even be present. With a single pile (directory) the linear search is still required, but with 'binning' by years, that is obvious by inspection as well.

One approach to lots of files in a single directory (which can cause problems in getting timely access to a specific file) is to build a permuted directory tree from the file names to spread the load around.

If the files are of a form where they have 'closely identical' names [pix00001.jpg, pix00002.jpg, etc], first build a 'hashed' version of the file name with md5sum, or such, to level the hash leading characters:

    [herrold@localhost ~]$ ./hashdemo.sh
    pix00001.jpg fd8f49c6487588989cd764eb493251ec
    pix00002.jpg 12955d9587d99becf3b2ede46305624c
    pix00003.jpg bfdc8f593676e4f1e878bb6959f14ce2
    [herrold@localhost ~]$ cat hashdemo.sh
    #!/bin/sh
    #
    # Print each candidate file name next to the md5 hash of that name.
    CANDIDATES="pix00001.jpg pix00002.jpg pix00003.jpg"
    for i in ${CANDIDATES}; do
        # echo appends a newline, so the hash covers the name plus "\n"
        HASH=$(echo "$i" | md5sum - | awk '{print $1}')
        echo "$i ${HASH}"
    done
    [herrold@localhost ~]$

Then we look at the leading character of the hash to design our egg carton bins. We place pix00001.jpg in directory ./f/, pix00002.jpg in directory ./1/, pix00003.jpg in directory ./b/, and so forth. If the directories get too full again, you might go to using the first two characters of the hash to perform the 'binning' process.

An md5 function is readily available in PHP, as are directory creation and so forth, so positioning the files and computing the indexes are straightforward there.

This is all pretty basic stuff, covered in Knuth in TAOCP long ago.

-- Russ herrold
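The binning step described above could be sketched roughly as follows. This is an illustration under assumptions, not Russ's own script: the function names `bin_for` and `place_file` and the optional depth argument are hypothetical, and it hashes the name plus a trailing newline so that it matches what hashdemo.sh computes.

```sh
#!/bin/sh
# Sketch of the 'egg carton' layout: file each upload under a directory
# named after the leading characters of the md5 hash of its file name.

# bin_for NAME [DEPTH]: print the first DEPTH hex characters (default 1)
# of the md5 hash of NAME. As in hashdemo.sh, the hash covers NAME plus
# a trailing newline.
bin_for() {
    printf '%s\n' "$1" | md5sum | cut -c "1-${2:-1}"
}

# place_file ROOT FILE [DEPTH]: move FILE into ROOT/<bin>/ and print
# the new path. The bin directory is created if it does not exist yet.
place_file() {
    name=$(basename "$2")
    bin=$(bin_for "$name" "${3:-1}")
    mkdir -p "$1/$bin" && mv "$2" "$1/$bin/$name" && printf '%s\n' "$1/$bin/$name"
}
```

To read a file back, the site code would compute the same hash prefix from the file name and look in that one directory. A depth of 1 gives 16 bins and a depth of 2 gives 256, so the depth can grow if the bins fill up again.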
Rajagopal Swaminathan
2011-Jul-25 01:08 UTC
[CentOS] lots of small files in a folder on Linux centos
Greetings,

On Sun, Jul 24, 2011 at 2:59 PM, yonatan pingle <yonatan.pingle at gmail.com> wrote:
> Hello,
> after looking into the website folders, i have found one folder which
> from my point of view is one of the causes for the server loads.
>

Hmm... does mounting <dir> with -o noatime,nodiratime help speed it up?

-- Regards, Rajagopal
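To see whether a filesystem is already mounted with noatime before trying this, one could read the kernel's mount table. A minimal sketch, assuming a Linux system with /proc/mounts; the helper names `mount_opts_for` and `has_noatime` are hypothetical, and the optional second argument exists only so the helpers can be pointed at a saved copy of the table.

```sh
#!/bin/sh
# mount_opts_for MOUNTPOINT [MOUNTS_FILE]: print the option string of the
# last entry mounted on MOUNTPOINT. MOUNTS_FILE defaults to /proc/mounts.
mount_opts_for() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }' "${2:-/proc/mounts}"
}

# has_noatime MOUNTPOINT [MOUNTS_FILE]: succeed if noatime is among the
# mount options. nodiratime alone does not count as a match.
has_noatime() {
    case ",$(mount_opts_for "$@")," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If `has_noatime /` fails, the options can usually be applied without a reboot via `mount -o remount,noatime,nodiratime /` and made permanent in /etc/fstab. Whether that helps here would still need to be measured.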
On Sunday, July 24, 2011 05:29:23 AM yonatan pingle wrote:
...
> lately the server is under-preforming and load averages are high,
> mysql service keeps crashing and the server is hitting max memory
> usage ( so i added ram .. ) ,
> after looking into the website folders, i have found one folder which
> from my point of view is one of the causes for the server loads.
...
> uploads]# ls | wc -l
> 3123
...
> pros vs cons of having a large amount of small files in the same
> folder on Linux Centos?

3,123 files is not a large number. From a CentOS 4 file server here:

    [root@pachyderm sky_data]# ls|wc -l
    13526
    [root@pachyderm sky_data]# cd ../motse
    [root@pachyderm motse]# ls |wc -l
    28218
    [root@pachyderm motse]# cd
    [root@pachyderm ~]# du -s /var/lib/pgsql
    556420596 /var/lib/pgsql
    [root@pachyderm ~]#

(Yeah, 556GB in PostgreSQL....) Pachyderm = 'The elephant never forgets....' But I'm not looking forward to converting it to a post-C4 PostgreSQL....

Performance on this box is pretty good, all things considered.

Large log files, I have found, can be performance problems; check to make sure log files are being rolled properly. There are some specific MySQL tuning documents out there; I seem to remember a posting on a local LUG list about some serious MySQL performance issues that took a long time to ferret out, but I can't seem to find it quickly.....
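A quick way to act on the log file suggestion is to list anything under the log directory that has grown past some size. This is a rough sketch assuming GNU or BSD `find` (for the `k` size suffix); the function name `large_logs` and the 100 MiB default are arbitrary choices for illustration.

```sh
#!/bin/sh
# large_logs DIR [MIN_KB]: list regular files under DIR whose size exceeds
# MIN_KB kibibytes (default 102400, i.e. 100 MiB). Each output line is the
# disk usage in KiB followed by the path, largest first.
large_logs() {
    find "$1" -type f -size +"${2:-102400}"k -exec du -k {} + | sort -rn
}
```

Running `large_logs /var/log` and finding nothing suggests rotation is working; a few huge files at the top of the list are the ones to check against the logrotate configuration.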