I have a program which traverses directories and builds a list of
files in those directories. The hierarchy being traversed consists
of approximately 750,000 files scattered among approximately 4100
directories. I ran it under three configurations:
1. a regular linux system on a 15k RPM SAS drive.
2. directly on the RAID under my gluster installation.
3. GlusterFS 3.1.0 / RDMA / Distributed / native fuse clients.
I did the experiment twice in a row on each configuration. Results:
1. 90 seconds then 7 seconds.
2. 74 seconds then 4 seconds.
3. 4678 seconds then 4648 seconds.
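(For scale: 4678 seconds over roughly 750,000 files works out to about
6 ms per entry, versus about 0.1 ms per entry for the 74-second run
directly on the RAID, so it looks like a per-file cost rather than a
throughput problem.)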
Any suggestions about why my gluster installation is so much slower
than the regular file systems and how I can speed things up?
Here is pseudo-code for the traversal:
push the root onto a stack
while stack not empty
    curdir = pop the stack
    opendir(curdir)
    foreach diritem in curdir (ignoring . and ..),
        stat diritem
        if diritem is a directory,
            push diritem onto the stack
        else
            put diritem and its size into output
    closedir(curdir)
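For concreteness, here is a minimal C sketch of the same traversal
(assuming plain POSIX opendir/readdir/stat; the fixed-size stack and
the 100000/4096 bounds are just illustrative, not from my actual
program):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>

#define STACK_MAX 100000                       /* illustrative bound */

int main(int argc, char **argv)
{
    char *stack[STACK_MAX];
    int top = 0;

    if (argc != 2) {
        fprintf(stderr, "usage: %s rootdir\n", argv[0]);
        return 1;
    }
    stack[top++] = strdup(argv[1]);            /* push the root */

    while (top > 0) {                          /* while stack not empty */
        char *curdir = stack[--top];           /* pop the stack */
        DIR *dp = opendir(curdir);
        if (dp != NULL) {
            struct dirent *de;
            while ((de = readdir(dp)) != NULL) {
                if (strcmp(de->d_name, ".") == 0 ||
                    strcmp(de->d_name, "..") == 0)
                    continue;                  /* ignore . and .. */

                char path[4096];
                snprintf(path, sizeof path, "%s/%s", curdir, de->d_name);

                struct stat st;
                if (stat(path, &st) != 0)      /* one stat per entry */
                    continue;

                if (S_ISDIR(st.st_mode)) {
                    if (top < STACK_MAX)       /* push the directory */
                        stack[top++] = strdup(path);
                } else {
                    /* put the item and its size into the output */
                    printf("%s %lld\n", path, (long long)st.st_size);
                }
            }
            closedir(dp);
        }
        free(curdir);
    }
    return 0;
}

(Each entry costs a readdir plus a separate stat, so the traversal
makes roughly 750,000 stat calls in total.)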
Thanks!
.. Lana (lana.deere at gmail.com)
For gluster 3.1.0 on a large hierarchy I reported that it took these
times to build a list of files:

> 3. 4678 seconds then 4648 seconds.

For gluster 3.1.1 on the same hierarchy, having done "gluster volume
reset" after the upgrade, I got 640 seconds then 90 seconds.  I'm
guessing this improvement is due to the stat-prefetch translator, but
whether or not my guess is correct, this is a nice speedup.

.. Lana (lana.deere at gmail.com)

On Fri, Nov 5, 2010 at 3:03 PM, Lana Deere <lana.deere at gmail.com> wrote:
> I have a program which traverses directories and builds a list of
> files in those directories.  The hierarchy being traversed consists
> of approximately 750,000 files scattered among approximately 4100
> directories.  I ran it under three configurations:
[...]
> 3. GlusterFS 3.1.0 / RDMA / Distributed / native fuse clients.
[...]
> I did the experiment twice in a row on each configuration.  Results:
> 3. 4678 seconds then 4648 seconds.