thr3ads.net - Ext3 users - Very slow directory traversal [Oct 2007]

If this information is useful, please help other people find it:
Share via:

Ross Boylan

2007-Oct-06 07:10 UTC

Very slow directory traversal

My last full backup of my Cyrus mail spool had 1,393,569 files and
cconsumed about 4G after compression. It took over 13 hours.  Some
investigation led to the following test:
 time tar cf /dev/null /var/spool/cyrus/mail/r/user/ross/debian/user/
That took 15 minutes the first time it ran, and 32 seconds when run
immediately thereafter.  There were 355,746 files. This is typical of
what I've been seeing: initial run is slow; later runs are much faster.

df shows
/dev/evms/CyrusSpool  19285771  17650480    606376  97% /var/spool/cyrus

mount shows
/dev/evms/CyrusSpool on /var/spool/cyrus type ext3 (rw,noatime)

The spool was active when I did the tests just described, but inactive
during backup.  It's on top of LVM as managed by EVMS in a Linux 2.6.18
kernel, Pentium 4 processor.  It might be significant the Linux treats
this as an SMP machine with 2 processors, since the single processor has
hyperthreading.  I'm using a stock Debian kernel, -686 variant.

# time dd if=/dev/evms/CyrusSpool bs=4096 skip=16k count=256k
of=/dev/null
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 26.4824 seconds, 40.5 MB/s

The spool was mostly populated all at once from another system, and the
file names are mostly numbers.  Perhaps that creates some hashing
trouble?

Can anyone explain this, or, even better, give me a hint how I could
improve this situation?

I found some earlier posts on similar issues, although they mostly
concerned apparently empty directories that took a long time.  Theodore
Tso had a comment that seemed to indicate that hashing conflicts with
Unix requirements.  I think the implication was that you could end up
with linearized, or partly linearized searches under some scenarios.
Since this is a mail spool, I think it gets lots of sync()'s.

I conducted pretty extensive tests before picking ext3 for this file
system; it was fastest for my tests of writing messages into the spool.
I think I tested the "nearly full disk" scenario, but I probably
didn't
test the scale of files I have now.  Obviously my problem now is
reading, not writing.

# dumpe2fs -h /dev/evms/CyrusSpool
dumpe2fs 1.40.2 (12-Jul-2007)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          44507cfa-39ce-46f1-9e3e-87091225395d
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype
needs_recovery sparse_super
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              10289152  # c 10x the number of files.
Block count:              20578300
Reserved block count:     1028915
Free blocks:              1651151
Free inodes:              8860352
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      236
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         4096
Inode blocks per group:   512
Filesystem created:       Mon Jan  1 11:32:49 2007
Last mount time:          Thu Oct  4 09:42:00 2007
Last write time:          Thu Oct  4 09:42:00 2007
Mount count:              2
Maximum mount count:      25
Last checked:             Fri Sep 28 09:26:39 2007
Check interval:           15552000 (6 months)
Next check after:         Wed Mar 26 09:26:39 2008
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      9f50511e-2078-4476-96f4-c6f3415fda4f
Journal backup:           inode blocks
Journal size:             32M

I believe I created it this way; in particular, I'm pretty sure I've had
dir_index from the start.

Alex Bligh

2007-Oct-06 11:06 UTC

head link

Very slow directory traversal

--On 06 October 2007 00:10 -0700 Ross Boylan <ross at biostat.ucsf.edu>
wrote:
> I believe I created it this way; in particular, I'm pretty sure
I've had
> dir_index from the start.
  find /var/spool/cyrus -type d -exec lsattr -lad \{\} \;

and check the large directories are actually indexed

Alex

Andreas Dilger

2007-Oct-10 15:59 UTC

head link

Very slow directory traversal

On Oct 06, 2007  00:10 -0700, Ross Boylan wrote:> My last full backup of my Cyrus mail spool had 1,393,569 files and
> cconsumed about 4G after compression. It took over 13 hours.  Some
> investigation led to the following test:
>  time tar cf /dev/null /var/spool/cyrus/mail/r/user/ross/debian/user/
FYI - "tar cf /dev/null" actually skips reading any file data.  The
code special cases /dev/null and skips the read entirely.
> That took 15 minutes the first time it ran, and 32 seconds when run
> immediately thereafter.  There were 355,746 files. This is typical of
> what I've been seeing: initial run is slow; later runs are much faster.
I'd expect this is because on the initial run the on-disk inode ordering 
causes a lot of seeks, and later runs come straight from memory.  Probably
not a lot you can do directly, but e.g. pre-reading the inode table would
be a good start.

> I found some earlier posts on similar issues, although they mostly
> concerned apparently empty directories that took a long time.  Theodore
> Tso had a comment that seemed to indicate that hashing conflicts with
> Unix requirements.  I think the implication was that you could end up
> with linearized, or partly linearized searches under some scenarios.
> Since this is a mail spool, I think it gets lots of sync()'s.
There was an LD_PRELOAD library that Ted wrote that may also help:
http://marc.info/?l=mutt-dev&m=107226330912347&w=2

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

Ext3 users - Oct 2007 - Very slow directory traversal

Very slow directory traversal

Very slow directory traversal

Very slow directory traversal