Attila Nagy
2009-May-12 09:06 UTC
stat() takes 54 msec in a directory with 94k files (even with a big dirhash)
Hello,

I have a strange problem on FreeBSD 7-STABLE (compiled on 7th May, just a
few commits after the release, but an earlier kernel did the same).

I'm doing several parallel rsyncs from one machine to another (let's call
them source and destination). The source contains maildirs, so there are
some directories with a relatively large number of files.
The source runs an older FreeBSD (around 6.2) with plain UFS2 file systems
mounted with soft updates.
The destination has a bigger (UFS2) file system on top of gjournal, mounted
async.

I've noticed that rsync sometimes stops moving data and the destination
machine gets sluggish. After some testing, I could catch the effect in
action (which was not hard, because it sometimes persists for hours).

top shows around 20% system activity (there are two quad-core CPUs) and 0%
user. The WCPU field for rsync shows 100%.

ktrace-ing the rsync process, I can see this:

31639 rsync 0.000004 CALL lstat(0x7fffffffab70,0x7fffffffaf70)
31639 rsync 0.000004 NAMI "hm33/00/16/uid/Maildir/new/1212536121.54673,S=3128"
31639 rsync 0.054226 STRU struct stat {dev=100, ino=136943662,
  mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942760,
  atime=1241807071, stime=1212536121, ctime=1241807071,
  birthtime=1212536121, size=3128, blksize=4096, blocks=8, flags=0x0 }
31639 rsync 0.000013 RET lstat 0
31639 rsync 0.000018 CALL lstat(0x7fffffffab70,0x7fffffffaf70)
31639 rsync 0.000004 NAMI "hm33/00/16/uid/Maildir/new/1212537276.69702,S=4634"
31639 rsync 0.054409 STRU struct stat {dev=100, ino=136943663,
  mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942762,
  atime=1241807071, stime=1212537276, ctime=1241807071,
  birthtime=1212537276, size=4634, blksize=4096, blocks=12, flags=0x0 }
31639 rsync 0.000013 RET lstat 0
31639 rsync 0.000020 CALL lstat(0x7fffffffab70,0x7fffffffaf70)
31639 rsync 0.000005 NAMI "hm33/00/16/uid/Maildir/new/1212537689.74390,S=3172"
31639 rsync 0.054230 STRU struct stat {dev=100, ino=136943664,
  mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942765,
  atime=1241807071, stime=1212537689, ctime=1241807071,
  birthtime=1212537689, size=3172, blksize=4096, blocks=8, flags=0x0 }
31639 rsync 0.000013 RET lstat 0

So according to ktrace, each lstat call takes 54 milliseconds to return,
for every file.
I have tried both the default and a substantially raised dirhash maxmem
value, but I still see this. Currently I have:

vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 18589428
vfs.ufs.dirhash_maxmem: 209715200
vfs.ufs.dirhash_minsize: 2560

So dirhash has room to expand.

The directory in question contains 94493 files.

The source machine doesn't show this behaviour.
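To take rsync out of the picture, a minimal standalone sketch like the one
below can time each lstat() on its own (just readdir() plus lstat(); the
file name lstat-bench.c and the path buffer size are arbitrary):

/*
 * lstat-bench.c -- time each lstat() over a directory's entries.
 * A rough sketch to measure per-call latency outside of rsync.
 */
#include <sys/stat.h>
#include <sys/time.h>
#include <dirent.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	DIR *d;
	struct dirent *de;
	struct stat sb;
	struct timeval t0, t1;
	char path[1024];

	if (argc != 2) {
		fprintf(stderr, "usage: %s directory\n", argv[0]);
		return (1);
	}
	if ((d = opendir(argv[1])) == NULL) {
		perror("opendir");
		return (1);
	}
	while ((de = readdir(d)) != NULL) {
		snprintf(path, sizeof(path), "%s/%s", argv[1], de->d_name);
		gettimeofday(&t0, NULL);
		if (lstat(path, &sb) == -1)
			perror("lstat");
		gettimeofday(&t1, NULL);
		/* Print the seconds taken by this single lstat() call. */
		printf("%.6f %s\n",
		    (double)(t1.tv_sec - t0.tv_sec) +
		    (double)(t1.tv_usec - t0.tv_usec) / 1e6,
		    de->d_name);
	}
	closedir(d);
	return (0);
}

Pointed at the slow directory, it should show the same ~0.054 s per call
that ktrace reports, independent of anything rsync does.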
top's output on the destination machine:

CPU:  0.0% user,  0.0% nice, 22.7% system,  0.0% interrupt, 77.3% idle
Mem: 159M Active, 3032M Inact, 599M Wired, 47M Cache, 399M Buf, 102M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
31639 root        1 118    0 50648K 10512K CPU0   0   2:01 100.00% rsync
  634 root        1  -4    0  2536K   628K vlruwk 1   0:20   0.00% supervise
26760 root        1  44    0 25940K  3316K select 1   0:10   0.00% sshd
31640 root        1  75    0 87512K  8324K suspfs 4   0:10   0.00% rsync
31641 root        1  75    0 18904K  7124K suspfs 6   0:10   0.00% rsync
31637 root        1  75    0 40408K  7744K suspfs 4   0:09   0.00% rsync
31636 root        1  44    0 20952K  6288K select 2   0:09   0.00% rsync
31638 root        1  44    0   104M  8912K select 3   0:09   0.00% rsync
31635 root        1  75    0 80344K  7812K suspfs 4   0:09   0.00% rsync
31642 root        1  44    0 17940K  7624K select 1   0:04   0.00% ssh
31646 root        1  45    0 17940K  7656K select 1   0:03   0.00% ssh

All of the rsyncs use the same file system, but with different top-level
directories. During this, none of the other rsyncs can run.

Any ideas about what could be done to work around this?

Thanks,
Paul B. Mahol
2009-May-12 10:27 UTC
stat() takes 54 msec in a directory with 94k files (even with a big dirhash)
On 5/12/09, Attila Nagy <bra@fsn.hu> wrote:
> So according to ktrace, each lstat call takes 54 milliseconds to return,
> for every file.
[...]
> Any ideas about what could be done to work around this?

Big guess: maybe it updates atime? Try with the noatime mount option.

--
Paul
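For example, to switch it on for an already-mounted file system (the /data
mount point here is just a placeholder for your destination file system):

mount -u -o noatime /data

and, if it helps, add noatime to the corresponding /etc/fstab entry so it
survives a reboot.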