I've run into an odd problem which I lovingly refer to as a "black
hole directory".
On a Thumper used for mail stores we've found that runs of find take an
exceptionally long time. There are directories with as many as
400,000 files, which I immediately assumed were the culprit, but on
investigation they aren't the problem at all. The problem is seen
here in this truss output (the first column is delta time):
0.0001 lstat64("tmp", 0x08046A20) = 0
0.0000 openat(AT_FDCWD, "tmp", O_RDONLY|O_NDELAY|O_LARGEFILE) = 8
0.0001 fcntl(8, F_SETFD, 0x00000001) = 0
0.0000 fstat64(8, 0x08046920) = 0
0.0000 fstat64(8, 0x08046AB0) = 0
0.0000 fchdir(8) = 0
1321.3133 getdents64(8, 0xFEE48000, 8192) = 48
1255.8416 getdents64(8, 0xFEE48000, 8192) = 0
0.0001 fchdir(7) = 0
0.0001 close(8) = 0
These two getdents64 syscalls take roughly 20 minutes each. Notice that the
first call returns only 48 bytes of entries, and the directory is empty:
drwx------ 2 102 102 2 Feb 21 02:24 tmp
My assumption is that the directory is corrupt, but I'd like to prove
that. I have a scrub running on the pool, but it's got about 16 hours to go
before it completes; it's 20% done so far and nothing has been reported.
No errors are logged when I stimulate the problem.
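One avenue I haven't tried yet is dumping the directory's object with zdb to
see what its ZAP actually contains. Something like this is what I have in
mind (only a sketch; "pool/mailstore" is a made-up dataset name, and OBJNUM
is just the directory's inode number):

  ls -di tmp                        # the inode number doubles as the ZFS object number
  zdb -dddd pool/mailstore OBJNUM   # dump that object, including its ZAP contents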
Does anyone have suggestions on how to get additional data on this issue?
I've used dtrace flow tracing to examine it, but what I really want to see is
the zio's issued as a result of the getdents calls, and I can't see how to do
that. Ideally I'd quiesce the system and watch all zio's
occurring while I stimulate the problem, but this is production and that isn't
possible. If anyone knows how to watch DMU/ZIO activity that pertains _only_ to
a certain PID, please let me know. ;)
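For what it's worth, this is roughly the shape of what I'm after, though I
haven't been able to run it here. It's only a sketch: it assumes fbt probes
exist for zfs_readdir, zap_cursor_retrieve and zio_wait in the zfs module,
that the directory reads happen in the calling thread, and PID is a
placeholder for the process being watched:

  dtrace -p PID -n '
    /* arm only on getdents64 from the target process */
    syscall::getdents64:entry /pid == $target/ { self->ts = timestamp; }

    /* count zfs-module activity while a getdents64 is in flight */
    fbt:zfs:zfs_readdir:entry,
    fbt:zfs:zap_cursor_retrieve:entry,
    fbt:zfs:zio_wait:entry
    /self->ts/ { @work[probefunc] = count(); }

    /* latency distribution of the getdents64 calls themselves */
    syscall::getdents64:return /self->ts/ {
      @lat["getdents64 latency (ns)"] = quantize(timestamp - self->ts);
      self->ts = 0;
    }'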
Suggestions on how to pro-actively catch these sorts of instances are welcome,
as are alternative explanations.
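One idea I had for the proactive side is a system-wide watcher that logs any
getdents64 call that stalls, along these lines (again just a sketch, and the
one-second threshold is arbitrary):

  dtrace -n '
    syscall::getdents64:entry { self->ts = timestamp; self->fd = arg0; }

    /* report any getdents64 that takes longer than one second */
    syscall::getdents64:return /self->ts && timestamp - self->ts > 1000000000/ {
      printf("%s (pid %d) fd %d: %d ms\n",
          execname, pid, self->fd, (timestamp - self->ts) / 1000000);
    }

    syscall::getdents64:return /self->ts/ { self->ts = 0; self->fd = 0; }'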
benr.