Hi there,

I Googled around and checked the PRs and wasn't successful in finding any reports of what I'm seeing. I'm hoping someone here can help me debug what's going on.

On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, an 'ls' of this directory took quite some time.

The files were conveniently named with a timestamp in the filename (still images from a security camera, one per second), so I've since moved them all to timestamped directories (yyyy/MM/dd/hh/mm). What I found, though, is that the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory.

To clarify:

% ls second
[lots of time and many many files enumerated]
% # rename files using rename script
% ls second
[wait ages]
2011    dead
% mkdir second2 && mv second/2011 second2
% ls second2
[fast!]
2011
% ls second
[still very slow]
dead
% time ls second
dead/
gls -F --color  0.00s user 1.56s system 0% cpu 3:09.61 total

(timings are similar for /bin/ls)

This data is stored on a striped ZFS pool, 2T in size (pool version 15, though the kernel reports version 28 is available -- zpool upgrade seems to disagree). I've run zpool scrub with no effect. ZFS is busily driving the disks; my iostat monitoring shows all three drives in the zpool running at 40-60% busy for the duration of the ls (they were quiet before).

I've attached truss to the ls process. It spends a lot of time here:

fstatfs(0x5,0x7fffffffe0d0,0x800ad5548,0x7fffffffdfd8,0x0,0x0) = 0 (0x0)

I'm thinking there's some old ZFS metadata that it's looking into, but I'm not sure how best to dig into this to understand what's going on under the hood.

Can anyone perhaps point me in the right direction on this?

Thanks,

Sean
Not an in-depth solution for ZFS, but maybe a solution for you:

mkdir images2
mv images/* images2
rmdir images

Ronald.

On Tue, 02 Aug 2011 09:39:03 +0200, seanrees@gmail.com <seanrees@gmail.com> wrote:
> Hi there,
>
> [...]
>
> The files were conveniently named with a timestamp in the filename
> (still images from a security camera, once per second) so I've since
> moved them all to timestamped directories (yyyy/MM/dd/hh/mm). What I
> found though was the original directory the images were in is still
> very slow to ls -- and it only has 1 file in it, another directory.
>
> [...]
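Ronald's recreate-the-directory workaround can be sketched end to end; the paths and file names below are illustrative (a throwaway tree under mktemp), not the ones from the thread:

```shell
# Sketch of the recreate-the-directory workaround. Instead of emptying the
# bloated directory in place, move its contents to a fresh directory and
# remove the old directory object entirely, then rename the new one back.
root=$(mktemp -d)

mkdir "$root/images"
touch "$root/images/a.jpg" "$root/images/b.jpg"

mkdir "$root/images2"
mv "$root/images"/* "$root/images2"/
rmdir "$root/images"             # the old directory (and its metadata) goes away
mv "$root/images2" "$root/images"

count=$(ls "$root/images" | wc -l | tr -d ' ')
echo "files after swap: $count"
rm -rf "$root"
```

The point of the swap is that a brand-new directory object carries none of the history of the old one, whatever the filesystem left behind.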
On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanrees@gmail.com wrote:
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a
> directory and populated it over the course of 3 weeks with about 2
> million individual files.

I'll keep this real simple: Why did you do this? I hope this was a stress test of some kind.

If not: This is the 2nd or 3rd mail in recent months from people saying "I decided to do something utterly stupid with my filesystem[1] and now I'm asking why performance sucks". Why can people not create proper directory tree layouts to avoid this problem regardless of what filesystem is used? I just don't get it.

[1]: Applies to any filesystem, not just ZFS. There was a UFS one a month or two ago too...

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US  |
| Making life hard for others since 1977.               PGP 4BD6C0CB  |
On Tue, Aug 2, 2011 at 9:39 AM, seanrees@gmail.com <seanrees@gmail.com> wrote:
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a
> directory and populated it over the course of 3 weeks with about 2
> million individual files. As you might imagine, a 'ls' of this
> directory took quite some time.

What actually takes time here isn't ZFS, but the sorting done by ls(1). Usually, running ls(1) with -f (output is not sorted) speeds things up enormously.

> The files were conveniently named with a timestamp in the filename
> (still images from a security camera, once per second) so I've since
> moved them all to timestamped directories (yyyy/MM/dd/hh/mm). What I
> found though was the original directory the images were in is still
> very slow to ls -- and it only has 1 file in it, another directory.

That is strange... and shouldn't happen. According to the ZFS Performance Wiki [1], directory operations in ZFS are supposed to be pretty efficient:

  Concurrent, constant-time directory operations. Large directories need
  constant-time operations (lookup, create, delete, etc.); hot directories
  need concurrent operations. ZFS uses extensible hashing to solve this:
  block-based, with amortized growth cost, short chains for constant-time
  operations, and per-block locking for high concurrency. A caveat is that
  readdir returns entries in hash order.

  Directories are implemented via the ZFS Attribute Processor (ZAP), which
  can store arbitrary name-value pairs. ZAP uses two algorithms, one
  optimized for large lists (large directories) and one for small lists
  (attribute lists); the implementation is in zap.c and zap_leaf.c. Each
  directory is maintained as a table of pointers to constant-sized buckets
  holding a variable number of entries. Each directory record is 16k in
  size; when a block gets full, a new block of the next power-of-two size
  is allocated.

  A directory starts off as a microzap and is upgraded to a fat zap (via
  mzap_upgrade) if the size of a name exceeds MZAP_NAME_LEN
  (MZAP_ENT_LEN - 8 - 4 - 2, i.e. 50) or if the size of the microzap
  exceeds MZAP_MAX_BLKSZ (128k).

[1]: http://www.solarisinternals.com/wiki/index.php/ZFS_Performance

I don't know what's going on there, but someone with ZFS internals expertise may want to have a closer look.

> To clarify:
> % ls second
> [lots of time and many many files enumerated]
> [...]
> % time ls second
> dead/
> gls -F --color  0.00s user 1.56s system 0% cpu 3:09.61 total

> I've attached truss to the ls process. It spends a lot of time here:
> fstatfs(0x5,0x7fffffffe0d0,0x800ad5548,0x7fffffffdfd8,0x0,0x0) = 0 (0x0)

That's a very good hint indeed!

> I'm thinking there's some old ZFS metadata that it's looking into, but
> I'm not sure how to best dig into this to understand what's going on
> under the hood.

Regards,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
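The -f suggestion above is easy to check for yourself. A minimal sketch (the directory size here is arbitrary and deliberately small; with the original ~2 million entries the sorting cost is what you would actually feel):

```shell
# ls(1) sorts its output by default; -f lists entries in readdir order with
# no sorting at all (and, like -a, it also shows '.' and '..').
# Same entries either way -- only the ordering work differs.
d=$(mktemp -d)
i=0
while [ $i -lt 2000 ]; do
    : > "$d/f$i"
    i=$((i+1))
done

default_list=$(ls "$d")                              # sorted (the default)
raw_list=$(ls -f "$d" | grep -Ev '^\.{1,2}$' | sort) # unsorted, re-sorted here
                                                     # just to compare contents

if [ "$default_list" = "$raw_list" ]; then
    echo "same entries; -f only skips the sort"
fi
rm -rf "$d"
```

For a quick timing comparison on a real large directory, prefix each listing with time, e.g. `time ls -f bigdir > /dev/null` versus `time ls bigdir > /dev/null`.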
On 2011-Aug-02 08:39:03 +0100, "seanrees@gmail.com" <seanrees@gmail.com> wrote:
> The files were conveniently named with a timestamp in the filename
> (still images from a security camera, once per second) so I've since
> moved them all to timestamped directories (yyyy/MM/dd/hh/mm). What I
> found though was the original directory the images were in is still
> very slow to ls -- and it only has 1 file in it, another directory.

I've also seen this behaviour on Solaris 10 after cleaning out a directory with a large number of files (though not as pathological as your case). I tried creating and deleting entries in an unsuccessful effort to trigger directory compaction, and wound up moving the remaining contents into a new directory, deleting the original one, and renaming the new directory into its place. It would appear to be a garbage-collection bug in ZFS.

On 2011-Aug-02 13:10:27 +0300, Daniel Kalchev <daniel@digsys.bg> wrote:
> On 02.08.11 12:46, Daniel O'Connor wrote:
>> I am pretty sure UFS does not have this problem. i.e. once you
>> delete/move the files out of the directory its performance would be
>> good again.
>
> UFS would be the classic example of poor performance if you do this.

Traditional UFS (including Solaris) behaves badly in this scenario, but 4.4BSD derivatives will release unused space at the end of a directory and have smarts to skip unused entries at the start of a directory more efficiently.

-- 
Peter Jeremy
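Whether a given filesystem compacts a directory after a mass delete is something you can probe empirically. A hedged sketch (results are filesystem-dependent: many filesystems never shrink the directory's own size in place, which is the behaviour discussed above; the sizes printed are whatever stat reports for the directory object itself):

```shell
# Populate a directory, delete everything in it, and compare the directory's
# own reported size before and after. A directory that stays large after
# being emptied is showing exactly the no-compaction behaviour from the thread.
d=$(mktemp -d)
mkdir "$d/big"
i=0
while [ $i -lt 5000 ]; do
    : > "$d/big/f$i"
    i=$((i+1))
done

# stat -c %s is the GNU spelling; stat -f %z is the BSD one.
populated=$(stat -c %s "$d/big" 2>/dev/null || stat -f %z "$d/big")
rm -f "$d/big"/f*
emptied=$(stat -c %s "$d/big" 2>/dev/null || stat -f %z "$d/big")

echo "populated=$populated emptied=$emptied"
rm -rf "$d"
```

If emptied stays close to populated, the directory kept its grown footprint, which is when the recreate-and-rename workaround earlier in the thread pays off.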
On 8/2/11 9:39 AM, seanrees@gmail.com wrote:
> [...]
>
> The files were conveniently named with a timestamp in the filename
> (still images from a security camera, once per second) so I've since
> moved them all to timestamped directories (yyyy/MM/dd/hh/mm).
>
> [...]

While not addressing your original question, which many people already have, I'll toss in the following: I do hope you've disabled access times on your ZFS dataset?

zfs set atime=off YOUR_DATASET/supercamera/captures