Joe Little
2006-May-04 21:47 UTC
[zfs-discuss] Poor directory traversal or small file performance?
I've been writing to the Solaris NFS list since I was getting some bad performance copying a large set of small files via NFS (where it is especially noticeable). We have various source trees, including a tree with many Linux versions, that I was copying to my ZFS NAS-to-be. On large files it flies pretty well, and "zpool iostat 1" shows interesting patterns of writes ranging from the low KB/s up to 102MB/sec and back down again as buffered segments are apparently synced.

However, in the numerous-small-files case we consistently see transfers of only a few KB/s. First, some background: we are using iSCSI, with the backend made up of SATA disks directly exposed via the target. I've put them in an 8-disk raidz:

  pool: poola0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        poola0      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c2t7d0  ONLINE       0     0     0
            c2t8d0  ONLINE       0     0     0

Again, I can get some great numbers on large files (doing a dd with a large blocksize screams!), but as a test I took a problematic tree of around 1 million files and walked it with a find/ls:

bash-3.00# time find . \! -name ".*" | wc -l
987423

real    53m52.285s
user    0m2.624s
sys     0m27.980s

That was local to the system, and not even NFS. The original files, located on an ext3 RAID50, accessed via a Linux client (NFSv3):

[root@bagels old-servers]# time find . \! -name ".*" | wc -l
987423

real    1m4.255s
user    0m0.914s
sys     0m6.976s

Woe... something just isn't right here. Are there explicit ways I can find out what's wrong with my setup? This is from a dtrace/zdb/mdb neophyte; all I have been tracking with are zpool iostats.
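For anyone who wants to try the same kind of metadata walk without a million-file tree, here is a small self-contained sketch (the path and file counts are made up, purely for illustration) that builds a synthetic tree of empty files and runs the same find pipeline:

```shell
#!/bin/sh
# Build a small synthetic tree (4 dirs x 250 empty files) and walk it
# with the same find pipeline as above. Path is illustrative only.
rm -rf /tmp/smallfile-test
mkdir -p /tmp/smallfile-test
cd /tmp/smallfile-test || exit 1

for d in a b c d; do
  mkdir -p "$d"
  i=1
  while [ "$i" -le 250 ]; do
    : > "$d/file$i"        # create an empty file
    i=$((i + 1))
  done
done

# 4 directories + 1000 files are listed; "." itself is excluded
# because its name matches the ".*" pattern.
find . \! -name ".*" | wc -l
```

Scaled up, the same walk is purely a metadata (directory and inode read) workload, which is exactly the case that was slow here.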
Neil Perrin
2006-May-05 03:01 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Was this a 32-bit Intel system by chance? If so, this is quite likely caused by:

6413731 pathologically slower fsync on 32 bit systems

This was fixed in snv_39.

Joe Little wrote on 05/04/06 15:47:
> I've been writing to the Solaris NFS list since I was getting some bad
> performance copying via NFS (noticeably there) a large set of small
> files. [...]

-- Neil
Neil Perrin
2006-May-05 03:08 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Actually, the NFS slowness could be caused by the bug below, but it doesn't explain the "find ." times on a local ZFS.

Neil Perrin wrote on 05/04/06 21:01:
> Was this a 32-bit Intel system by chance?
> If so this is quite likely caused by:
>
> 6413731 pathologically slower fsync on 32 bit systems
>
> This was fixed in snv_39. [...]

-- Neil
Joe Little
2006-May-05 03:58 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Nope. The ZFS head (iSCSI initiator) is a Sun Ultra 20 workstation. The clients are RHEL4 quad Opterons running the x86_64 kernel series.

On 5/4/06, Neil Perrin <Neil.Perrin@sun.com> wrote:
> Actually the nfs slowness could be caused by the bug below,
> but it doesn't explain the "find ." times on a local zfs. [...]
Joe Little
2006-May-05 05:18 UTC
[zfs-discuss] Poor directory traversal or small file performance?
I just responded to the NFS list, and it definitely looks like a bad interaction between NFS -> ZFS -> iSCSI, whereas the first two together (local disk for ZFS) or the last two together (no ZFS) are very fast. Are there posted zfs dtrace scripts for observability of I/O?

On 5/4/06, Neil Perrin <Neil.Perrin@sun.com> wrote:
> Actually the nfs slowness could be caused by the bug below,
> but it doesn't explain the "find ." times on a local zfs. [...]
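On the dtrace question: one generic starting point (not ZFS-specific, and written from memory, so treat it as a sketch rather than a posted script) is the io provider, which can quantize per-I/O latency at the block layer and so would cover the iSCSI-backed vdev I/O underneath ZFS:

```d
#!/usr/sbin/dtrace -s
/* Sketch: quantize block-layer I/O latency. args[0] is the bufinfo_t
 * for the buffer; b_edev/b_blkno identify the I/O across start/done. */
io:::start
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
        @latency["disk I/O latency (ns)"] = quantize(timestamp -
            start[args[0]->b_edev, args[0]->b_blkno]);
        start[args[0]->b_edev, args[0]->b_blkno] = 0;
}
```

Running this while repeating the find walk should show whether the time is going into many small, high-latency reads against the iSCSI targets.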