Joe Little
2006-May-04 21:47 UTC
[zfs-discuss] Poor directory traversal or small file performance?
I've been writing to the Solaris NFS list because I was getting poor
performance copying a large set of small files via NFS (it's most
noticeable there). We have various source trees, including a tree with
many Linux versions, that I was copying to my ZFS NAS-to-be. On large
files it flies pretty well, and "zpool iostat 1" shows interesting
patterns of writes ramping from the low KB/s up to 102 MB/s and down
again as buffered segments apparently are synced.
However, in the numerous-small-files case, we consistently see
transfers of only a few KB per second. First, some background: we are
using iSCSI, with the backend made up of SATA disks exposed directly
via the target. I've put them in an 8-disk raidz:
  pool: poola0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        poola0      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c2t7d0  ONLINE       0     0     0
            c2t8d0  ONLINE       0     0     0
Again, I can get some great numbers on large files (a dd with a large
block size screams!), but as a test I took a problematic tree of
around 1 million files and walked it with find:
bash-3.00# time find . \! -name ".*" | wc -l
987423
real 53m52.285s
user 0m2.624s
sys 0m27.980s
That was local to the system, and not even NFS.
The original files, located on an EXT3 RAID50 and accessed via a Linux
client (NFS v3):
[root@bagels old-servers]# time find . \! -name ".*" | wc -l
987423
real 1m4.255s
user 0m0.914s
sys 0m6.976s
Whoa... something just isn't right here. Are there explicit ways I can
find out what's wrong with my setup? This is from a dtrace/zdb/mdb
neophyte; all I have been tracking with is zpool iostat.
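The traversal test above can be reproduced at small scale with a sketch like the following. It is illustrative only: the tree layout and file counts are invented, and it makes no attempt to defeat caching the way the real ~987,000-file cold walk did.

```shell
#!/bin/sh
# Small-scale sketch of the traversal benchmark from the thread.
# NFILES and the temp tree are made-up stand-ins for the real source tree.
set -e
root=$(mktemp -d)
NFILES=500
i=0
while [ "$i" -lt "$NFILES" ]; do
    # a mix of visible and dot files, so the find filter has work to do
    echo data > "$root/f$i"
    echo data > "$root/.h$i"
    i=$((i + 1))
done
# same filter as in the thread: count everything except dot entries
count=$(find "$root" ! -name ".*" | wc -l)
echo "$count"          # 500 visible files plus the root directory itself
rm -rf "$root"
```

Wrapping the `find` in `time` (as in the thread) then isolates metadata-read latency, since no file data is touched at all.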
Neil Perrin
2006-May-05 03:01 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Was this a 32-bit Intel system by chance? If so, this is quite likely
caused by:

6413731 pathologically slower fsync on 32 bit systems

This was fixed in snv_39.

-- Neil
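The fsync angle matters for this workload because an NFSv3 server must commit each small file to stable storage before acknowledging it, so any per-fsync slowdown is paid once per file across the whole tree. A minimal sketch of that write pattern, assuming GNU dd (its conv=fsync flag forces a sync per file; the paths and counts are made up):

```shell
#!/bin/sh
# Sketch: many tiny synchronous writes, the pattern an NFS small-file
# copy imposes on the server. With a pathologically slow fsync (bug
# 6413731), each iteration's cost is multiplied by the file count.
set -e
dir=$(mktemp -d)
i=0
while [ "$i" -lt 50 ]; do
    # one 4 KB file per iteration, fsync'd before dd exits (GNU dd)
    dd if=/dev/zero of="$dir/f$i" bs=4096 count=1 conv=fsync 2>/dev/null
    i=$((i + 1))
done
nfiles=$(ls "$dir" | wc -l)
echo "$nfiles"
rm -rf "$dir"
```

Timing this loop with and without conv=fsync makes the per-file commit overhead visible even on a healthy system.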
Neil Perrin
2006-May-05 03:08 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Actually the NFS slowness could be caused by the bug below, but it
doesn't explain the "find ." times on a local zfs.

Neil Perrin wrote On 05/04/06 21:01:
> Was this a 32 bit intel system by chance?
> If so this is quite likely caused by:
>
> 6413731 pathologically slower fsync on 32 bit systems
>
> This was fixed in snv_39.

-- Neil
Joe Little
2006-May-05 03:58 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Nope. The ZFS head (iSCSI initiator) is a Sun Ultra 20 workstation.
The clients are RHEL4 quad Opterons running the x86_64 kernel series.

On 5/4/06, Neil Perrin <Neil.Perrin at sun.com> wrote:
> Actually the nfs slowness could be caused by the bug below,
> but it doesn't explain the "find ." times on a local zfs.
Joe Little
2006-May-05 05:18 UTC
[zfs-discuss] Poor directory traversal or small file performance?
I just responded to the NFS list, and it definitely looks like a bad
interaction between NFS->ZFS->iSCSI, whereas the first two (local disk
for ZFS) or the last two (no ZFS) are very fast. Are there posted zfs
dtrace scripts for observability of I/O?

On 5/4/06, Neil Perrin <Neil.Perrin at sun.com> wrote:
> Actually the nfs slowness could be caused by the bug below,
> but it doesn't explain the "find ." times on a local zfs.
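A starting point for the observability question, short of ZFS-specific scripts, is the stock DTrace io provider, which can at least expose per-I/O latency underneath the pool. A sketch (assuming root on the Solaris head; it uses only the generic io:::start/io:::done probes, not ZFS internals, and needs a live Solaris kernel to run):

```shell
#!/bin/sh
# Sketch: histogram of block-I/O latency via the generic io provider.
# Run on the Solaris ZFS/iSCSI-initiator host as root; Ctrl-C to print.
dtrace -n '
io:::start
{
    start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}
io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
    @["I/O latency (ns)"] =
        quantize(timestamp - start[args[0]->b_edev, args[0]->b_blkno]);
    start[args[0]->b_edev, args[0]->b_blkno] = 0;
}'
```

Running this during the slow find would show whether each metadata read is individually slow (long iSCSI round trips) or whether the I/Os are fast but simply serialized.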