Joe Little
2006-May-04 21:47 UTC
[zfs-discuss] Poor directory traversal or small file performance?
I've been writing to the Solaris NFS list since I was getting some bad performance copying a large set of small files via NFS (where it is especially noticeable). We have various source trees, including a tree with many Linux versions, that I was copying to my ZFS NAS-to-be. On large files it flies pretty well, and "zpool iostat 1" shows interesting patterns of writes ranging from the low KB/s up to 102MB/sec and back down again as buffered segments are apparently synced.

However, in the numerous-small-files case we consistently see transfers of only a few KB/s. First, some background: we are using iSCSI, with the backend made up of SATA disks directly exposed via the target. I've put them in an 8-disk raidz:

  pool: poola0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        poola0      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c2t7d0  ONLINE       0     0     0
            c2t8d0  ONLINE       0     0     0

Again, I can get some great numbers on large files (doing a dd with a large blocksize screams!), but as a test I took a problematic tree of around 1 million files and walked it with a find/ls:

bash-3.00# time find . \! -name ".*" | wc -l
987423

real    53m52.285s
user    0m2.624s
sys     0m27.980s

That was local to the system, and not even NFS. The original files, located on an ext3 RAID50, accessed via a Linux client (NFSv3):

[root@bagels old-servers]# time find . \! -name ".*" | wc -l
987423

real    1m4.255s
user    0m0.914s
sys     0m6.976s

Woe... something just isn't right here. Are there explicit ways I can find out what's wrong with my setup? This is from a dtrace/zdb/mdb neophyte; all I have been tracking with are zpool iostats.
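For anyone who wants to try the same kind of metadata walk without a million-file tree, here is a small self-contained sketch (the path and file counts are made up, purely for illustration) that builds a synthetic tree of empty files and runs the same find pipeline:

```shell
#!/bin/sh
# Build a small synthetic tree (4 dirs x 250 empty files) and walk it
# with the same find pipeline as above. Path is illustrative only.
rm -rf /tmp/smallfile-test
mkdir -p /tmp/smallfile-test
cd /tmp/smallfile-test || exit 1

for d in a b c d; do
  mkdir -p "$d"
  i=1
  while [ "$i" -le 250 ]; do
    : > "$d/file$i"        # create an empty file
    i=$((i + 1))
  done
done

# 4 directories + 1000 files are listed; "." itself is excluded
# because its name matches the ".*" pattern.
find . \! -name ".*" | wc -l
```

Scaled up, the same walk is purely a metadata (directory and inode read) workload, which is exactly the case that was slow here.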
Neil Perrin
2006-May-05 03:01 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Was this a 32-bit Intel system by chance? If so, this is quite likely caused by:

6413731 pathologically slower fsync on 32 bit systems

This was fixed in snv_39.

Joe Little wrote on 05/04/06 15:47:
> I've been writing to the Solaris NFS list since I was getting some bad
> performance copying via NFS (noticeably there) a large set of small
> files. [...]

-- Neil
Neil Perrin
2006-May-05 03:08 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Actually, the NFS slowness could be caused by the bug below, but it doesn't explain the "find ." times on a local ZFS.

Neil Perrin wrote on 05/04/06 21:01:
> Was this a 32-bit Intel system by chance?
> If so this is quite likely caused by:
>
> 6413731 pathologically slower fsync on 32 bit systems
>
> This was fixed in snv_39. [...]

-- Neil
Joe Little
2006-May-05 03:58 UTC
[zfs-discuss] Poor directory traversal or small file performance?
Nope. The ZFS head (iSCSI initiator) is a Sun Ultra 20 workstation. The clients are RHEL4 quad Opterons running the x86_64 kernel series.

On 5/4/06, Neil Perrin <Neil.Perrin@sun.com> wrote:
> Actually the nfs slowness could be caused by the bug below,
> but it doesn't explain the "find ." times on a local zfs. [...]
Joe Little
2006-May-05 05:18 UTC
[zfs-discuss] Poor directory traversal or small file performance?
I just responded to the NFS list, and it definitely looks like a bad interaction between NFS -> ZFS -> iSCSI, whereas the first two together (local disk for ZFS) or the last two together (no ZFS) are very fast. Are there posted zfs dtrace scripts for observability of I/O?

On 5/4/06, Neil Perrin <Neil.Perrin@sun.com> wrote:
> Actually the nfs slowness could be caused by the bug below,
> but it doesn't explain the "find ." times on a local zfs. [...]
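On the dtrace question: one generic starting point (not ZFS-specific, and written from memory, so treat it as a sketch rather than a posted script) is the io provider, which can quantize per-I/O latency at the block layer and so would cover the iSCSI-backed vdev I/O underneath ZFS:

```d
#!/usr/sbin/dtrace -s
/* Sketch: quantize block-layer I/O latency. args[0] is the bufinfo_t
 * for the buffer; b_edev/b_blkno identify the I/O across start/done. */
io:::start
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
        @latency["disk I/O latency (ns)"] = quantize(timestamp -
            start[args[0]->b_edev, args[0]->b_blkno]);
        start[args[0]->b_edev, args[0]->b_blkno] = 0;
}
```

Running this while repeating the find walk should show whether the time is going into many small, high-latency reads against the iSCSI targets.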