The discussion is really old: writing many small files to an NFS-mounted ZFS filesystem is slow without an SSD ZIL, due to the synchronous nature of the NFS protocol itself. But there is something I don't really understand. My tests on an old Opteron box with 2 small U160 SCSI arrays and a zpool with 4 mirrored vdevs built from 146 GB disks show mostly idle disks when untarring an archive with many small files over NFS. Any source package can be used for this test. I'm on zpool version 22 (still SXCE b130, the client is OpenSolaris b130), NFS mount options are all default, NFSD_SERVERS=128.

Configuration of the pool is like this:

# zpool status ib1
  pool: ib1
 state: ONLINE
 scrub: scrub completed after 0h52m with 0 errors on Sat Jan 15 14:19:02 2011
config:

        NAME        STATE     READ WRITE CKSUM
        ib1         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0

zpool iostat -v shows:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ib1          268G   276G      0    180      0   723K
  mirror    95.4G  40.6G      0     44      0   180K
    c1t4d0      -      -      0     44      0   180K
    c3t0d0      -      -      0     44      0   180K
  mirror    95.2G  40.8G      0     44      0   180K
    c1t6d0      -      -      0     44      0   180K
    c4t0d0      -      -      0     44      0   180K
  mirror    39.0G  97.0G      0     45      0   184K
    c3t3d0      -      -      0     45      0   184K
    c4t3d0      -      -      0     45      0   184K
  mirror    38.5G  97.5G      0     44      0   180K
    c3t4d0      -      -      0     44      0   180K
    c4t4d0      -      -      0     44      0   180K
----------  -----  -----  -----  -----  -----  -----

So each disk gets 40-50 IOPS, 180 ops on the whole pool (mirrored). Note that these U320 SCSI disks should be able to handle about 150 IOPS per disk, so there's no IOPS aggregation.
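To make the workload reproducible without picking a particular source package, here is a minimal sketch of the small-file untar test (file counts, sizes, and paths are illustrative placeholders, not my actual setup); to hit the NFS case, point the extraction directory at the NFS mount:

```python
# Sketch of the small-file untar workload. To reproduce the NFS case,
# pass a directory on the NFS mount as destdir. All names and sizes
# here are placeholders, not the actual test setup.
import os
import tarfile
import tempfile
import time

def make_small_file_tar(tar_path, n_files=1000, size=4096):
    """Build a tarball of n_files files, each `size` bytes of random data."""
    srcdir = tempfile.mkdtemp(prefix="smallfiles-")
    with tarfile.open(tar_path, "w") as tar:
        for i in range(n_files):
            name = f"f{i:05d}"
            path = os.path.join(srcdir, name)
            with open(path, "wb") as f:
                f.write(os.urandom(size))
            tar.add(path, arcname=name)

def timed_extract(tar_path, destdir):
    """Extract the tarball and return elapsed seconds.

    Over NFS, every created file implies synchronous round trips,
    which is exactly what this test exercises.
    """
    t0 = time.time()
    with tarfile.open(tar_path) as tar:
        tar.extractall(destdir)
    return time.time() - t0
```

Dividing n_files by the elapsed seconds gives a files-per-second figure directly comparable to the ~180 ops/sec the pool shows above.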
The strange thing is the following iostat -MindexC output:

                            extended device statistics       ---- errors ---
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0t0d0
    0.0  186.0    0.0    0.4  0.0  0.0    0.0    0.1   0   2   0   0   0   0 c1
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c1t5d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t2d0
    0.0  279.5    0.0    0.5  0.0  0.0    0.0    0.1   0   3   0   0   0   0 c3
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t2d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t3d0
    0.0   93.5    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t4d0
    0.0  279.0    0.0    0.5  0.0  0.0    0.0    0.2   0   5   0   0   0   0 c4
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.3   0   3   0   0   0   0 c4t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t2d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t1d0
    0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t3d0

Service times for the involved disks are around 0.1-0.3 ms, which I think reflects the sequential write nature of ZFS. The disks are at most 3% busy. When writing synchronously I'd expect 100% busy disks. And when reading or writing locally the disks really do get busy: about 50 MB/sec per disk, due to the 160 MB/sec SCSI bus limit per channel (there are 2 U160 channels with 3 disks each, and 1 channel with 2 disks).
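The iostat numbers are at least self-consistent: a quick back-of-envelope utilization estimate (utilization is roughly IOPS times average service time) at ~93 writes/sec per disk and 0.1-0.3 ms service time lands at only a few percent, matching the %b column:

```python
# Rough utilization estimate from the iostat figures above:
# utilization ~= IOPS * average service time.
def expected_busy_pct(iops, svc_ms):
    """Percent of wall-clock time a disk spends servicing requests."""
    return iops * (svc_ms / 1000.0) * 100.0

low = expected_busy_pct(93.0, 0.1)   # best-case service time
high = expected_busy_pct(93.0, 0.3)  # worst-case service time
print(f"{low:.2f}% - {high:.2f}% busy")  # 0.93% - 2.79%
```

So the disks are nowhere near saturated at these IOPS rates; the question is why the op rate itself stays so low.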
Richard Elling's zilstat gives:

   N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
      9552       9552       9552     671744     671744     671744    164    164      0      0
     10192      10192      10192     724992     724992     724992    177    177      0      0
      9568       9568       9568     679936     679936     679936    166    166      0      0
     11712      11712      11712     823296     823296     823296    201    201      0      0
     10784      10784      10784     765952     765952     765952    187    187      0      0
     10024      10024      10024     708608     708608     708608    173    173      0      0

About 200 ZIL ops/sec at most, all <= 4k. As said, the disks aren't busy during this test. The test ZFS is configured with atime off. logbias hardly matters; with logbias=latency the IOPS rate is a little bit lower.

Attached are some bonnie++ results to show that all disks and the whole pool are quite healthy. I get > 1000 random reads/sec locally and still nearly 900 reads/sec via NFS. For large files I easily get GbE wirespeed (105 MB/sec read) with NFS. And for random reads in a bonnie or iozone test the disks really are 80-100% busy. Only for small files does the array sit almost idle; it can do way more. I have seen this on different Solaris versions, not only on this test system. Is there any explanation for this behaviour?
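A quick cross-check of the zilstat sample above: dividing B-Bytes (bytes actually written to the ZIL) by the op count in each row shows every operation lands as exactly one 4 KB ZIL block, consistent with the <=4kB column:

```python
# Cross-check of the zilstat rows above: bytes physically written to
# the ZIL (B-Bytes) divided by the op count gives the block size per op.
def zil_bytes_per_op(b_bytes, ops):
    return b_bytes / ops

print(zil_bytes_per_op(671744, 164))  # 4096.0 -> one 4 KB ZIL block per op
print(zil_bytes_per_op(724992, 177))  # 4096.0 in the second row as well
```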
Thanks,
Michael
-- 
This message posted from opensolaris.org
-------------- next part --------------
local
Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ibmr10          16G           108972  25 89923  21           263540  26  1074   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 30359  99 +++++ +++ +++++ +++ 24836  99 +++++ +++ +++++ +++
ibmr10,16G,,,108972,25,89923,21,,,263540,26,1073.5,3,16,30359,99,+++++,+++,+++++,+++,24836,99,+++++,+++,+++++,+++
-------------- next part --------------
NFS
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nfsibmr10       16G            50022  11 42524  14           105335  18 884.8  20
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   152   3 +++++ +++   182   1   151   3 +++++ +++   183   1
nfsibmr10,16G,,,50022,11,42524,14,,,105335,18,884.8,20,16,152,3,+++++,+++,182,1,151,3,+++++,+++,183,1