The discussion is really old: writing many small files to an NFS-mounted ZFS
filesystem is slow without an SSD ZIL, due to the synchronous nature of the NFS
protocol itself. But there is something I don't really understand. My tests on
an old Opteron box with two small U160 SCSI arrays and a zpool of 4 mirrored
vdevs built from 146 GB disks show mostly idle disks when untarring an archive
with many small files over NFS. Any source package can be used for this test.
I'm on zpool version 22 (still SXCE b130; the client is OpenSolaris b130),
NFS mount options are all defaults, NFSD_SERVERS=128.
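For reference, the test boils down to this (hostname, mount point, and tarball
are just placeholder examples; any source package will do):

  # on the client, filesystem mounted with default options
  cd /mnt/ib1test
  gzcat /tmp/coreutils-8.4.tar.gz | ptime tar xf -

  # on the server, nfsd thread count
  grep NFSD_SERVERS /etc/default/nfs    # -> NFSD_SERVERS=128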
Configuration of the pool is like this:
zpool status ib1
  pool: ib1
 state: ONLINE
 scrub: scrub completed after 0h52m with 0 errors on Sat Jan 15 14:19:02 2011
config:

        NAME        STATE     READ WRITE CKSUM
        ib1         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
zpool iostat -v shows:

               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ib1          268G   276G      0    180      0   723K
  mirror    95.4G  40.6G      0     44      0   180K
    c1t4d0      -      -      0     44      0   180K
    c3t0d0      -      -      0     44      0   180K
  mirror    95.2G  40.8G      0     44      0   180K
    c1t6d0      -      -      0     44      0   180K
    c4t0d0      -      -      0     44      0   180K
  mirror    39.0G  97.0G      0     45      0   184K
    c3t3d0      -      -      0     45      0   184K
    c4t3d0      -      -      0     45      0   184K
  mirror    38.5G  97.5G      0     44      0   180K
    c3t4d0      -      -      0     44      0   180K
    c4t4d0      -      -      0     44      0   180K
----------  -----  -----  -----  -----  -----  -----
So each disk gets 40-50 IOPS, 180 ops for the whole pool (mirrored). Note that
these U320 SCSI disks should be able to handle about 150 IOPS each, so there's
no IOPS aggregation going on. The strange thing is the following iostat
-MindexC output:
                          extended device statistics         ---- errors ---
  r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0t0d0
  0.0  186.0    0.0    0.4  0.0  0.0    0.0    0.1   0   2   0   0   0   0 c1
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t4d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c1t5d0
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t6d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t0d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t1d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t2d0
  0.0  279.5    0.0    0.5  0.0  0.0    0.0    0.1   0   3   0   0   0   0 c3
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t0d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t1d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t2d0
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t3d0
  0.0   93.5    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t4d0
  0.0  279.0    0.0    0.5  0.0  0.0    0.0    0.2   0   5   0   0   0   0 c4
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.3   0   3   0   0   0   0 c4t0d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t2d0
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t4d0
  0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t1d0
  0.0   93.0    0.0    0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t3d0
Service times for the involved disks are around 0.1-0.3 ms; I think this
reflects the mostly sequential write pattern of ZFS. The disks are at most 3%
busy. With synchronous writes I'd expect 100% busy disks. And when reading or
writing locally the disks really do get busy: about 50 MB/s per disk, due to
the 160 MB/s limit per SCSI bus (there are two U160 channels with three disks
each and one channel with two disks).
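To rule the disks in or out, one quick (and strictly test-only) experiment is
to disable the ZIL globally; b130 still predates the per-dataset sync
property, so the old zil_disable tunable is the way to do it:

  echo zil_disable/W0t1 | mdb -kw    # ZIL off, for testing only
  # rerun the untar ...
  echo zil_disable/W0t0 | mdb -kw    # ZIL back on

If the untar rate jumps with the ZIL off, the limit clearly is the serialized
sync round trips and not the spindles.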
Richard Elling's zilstat gives:
   N-Bytes  N-Bytes/s  N-Max-Rate    B-Bytes  B-Bytes/s  B-Max-Rate   ops  <=4kB  4-32kB  >=32kB
      9552       9552        9552     671744     671744      671744   164    164       0       0
     10192      10192       10192     724992     724992      724992   177    177       0       0
      9568       9568        9568     679936     679936      679936   166    166       0       0
     11712      11712       11712     823296     823296      823296   201    201       0       0
     10784      10784       10784     765952     765952      765952   187    187       0       0
     10024      10024       10024     708608     708608      708608   173    173       0       0
So at most about 200 ZIL ops/s, all of them < 4 kB. As said, the disks aren't
busy during this test.
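As a cross-check of the zilstat numbers, the ZIL commit rate can also be
counted directly (a minimal DTrace sketch; zil_commit() is the kernel
function the sync write paths funnel through):

  dtrace -qn 'fbt::zil_commit:entry { @c = count(); }
    tick-1sec { printa("zil_commit/s: %@d\n", @c); clear(@c); }'

And the arithmetic is at least self-consistent: ~93 writes/s per disk at
~0.2 ms service time is roughly 2% utilization, right in the 1-3% busy range
iostat reports. The disks look idle because the requests arrive strictly one
after another, not because they are saturated.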
The test ZFS filesystem is configured with atime off. logbias hardly matters;
with logbias=latency the IOPS rate is a little lower than with
logbias=throughput.
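For completeness, the comparison was simply this ('ib1/test' stands in for
the real dataset name):

  zfs get atime,logbias ib1/test
  zfs set logbias=throughput ib1/test    # untar, note the rate
  zfs set logbias=latency ib1/test       # default; untar again

Without a separate log device both settings end up on the same spindles
anyway, which presumably is why the difference is so small.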
Attached are some bonnie++ results to show that all disks and the whole pool
are quite healthy. I get > 1000 random reads/s locally and still nearly 900
reads/s via NFS. For large files I easily get Gbit wirespeed (105 MB/s read)
with NFS. And for the random reads of a bonnie++ or iozone run the disks
really are 80-100% busy. Only for small files does the array sit almost idle,
although it could do way more. I have seen this on different Solaris versions,
not only on this test system. Is there any explanation for this behaviour?
Thanks,
Michael
bonnie++ results, local:
Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ibmr10          16G          108972  25 89923  21          263540  26  1074   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 30359  99 +++++ +++ +++++ +++ 24836  99 +++++ +++ +++++ +++
ibmr10,16G,,,108972,25,89923,21,,,263540,26,1073.5,3,16,30359,99,+++++,+++,+++++,+++,24836,99,+++++,+++,+++++,+++
bonnie++ results, via NFS:
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nfsibmr10       16G           50022  11 42524  14          105335  18 884.8  20
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   152   3 +++++ +++   182   1   151   3 +++++ +++   183   1
nfsibmr10,16G,,,50022,11,42524,14,,,105335,18,884.8,20,16,152,3,+++++,+++,182,1,151,3,+++++,+++,183,1