Robert Milkowski
2006-Aug-07 15:22 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi.

A 3510 with two HW controllers, configured as one LUN in RAID-10 using 12 disks in the head unit (FC-AL 73GB 15K disks). Optimization set to random, stripe size 32KB. Connected to a v440 using two links; however, in the tests only one link was used (no MPxIO).

I used filebench's varmail test with default parameters, run for 60s; each test was run twice.

The system is S10U2 with all available patches (all support patches), kernel -18.

ZFS filesystem on the HW LUN with atime=off:

IO Summary: 499078 ops 8248.0 ops/s, (1269/1269 r/w) 40.6mb/s, 314us cpu/op, 6.0ms latency
IO Summary: 503112 ops 8320.2 ops/s, (1280/1280 r/w) 41.0mb/s, 296us cpu/op, 5.9ms latency

Now the same LUN, but ZFS was destroyed and a UFS filesystem was created.
UFS filesystem on the HW LUN with maxcontig=24 and noatime:

IO Summary: 401671 ops 6638.2 ops/s, (1021/1021 r/w) 32.7mb/s, 404us cpu/op, 7.5ms latency
IO Summary: 403194 ops 6664.5 ops/s, (1025/1025 r/w) 32.5mb/s, 406us cpu/op, 7.5ms latency

Now another v440 server (the same config) with snv_44, with several 3510 JBODs connected over two FC loops; however, only one loop was used (no MPxIO). The same disks (73GB FC-AL 15K).

ZFS filesystem with atime=off, on a ZFS raid-10 using 12 disks from one enclosure:

IO Summary: 558331 ops 9244.1 ops/s, (1422/1422 r/w) 45.2mb/s, 312us cpu/op, 5.2ms latency
IO Summary: 537542 ops 8899.9 ops/s, (1369/1369 r/w) 43.5mb/s, 307us cpu/op, 5.4ms latency

### details ####

$ cat zfs-benhc.txt

v440, Generic_118833-18

filebench> set $dir=/se3510_hw_raid10_12disks/t1/
filebench> run 60
582: 42.107: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
582: 42.108: Creating fileset bigfileset...
582: 45.262: Preallocated 812 of 1000 of fileset bigfileset in 4 seconds
582: 45.262: Creating/pre-allocating files
582: 45.262: Starting 1 filereader instances
586: 46.268: Starting 16 filereaderthread threads
582: 49.278: Running...
582: 109.787: Run took 60 seconds...
582: 109.801: Per-Operation Breakdown
closefile4        634ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         634ops/s  10.3mb/s   0.1ms/op   65us/op-cpu
openfile4         634ops/s   0.0mb/s   0.1ms/op   63us/op-cpu
closefile3        634ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        634ops/s   0.0mb/s  11.3ms/op  150us/op-cpu
appendfilerand3   635ops/s   9.9mb/s   0.1ms/op  132us/op-cpu
readfile3         635ops/s  10.4mb/s   0.1ms/op   66us/op-cpu
openfile3         635ops/s   0.0mb/s   0.1ms/op   63us/op-cpu
closefile2        635ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        635ops/s   0.0mb/s  11.9ms/op  137us/op-cpu
appendfilerand2   635ops/s   9.9mb/s   0.1ms/op   94us/op-cpu
createfile2       634ops/s   0.0mb/s   0.2ms/op  163us/op-cpu
deletefile1       634ops/s   0.0mb/s   0.1ms/op   86us/op-cpu

582: 109.801: IO Summary: 499078 ops 8248.0 ops/s, (1269/1269 r/w) 40.6mb/s, 314us cpu/op, 6.0ms latency
582: 109.801: Shutting down processes
filebench>

filebench> run 60
582: 190.655: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
582: 190.720: Removed any existing fileset bigfileset in 1 seconds
582: 190.720: Creating fileset bigfileset...
582: 193.259: Preallocated 786 of 1000 of fileset bigfileset in 3 seconds
582: 193.259: Creating/pre-allocating files
582: 193.259: Starting 1 filereader instances
591: 194.268: Starting 16 filereaderthread threads
582: 197.278: Running...
582: 257.748: Run took 60 seconds...
582: 257.761: Per-Operation Breakdown
closefile4        640ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         640ops/s  10.5mb/s   0.1ms/op   64us/op-cpu
openfile4         640ops/s   0.0mb/s   0.1ms/op   63us/op-cpu
closefile3        640ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        640ops/s   0.0mb/s  11.1ms/op  147us/op-cpu
appendfilerand3   640ops/s  10.0mb/s   0.1ms/op  124us/op-cpu
readfile3         640ops/s  10.5mb/s   0.1ms/op   67us/op-cpu
openfile3         640ops/s   0.0mb/s   0.1ms/op   63us/op-cpu
closefile2        640ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        640ops/s   0.0mb/s  11.9ms/op  139us/op-cpu
appendfilerand2   640ops/s  10.0mb/s   0.1ms/op   89us/op-cpu
createfile2       640ops/s   0.0mb/s   0.2ms/op  157us/op-cpu
deletefile1       640ops/s   0.0mb/s   0.1ms/op   87us/op-cpu

582: 257.761: IO Summary: 503112 ops 8320.2 ops/s, (1280/1280 r/w) 41.0mb/s, 296us cpu/op, 5.9ms latency
582: 257.761: Shutting down processes
filebench>

bash-3.00# zpool destroy se3510_hw_raid10_12disks
bash-3.00# newfs -C 24 /dev/rdsk/c3t40d0s0
newfs: construct a new file system /dev/rdsk/c3t40d0s0: (y/n)? y
Warning: 4164 sector(s) in last cylinder unallocated
/dev/rdsk/c3t40d0s0: 857083836 sectors in 139500 cylinders of 48 tracks, 128 sectors
        418498.0MB in 8719 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
...............................................................................
...............................................................................
................
super-block backups for last 10 cylinder groups at:
 856130208, 856228640, 856327072, 856425504, 856523936, 856622368, 856720800,
 856819232, 856917664, 857016096
bash-3.00# mount -o noatime /dev/dsk/c3t40d0s0 /mnt/
bash-3.00#
bash-3.00# /opt/filebench/bin/sparcv9/filebench
filebench> load varmail
632: 2.758: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded
632: 2.759: Usage: set $dir=<dir>
632: 2.759:        set $filesize=<size>     defaults to 16384
632: 2.759:        set $nfiles=<value>      defaults to 1000
632: 2.759:        set $nthreads=<value>    defaults to 16
632: 2.759:        set $meaniosize=<value>  defaults to 16384
632: 2.759:        set $meandirwidth=<size> defaults to 1000000
632: 2.759:        (sets mean dir width and dir depth is calculated as log (width, nfiles)
632: 2.759:         dirdepth therefore defaults to dir depth of 1 as in postmark
632: 2.759:         set $meandir lower to increase depth beyond 1 if desired)
632: 2.759:
632: 2.759:        run runtime (e.g. run 60)
632: 2.759: syntax error, token expected on line 51
filebench> set $dir=/mnt/
filebench> run 60
632: 7.699: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
632: 7.722: Creating fileset bigfileset...
632: 10.611: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds
632: 10.611: Creating/pre-allocating files
632: 10.611: Starting 1 filereader instances
633: 11.615: Starting 16 filereaderthread threads
632: 14.625: Running...
632: 75.135: Run took 60 seconds...
632: 75.149: Per-Operation Breakdown
closefile4        511ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         511ops/s   8.4mb/s   0.1ms/op   65us/op-cpu
openfile4         511ops/s   0.0mb/s   0.0ms/op   37us/op-cpu
closefile3        511ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile3        511ops/s   0.0mb/s   9.7ms/op  168us/op-cpu
appendfilerand3   511ops/s   8.0mb/s   2.6ms/op  190us/op-cpu
readfile3         511ops/s   8.3mb/s   0.1ms/op   65us/op-cpu
openfile3         511ops/s   0.0mb/s   0.0ms/op   37us/op-cpu
closefile2        511ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile2        511ops/s   0.0mb/s   8.4ms/op  152us/op-cpu
appendfilerand2   511ops/s   8.0mb/s   1.7ms/op  170us/op-cpu
createfile2       511ops/s   0.0mb/s   4.3ms/op  297us/op-cpu
deletefile1       511ops/s   0.0mb/s   3.1ms/op  145us/op-cpu

632: 75.149: IO Summary: 401671 ops 6638.2 ops/s, (1021/1021 r/w) 32.7mb/s, 404us cpu/op, 7.5ms latency
632: 75.149: Shutting down processes
filebench> run 60
632: 193.974: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
632: 194.874: Removed any existing fileset bigfileset in 1 seconds
632: 194.875: Creating fileset bigfileset...
632: 196.817: Preallocated 786 of 1000 of fileset bigfileset in 2 seconds
632: 196.817: Creating/pre-allocating files
632: 196.817: Starting 1 filereader instances
636: 197.825: Starting 16 filereaderthread threads
632: 200.835: Running...
632: 261.335: Run took 60 seconds...
632: 261.350: Per-Operation Breakdown
closefile4        513ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         513ops/s   8.2mb/s   0.1ms/op   64us/op-cpu
openfile4         513ops/s   0.0mb/s   0.0ms/op   38us/op-cpu
closefile3        513ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile3        513ops/s   0.0mb/s   9.7ms/op  169us/op-cpu
appendfilerand3   513ops/s   8.0mb/s   2.7ms/op  189us/op-cpu
readfile3         513ops/s   8.3mb/s   0.1ms/op   65us/op-cpu
openfile3         513ops/s   0.0mb/s   0.0ms/op   38us/op-cpu
closefile2        513ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile2        513ops/s   0.0mb/s   8.4ms/op  154us/op-cpu
appendfilerand2   513ops/s   8.0mb/s   1.7ms/op  165us/op-cpu
createfile2       513ops/s   0.0mb/s   4.2ms/op  301us/op-cpu
deletefile1       513ops/s   0.0mb/s   3.2ms/op  148us/op-cpu

632: 261.350: IO Summary: 403194 ops 6664.5 ops/s, (1025/1025 r/w) 32.5mb/s, 406us cpu/op, 7.5ms latency
632: 261.350: Shutting down processes
filebench>

v440, snv_44

bash-3.00# zpool status
  pool: zfs_raid10_12disks
 state: ONLINE
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        zfs_raid10_12disks  ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c2t16d0         ONLINE       0     0     0
            c2t17d0         ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c2t18d0         ONLINE       0     0     0
            c2t19d0         ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c2t20d0         ONLINE       0     0     0
            c2t21d0         ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c2t22d0         ONLINE       0     0     0
            c2t23d0         ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c2t24d0         ONLINE       0     0     0
            c2t25d0         ONLINE       0     0     0
          mirror            ONLINE       0     0     0
            c2t26d0         ONLINE       0     0     0
            c2t27d0         ONLINE       0     0     0

errors: No known data errors
bash-3.00#
bash-3.00# /opt/filebench/bin/sparcv9/filebench
filebench> load varmail
393: 6.283: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded
393: 6.283: Usage: set $dir=<dir>
393: 6.283:        set $filesize=<size>     defaults to 16384
393: 6.283:        set $nfiles=<value>      defaults to 1000
393: 6.283:        set $nthreads=<value>    defaults to 16
393: 6.283:        set $meaniosize=<value>  defaults to 16384
393: 6.284:        set $meandirwidth=<size> defaults to 1000000
393: 6.284:        (sets mean dir width and dir depth is calculated as log (width, nfiles)
393: 6.284:         dirdepth therefore defaults to dir depth of 1 as in postmark
393: 6.284:         set $meandir lower to increase depth beyond 1 if desired)
393: 6.284:
393: 6.284:        run runtime (e.g. run 60)
393: 6.284: syntax error, token expected on line 51
filebench> set $dir=/zfs_raid10_12disks/t1/
filebench> run 60
393: 18.766: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
393: 18.767: Creating fileset bigfileset...
393: 23.020: Preallocated 812 of 1000 of fileset bigfileset in 5 seconds
393: 23.020: Creating/pre-allocating files
393: 23.020: Starting 1 filereader instances
394: 24.030: Starting 16 filereaderthread threads
393: 27.040: Running...
393: 87.440: Run took 60 seconds...
393: 87.453: Per-Operation Breakdown
closefile4        711ops/s   0.0mb/s   0.0ms/op    9us/op-cpu
readfile4         711ops/s  11.4mb/s   0.1ms/op   62us/op-cpu
openfile4         711ops/s   0.0mb/s   0.1ms/op   65us/op-cpu
closefile3        711ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        711ops/s   0.0mb/s  10.0ms/op  148us/op-cpu
appendfilerand3   711ops/s  11.1mb/s   0.1ms/op  129us/op-cpu
readfile3         711ops/s  11.6mb/s   0.1ms/op   63us/op-cpu
openfile3         711ops/s   0.0mb/s   0.1ms/op   65us/op-cpu
closefile2        711ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        711ops/s   0.0mb/s  10.0ms/op  115us/op-cpu
appendfilerand2   711ops/s  11.1mb/s   0.1ms/op   97us/op-cpu
createfile2       711ops/s   0.0mb/s   0.2ms/op  163us/op-cpu
deletefile1       711ops/s   0.0mb/s   0.1ms/op   89us/op-cpu

393: 87.454: IO Summary: 558331 ops 9244.1 ops/s, (1422/1422 r/w) 45.2mb/s, 312us cpu/op, 5.2ms latency
393: 87.454: Shutting down processes
filebench> run 60
393: 118.054: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
393: 118.108: Removed any existing fileset bigfileset in 1 seconds
393: 118.108: Creating fileset bigfileset...
393: 122.619: Preallocated 786 of 1000 of fileset bigfileset in 5 seconds
393: 122.619: Creating/pre-allocating files
393: 122.619: Starting 1 filereader instances
401: 123.630: Starting 16 filereaderthread threads
393: 126.640: Running...
393: 187.040: Run took 60 seconds...
393: 187.053: Per-Operation Breakdown
closefile4        685ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         685ops/s  11.1mb/s   0.1ms/op   62us/op-cpu
openfile4         685ops/s   0.0mb/s   0.1ms/op   65us/op-cpu
closefile3        685ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        685ops/s   0.0mb/s  10.5ms/op  150us/op-cpu
appendfilerand3   685ops/s  10.7mb/s   0.1ms/op  124us/op-cpu
readfile3         685ops/s  11.1mb/s   0.1ms/op   60us/op-cpu
openfile3         685ops/s   0.0mb/s   0.1ms/op   65us/op-cpu
closefile2        685ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        685ops/s   0.0mb/s  10.4ms/op  113us/op-cpu
appendfilerand2   685ops/s  10.7mb/s   0.1ms/op   93us/op-cpu
createfile2       685ops/s   0.0mb/s   0.2ms/op  156us/op-cpu
deletefile1       685ops/s   0.0mb/s   0.1ms/op   89us/op-cpu

393: 187.054: IO Summary: 537542 ops 8899.9 ops/s, (1369/1369 r/w) 43.5mb/s, 307us cpu/op, 5.4ms latency
393: 187.054: Shutting down processes
filebench>

This message posted from opensolaris.org
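[Editor's note: the transcripts above show the pools only via zpool status; the creation commands are not included in the post. A minimal sketch of how equivalent pools could have been set up, using the pool/filesystem names and device names from the listings above (the use of a t1 child filesystem is an assumption; it could just as well have been a plain directory):

# snv_44 JBOD box: 6-way striped mirror ("raid-10") from the 12 disks shown in zpool status
zpool create zfs_raid10_12disks \
    mirror c2t16d0 c2t17d0 \
    mirror c2t18d0 c2t19d0 \
    mirror c2t20d0 c2t21d0 \
    mirror c2t22d0 c2t23d0 \
    mirror c2t24d0 c2t25d0 \
    mirror c2t26d0 c2t27d0
zfs set atime=off zfs_raid10_12disks
zfs create zfs_raid10_12disks/t1        # benchmark target: /zfs_raid10_12disks/t1/

# S10U2 box: pool on the single HW RAID-10 LUN (same device later reused for newfs)
zpool create se3510_hw_raid10_12disks c3t40d0
zfs set atime=off se3510_hw_raid10_12disks
zfs create se3510_hw_raid10_12disks/t1  # benchmark target: /se3510_hw_raid10_12disks/t1/
]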
Eric Schrock
2006-Aug-07 15:53 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Cool stuff, Robert. It'd be interesting to see some RAID-Z (single- and double-parity) benchmarks as well, but understandably this takes time ;-)

The first thing to note is that the current Nevada bits have a number of performance fixes not in S10u2, so there's going to be a natural bias when comparing ZFS to ZFS between these systems.

Second, you may be able to get more performance from the ZFS filesystem on the HW LUN by tweaking the max pending # of requests. One thing we've found is that ZFS currently has a hardcoded limit on how many outstanding requests it sends to the underlying vdev (35). This works well for most single devices, but large arrays can actually handle more, and we end up leaving some performance on the floor. Currently the only way to tweak this variable is through 'mdb -kw'. Try something like:

# mdb -kw
> ::spa -v
ADDR                 STATE     NAME
ffffffff82ef4140     ACTIVE    pool

    ADDR             STATE     AUX          DESCRIPTION
    ffffffff9677a1c0 HEALTHY   -            root
    ffffffff9678d080 HEALTHY   -              raidz
    ffffffff9678db80 HEALTHY   -                /dev/dsk/c2d0s0
    ffffffff96778640 HEALTHY   -                /dev/dsk/c3d0s0
    ffffffff967780c0 HEALTHY   -                /dev/dsk/c4d0s0
    ffffffff9e495780 HEALTHY   -                /dev/dsk/c5d0s0
> ffffffff9678db80::print -a vdev_t vdev_queue.vq_max_pending
ffffffff9678df00 vdev_queue.vq_max_pending = 0x23
> ffffffff9678df00/Z0t60
0xffffffff9678df00:     0x23    =       0x3c

This will change the max # of pending requests for the disk to 60, instead of 35. We're trying to figure out how to tweak and/or dynamically detect the best value here, so any more data would be useful.

- Eric

On Mon, Aug 07, 2006 at 08:22:24AM -0700, Robert Milkowski wrote:
> Hi.
>
> 3510 with two HW controllers, configured as one LUN in RAID-10 using 12
> disks in the head unit (FC-AL 73GB 15K disks). Optimization set to random,
> stripe size 32KB. Connected to a v440 using two links, however in tests
> only one link was used (no MPxIO).
>
> I used filebench's varmail test with default parameters, run for 60s;
> each test was run twice.
>
> The system is S10U2 with all available patches (all support patches), kernel -18.
>
> ZFS filesystem on HW lun with atime=off:
>
> IO Summary: 499078 ops 8248.0 ops/s, (1269/1269 r/w) 40.6mb/s, 314us cpu/op, 6.0ms latency
> IO Summary: 503112 ops 8320.2 ops/s, (1280/1280 r/w) 41.0mb/s, 296us cpu/op, 5.9ms latency
>
> Now the same LUN but ZFS was destroyed and a UFS filesystem was created.
> UFS filesystem on HW lun with maxcontig=24 and noatime:
>
> IO Summary: 401671 ops 6638.2 ops/s, (1021/1021 r/w) 32.7mb/s, 404us cpu/op, 7.5ms latency
> IO Summary: 403194 ops 6664.5 ops/s, (1025/1025 r/w) 32.5mb/s, 406us cpu/op, 7.5ms latency
>
> Now another v440 server (the same config) with snv_44, with several 3510
> JBODs connected over two FC loops, however only one loop was used (no MPxIO).
> The same disks (73GB FC-AL 15K).
>
> ZFS filesystem with atime=off with ZFS raid-10 using 12 disks from one enclosure:
>
> IO Summary: 558331 ops 9244.1 ops/s, (1422/1422 r/w) 45.2mb/s, 312us cpu/op, 5.2ms latency
> IO Summary: 537542 ops 8899.9 ops/s, (1369/1369 r/w) 43.5mb/s, 307us cpu/op, 5.4ms latency

-- 
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
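[Editor's note: the same poke can be driven non-interactively by piping dcmds into mdb, which is handy when repeating it for several vdevs. A sketch only; <vdev_addr> and <vq_max_pending_addr> are placeholders for the addresses that ::spa -v and ::print -a report on your system, not real values:

# list pools and vdev addresses (read-only)
echo '::spa -v' | mdb -k

# show the current limit and its address for one vdev
echo '<vdev_addr>::print -a vdev_t vdev_queue.vq_max_pending' | mdb -k

# write the new limit (decimal 60) at the address printed above
echo '<vq_max_pending_addr>/Z0t60' | mdb -kw
]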
Robert Milkowski
2006-Aug-07 16:16 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Eric,

Monday, August 7, 2006, 5:53:38 PM, you wrote:

ES> Cool stuff, Robert. It'd be interesting to see some RAID-Z (single- and
ES> double-parity) benchmarks as well, but understandably this takes time
ES> ;-)

I intend to test raid-z. Not sure there'll be enough time for raidz2.

ES> The first thing to note is that the current Nevada bits have a number of
ES> performance fixes not in S10u2, so there's going to be a natural bias
ES> when comparing ZFS to ZFS between these systems.

Yeah, I know. That's why I also put UFS on the HW config, to see whether ZFS underperforms on U2.

ES> Second, you may be able to get more performance from the ZFS filesystem
ES> on the HW lun by tweaking the max pending # of requests. One thing
ES> we've found is that ZFS currently has a hardcoded limit of how many
ES> outstanding requests to send to the underlying vdev (35). This works
ES> well for most single devices, but large arrays can actually handle more,
ES> and we end up leaving some performance on the floor. Currently the only
ES> way to tweak this variable is through 'mdb -kw'. Try something like:

Well, strange - I did try values of 1, 60 and 256, and basically I get the same results from the varmail tests.

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
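[Editor's note: when a tweak like this appears to have no effect, one sanity check is to read the field back after the run with the same ::print dcmd shown above, confirming the written value is still in place for the vdev under test (a later reply in this thread points out that a vdev_reopen() resets it). Sketch only; <vdev_addr> is the address found via ::spa -v:

# read-only check that the new vq_max_pending value is still in effect
echo '<vdev_addr>::print vdev_t vdev_queue.vq_max_pending' | mdb -k
]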
Luke Lonergan
2006-Aug-07 16:27 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Niiiice!  Hooray ZFS!

- Luke

Sent from my GoodLink synchronized handheld (www.good.com)

-----Original Message-----
From:   Robert Milkowski [mailto:milek at task.gda.pl]
Sent:   Monday, August 07, 2006 11:25 AM Eastern Standard Time
To:     zfs-discuss at opensolaris.org
Subject:        [zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

Hi.

3510 with two HW controllers, configured as one LUN in RAID-10 using 12 disks in the head unit (FC-AL 73GB 15K disks). Optimization set to random, stripe size 32KB. Connected to a v440 using two links, however in tests only one link was used (no MPxIO).

ZFS filesystem on HW lun with atime=off:

IO Summary: 499078 ops 8248.0 ops/s, (1269/1269 r/w) 40.6mb/s, 314us cpu/op, 6.0ms latency
IO Summary: 503112 ops 8320.2 ops/s, (1280/1280 r/w) 41.0mb/s, 296us cpu/op, 5.9ms latency

UFS filesystem on HW lun with maxcontig=24 and noatime:

IO Summary: 401671 ops 6638.2 ops/s, (1021/1021 r/w) 32.7mb/s, 404us cpu/op, 7.5ms latency
IO Summary: 403194 ops 6664.5 ops/s, (1025/1025 r/w) 32.5mb/s, 406us cpu/op, 7.5ms latency

ZFS filesystem with atime=off with ZFS raid-10 using 12 disks from one enclosure:

IO Summary: 558331 ops 9244.1 ops/s, (1422/1422 r/w) 45.2mb/s, 312us cpu/op, 5.2ms latency
IO Summary: 537542 ops 8899.9 ops/s, (1369/1369 r/w) 43.5mb/s, 307us cpu/op, 5.4ms latency

[...]
Eric Schrock
2006-Aug-07 16:30 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
On Mon, Aug 07, 2006 at 06:16:12PM +0200, Robert Milkowski wrote:
>
> ES> Second, you may be able to get more performance from the ZFS filesystem
> ES> on the HW lun by tweaking the max pending # of requests. One thing
> ES> we've found is that ZFS currently has a hardcoded limit of how many
> ES> outstanding requests to send to the underlying vdev (35). This works
> ES> well for most single devices, but large arrays can actually handle more,
> ES> and we end up leaving some performance on the floor. Currently the only
> ES> way to tweak this variable is through 'mdb -kw'. Try something like:
>
> Well, strange - I did try values of 1, 60 and 256, and basically I
> get the same results from the varmail tests.
>

Well, that's good data, too. It means that this isn't an impediment for this particular test. It was just a shot in the dark...

- Eric

-- 
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Richard Elling
2006-Aug-07 16:54 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi Robert, thanks for the data. Please clarify one thing for me.

In the case of the HW raid, was there just one LUN? Or was it 12 LUNs?
 -- richard

Robert Milkowski wrote:
> Hi.
>
> 3510 with two HW controllers, configured as one LUN in RAID-10 using 12
> disks in the head unit (FC-AL 73GB 15K disks). Optimization set to random,
> stripe size 32KB. Connected to a v440 using two links, however in tests
> only one link was used (no MPxIO).
>
> [...]
eric kustarz
2006-Aug-07 17:38 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
> ES> Second, you may be able to get more performance from the ZFS filesystem
> ES> on the HW lun by tweaking the max pending # of requests. One thing
> ES> we've found is that ZFS currently has a hardcoded limit of how many
> ES> outstanding requests to send to the underlying vdev (35). This works
> ES> well for most single devices, but large arrays can actually handle more,
> ES> and we end up leaving some performance on the floor. Currently the only
> ES> way to tweak this variable is through 'mdb -kw'. Try something like:
>
> Well, strange - I did try values of 1, 60 and 256, and basically I
> get the same results from the varmail tests.
>

If vdev_reopen() is called, it will reset vq_max_pending to the vdev_knob's default value. So you can set the "global" vq_max_pending in vdev_knob (though this affects all pools and all vdevs of each pool):

#mdb -kw
> vdev_knob::print
....

Also, here's a simple D script (it doesn't work on U2 due to a CTF bug, but works on Nevada). It reports the average and the distribution of the number of pending I/Os each time ZFS tried to issue one. If you find this stays under 35, then upping vq_max_pending won't help. If, however, you find you're continually hitting the upper limit of 35, upping vq_max_pending should help.

#!/usr/sbin/dtrace -s

vdev_queue_io_to_issue:return
/arg1 != NULL/
{
        @c["issued I/O"] = count();
}

vdev_queue_io_to_issue:return
/arg1 == NULL/
{
        @c["didn't issue I/O"] = count();
}

vdev_queue_io_to_issue:entry
{
        @avgers["avg pending I/Os"] = avg(args[0]->vq_pending_tree.avl_numnodes);
        @lquant["quant pending I/Os"] = quantize(args[0]->vq_pending_tree.avl_numnodes);
        @c["total times tried to issue I/O"] = count();
}

vdev_queue_io_to_issue:entry
/args[0]->vq_pending_tree.avl_numnodes > 349/
{
        @avgers["avg pending I/Os > 349"] = avg(args[0]->vq_pending_tree.avl_numnodes);
        @quant["quant pending I/Os > 349"] = lquantize(args[0]->vq_pending_tree.avl_numnodes, 33, 1000, 1);
        @c["total times tried to issue I/O where > 349"] = count();
}

/* bail after 5 minutes */
tick-300sec
{
        exit(0);
}
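[Editor's note: since the script carries a dtrace shebang, it can be saved to a file and started in a second terminal while a filebench run is in progress; it prints its aggregations when it exits after 5 minutes (or on Ctrl-C). The file name vq_pending.d is just an example:

chmod +x vq_pending.d
./vq_pending.d            # equivalently: dtrace -s vq_pending.d
]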
Robert Milkowski
2006-Aug-08 08:38 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Richard,

Monday, August 7, 2006, 6:54:37 PM, you wrote:

RE> Hi Robert, thanks for the data. Please clarify one thing for me.
RE> In the case of the HW raid, was there just one LUN? Or was it 12 LUNs?

Just one LUN, which was built on the 3510 from the 12 disks in RAID-1(0).

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com
Robert Milkowski
2006-Aug-08 14:13 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi.

This time some RAID5/RAID-Z benchmarks. For these tests I connected the 3510 head unit with one link to the same server the 3510 JBODs are connected to (using the second link). snv_44 is used, the server is a v440. I also tried changing the max pending IO requests for the HW RAID5 LUN and checked with DTrace that the larger value is really used - it is, but it doesn't change the benchmark numbers.

1. ZFS on HW RAID5 with 6 disks, atime=off

IO Summary: 444386 ops 7341.7 ops/s, (1129/1130 r/w) 36.1mb/s, 297us cpu/op, 6.6ms latency
IO Summary: 438649 ops 7247.0 ops/s, (1115/1115 r/w) 35.5mb/s, 293us cpu/op, 6.7ms latency

2. ZFS with software RAID-Z with 6 disks, atime=off

IO Summary: 457505 ops 7567.3 ops/s, (1164/1164 r/w) 37.2mb/s, 340us cpu/op, 6.4ms latency
IO Summary: 457767 ops 7567.8 ops/s, (1164/1165 r/w) 36.9mb/s, 340us cpu/op, 6.4ms latency

3. UFS on HW RAID5 with 6 disks, noatime

IO Summary: 62776 ops 1037.3 ops/s, (160/160 r/w) 5.5mb/s, 481us cpu/op, 49.7ms latency
IO Summary: 63661 ops 1051.6 ops/s, (162/162 r/w) 5.4mb/s, 477us cpu/op, 49.1ms latency

4. UFS on HW RAID5 with 6 disks, noatime, S10U2 + patches (the same filesystem mounted as in 3)

IO Summary: 393167 ops 6503.1 ops/s, (1000/1001 r/w) 32.4mb/s, 405us cpu/op, 7.5ms latency
IO Summary: 394525 ops 6521.2 ops/s, (1003/1003 r/w) 32.0mb/s, 407us cpu/op, 7.7ms latency

5. ZFS with software RAID-Z with 6 disks, atime=off, S10U2 + patches (the same disks as in test #2)

IO Summary: 461708 ops 7635.5 ops/s, (1175/1175 r/w) 37.4mb/s, 330us cpu/op, 6.4ms latency
IO Summary: 457649 ops 7562.1 ops/s, (1163/1164 r/w) 37.0mb/s, 328us cpu/op, 6.5ms latency

In this benchmark software raid-5 with ZFS (raid-z to be precise) gives a little bit better performance than hardware raid-5. ZFS is also faster in both cases (HW and SW raid) than UFS on HW raid. Something is wrong with UFS on snv_44 - the same UFS filesystem on S10U2 works as expected. ZFS on S10U2 in this benchmark gives the same results as on snv_44.

#### details ####

// c2t43d0 is a HW raid5 made of 6 disks
// array is configured for random IOs

# zpool create HW_RAID5_6disks c2t43d0
#
# zpool create -f zfs_raid5_6disks raidz c3t16d0 c3t17d0 c3t18d0 c3t19d0 c3t20d0 c3t21d0
#
# zfs set atime=off zfs_raid5_6disks HW_RAID5_6disks
#
# zfs create HW_RAID5_6disks/t1
# zfs create zfs_raid5_6disks/t1
#
# /opt/filebench/bin/sparcv9/filebench
filebench> load varmail
450: 3.175: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded
450: 3.199: Usage: set $dir=<dir>
450: 3.199:        set $filesize=<size>     defaults to 16384
450: 3.199:        set $nfiles=<value>      defaults to 1000
450: 3.199:        set $nthreads=<value>    defaults to 16
450: 3.199:        set $meaniosize=<value>  defaults to 16384
450: 3.199:        set $meandirwidth=<size> defaults to 1000000
450: 3.199:        (sets mean dir width and dir depth is calculated as log (width, nfiles)
450: 3.199:         dirdepth therefore defaults to dir depth of 1 as in postmark
450: 3.199:         set $meandir lower to increase depth beyond 1 if desired)
450: 3.199:
450: 3.199:        run runtime (e.g. run 60)
450: 3.199: syntax error, token expected on line 51
filebench> set $dir=/HW_RAID5_6disks/t1
filebench> run 60
450: 13.320: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
450: 13.321: Creating fileset bigfileset...
450: 15.514: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds
450: 15.515: Creating/pre-allocating files
450: 15.515: Starting 1 filereader instances
451: 16.525: Starting 16 filereaderthread threads
450: 19.535: Running...
450: 80.065: Run took 60 seconds...
450: 80.079: Per-Operation Breakdown
closefile4        565ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         565ops/s   9.2mb/s   0.1ms/op   60us/op-cpu
openfile4         565ops/s   0.0mb/s   0.1ms/op   64us/op-cpu
closefile3        565ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        565ops/s   0.0mb/s  12.9ms/op  147us/op-cpu
appendfilerand3   565ops/s   8.8mb/s   0.1ms/op  126us/op-cpu
readfile3         565ops/s   9.2mb/s   0.1ms/op   60us/op-cpu
openfile3         565ops/s   0.0mb/s   0.1ms/op   63us/op-cpu
closefile2        565ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        565ops/s   0.0mb/s  12.9ms/op  102us/op-cpu
appendfilerand2   565ops/s   8.8mb/s   0.1ms/op   92us/op-cpu
createfile2       565ops/s   0.0mb/s   0.2ms/op  154us/op-cpu
deletefile1       565ops/s   0.0mb/s   0.1ms/op   86us/op-cpu

450: 80.079: IO Summary: 444386 ops 7341.7 ops/s, (1129/1130 r/w) 36.1mb/s, 297us cpu/op, 6.6ms latency
450: 80.079: Shutting down processes
filebench> run 60
450: 115.945: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
450: 115.998: Removed any existing fileset bigfileset in 1 seconds
450: 115.998: Creating fileset bigfileset...
450: 118.049: Preallocated 786 of 1000 of fileset bigfileset in 3 seconds
450: 118.049: Creating/pre-allocating files
450: 118.049: Starting 1 filereader instances
454: 119.055: Starting 16 filereaderthread threads
450: 122.065: Running...
450: 182.595: Run took 60 seconds...
450: 182.608: Per-Operation Breakdown
closefile4        557ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         557ops/s   9.0mb/s   0.1ms/op   59us/op-cpu
openfile4         557ops/s   0.0mb/s   0.1ms/op   64us/op-cpu
closefile3        557ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        557ops/s   0.0mb/s  13.0ms/op  149us/op-cpu
appendfilerand3   558ops/s   8.7mb/s   0.1ms/op  120us/op-cpu
readfile3         558ops/s   9.0mb/s   0.1ms/op   59us/op-cpu
openfile3         558ops/s   0.0mb/s   0.1ms/op   64us/op-cpu
closefile2        558ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        558ops/s   0.0mb/s  13.2ms/op  100us/op-cpu
appendfilerand2   558ops/s   8.7mb/s   0.1ms/op   90us/op-cpu
createfile2       557ops/s   0.0mb/s   0.1ms/op  151us/op-cpu
deletefile1       557ops/s   0.0mb/s   0.1ms/op   86us/op-cpu

450: 182.609: IO Summary: 438649 ops 7247.0 ops/s, (1115/1115 r/w) 35.5mb/s, 293us cpu/op, 6.7ms latency
450: 182.609: Shutting down processes
filebench> quit

# /opt/filebench/bin/sparcv9/filebench
filebench> load varmail
458: 2.590: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded
458: 2.591: Usage: set $dir=<dir>
458: 2.591:        set $filesize=<size>     defaults to 16384
458: 2.591:        set $nfiles=<value>      defaults to 1000
458: 2.591:        set $nthreads=<value>    defaults to 16
458: 2.591:        set $meaniosize=<value>  defaults to 16384
458: 2.591:        set $meandirwidth=<size> defaults to 1000000
458: 2.591:        (sets mean dir width and dir depth is calculated as log (width, nfiles)
458: 2.591:         dirdepth therefore defaults to dir depth of 1 as in postmark
458: 2.592:         set $meandir lower to increase depth beyond 1 if desired)
458: 2.592:
458: 2.592:        run runtime (e.g. run 60)
458: 2.592: syntax error, token expected on line 51
filebench> set $dir=/zfs_raid5_6disks/t1
filebench> run 60
458: 9.251: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
458: 9.251: Creating fileset bigfileset...
458: 14.232: Preallocated 812 of 1000 of fileset bigfileset in 5 seconds
458: 14.232: Creating/pre-allocating files
458: 14.232: Starting 1 filereader instances
459: 15.235: Starting 16 filereaderthread threads
458: 18.245: Running...
458: 78.704: Run took 60 seconds...
458: 78.718: Per-Operation Breakdown
closefile4        582ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         582ops/s   9.6mb/s   0.1ms/op   62us/op-cpu
openfile4         582ops/s   0.0mb/s   0.1ms/op   67us/op-cpu
closefile3        582ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        582ops/s   0.0mb/s  12.4ms/op  206us/op-cpu
appendfilerand3   582ops/s   9.1mb/s   0.1ms/op  125us/op-cpu
readfile3         582ops/s   9.5mb/s   0.1ms/op   61us/op-cpu
openfile3         582ops/s   0.0mb/s   0.1ms/op   66us/op-cpu
closefile2        582ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        582ops/s   0.0mb/s  12.4ms/op  132us/op-cpu
appendfilerand2   582ops/s   9.1mb/s   0.1ms/op   94us/op-cpu
createfile2       582ops/s   0.0mb/s   0.2ms/op  160us/op-cpu
deletefile1       582ops/s   0.0mb/s   0.1ms/op   89us/op-cpu

458: 78.718: IO Summary: 457505 ops 7567.3 ops/s, (1164/1164 r/w) 37.2mb/s, 340us cpu/op, 6.4ms latency
458: 78.718: Shutting down processes
filebench> run 60
458: 98.396: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
458: 98.449: Removed any existing fileset bigfileset in 1 seconds
458: 98.449: Creating fileset bigfileset...
458: 103.837: Preallocated 786 of 1000 of fileset bigfileset in 6 seconds
458: 103.837: Creating/pre-allocating files
458: 103.837: Starting 1 filereader instances
468: 104.845: Starting 16 filereaderthread threads
458: 107.854: Running...
458: 168.345: Run took 60 seconds...
458: 168.358: Per-Operation Breakdown
closefile4        582ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         582ops/s   9.4mb/s   0.1ms/op   61us/op-cpu
openfile4         582ops/s   0.0mb/s   0.1ms/op   66us/op-cpu
closefile3        582ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3        582ops/s   0.0mb/s  12.5ms/op  207us/op-cpu
appendfilerand3   582ops/s   9.1mb/s   0.1ms/op  124us/op-cpu
readfile3         582ops/s   9.4mb/s   0.1ms/op   61us/op-cpu
openfile3         582ops/s   0.0mb/s   0.1ms/op   66us/op-cpu
closefile2        582ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile2        582ops/s   0.0mb/s  12.3ms/op  132us/op-cpu
appendfilerand2   582ops/s   9.1mb/s   0.1ms/op   94us/op-cpu
createfile2       582ops/s   0.0mb/s   0.2ms/op  156us/op-cpu
deletefile1       582ops/s   0.0mb/s   0.1ms/op   89us/op-cpu

458: 168.359: IO Summary: 457767 ops 7567.8 ops/s, (1164/1165 r/w) 36.9mb/s, 340us cpu/op, 6.4ms latency
458: 168.359: Shutting down processes
filebench>

# zpool destroy HW_RAID5_6disks
# newfs -C 20 /dev/rdsk/c2t43d0s0
newfs: construct a new file system /dev/rdsk/c2t43d0s0: (y/n)? y
Warning: 68 sector(s) in last cylinder unallocated
/dev/rdsk/c2t43d0s0: 714233788 sectors in 116249 cylinders of 48 tracks, 128 sectors
        348747.0MB in 7266 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
...............................................................................
..................................................................
super-block backups for last 10 cylinder groups at:
 713296928, 713395360, 713493792, 713592224, 713690656, 713789088, 713887520,
 713985952, 714084384, 714182816
#
# mount -o noatime /dev/dsk/c2t43d0s0 /mnt
#
# /opt/filebench/bin/sparcv9/filebench
filebench> load varmail
546: 2.573: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded
546: 2.573: Usage: set $dir=<dir>
546: 2.573:        set $filesize=<size>     defaults to 16384
546: 2.573:        set $nfiles=<value>      defaults to 1000
546: 2.574:        set $nthreads=<value>    defaults to 16
546: 2.574:        set $meaniosize=<value>  defaults to 16384
546: 2.574:        set $meandirwidth=<size> defaults to 1000000
546: 2.574:        (sets mean dir width and dir depth is calculated as log (width, nfiles)
546: 2.574:         dirdepth therefore defaults to dir depth of 1 as in postmark
546: 2.574:         set $meandir lower to increase depth beyond 1 if desired)
546: 2.574:
546: 2.574:        run runtime (e.g. run 60)
546: 2.574: syntax error, token expected on line 51
filebench> set $dir=/mnt
filebench> run 60
546: 22.095: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
546: 22.109: Creating fileset bigfileset...
546: 24.577: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds
546: 24.577: Creating/pre-allocating files
546: 24.577: Starting 1 filereader instances
548: 25.584: Starting 16 filereaderthread threads
546: 28.594: Running...
546: 89.114: Run took 60 seconds...
546: 89.128: Per-Operation Breakdown
closefile4         80ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4          80ops/s   1.5mb/s   0.1ms/op   76us/op-cpu
openfile4          80ops/s   0.0mb/s   0.0ms/op   39us/op-cpu
closefile3         80ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile3         80ops/s   0.0mb/s  29.2ms/op  107us/op-cpu
appendfilerand3    80ops/s   1.2mb/s  30.4ms/op  189us/op-cpu
readfile3          80ops/s   1.5mb/s   0.1ms/op   73us/op-cpu
openfile3          80ops/s   0.0mb/s   0.0ms/op   38us/op-cpu
closefile2         80ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile2         80ops/s   0.0mb/s  30.8ms/op  125us/op-cpu
appendfilerand2    80ops/s   1.2mb/s  22.6ms/op  173us/op-cpu
createfile2        80ops/s   0.0mb/s  37.2ms/op  224us/op-cpu
deletefile1        80ops/s   0.0mb/s  48.5ms/op  108us/op-cpu

546: 89.128: IO Summary: 62776 ops 1037.3 ops/s, (160/160 r/w) 5.5mb/s, 481us cpu/op, 49.7ms latency
546: 89.128: Shutting down processes
filebench> run 60
546: 738.541: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
546: 739.455: Removed any existing fileset bigfileset in 1 seconds
546: 739.455: Creating fileset bigfileset...
546: 741.387: Preallocated 786 of 1000 of fileset bigfileset in 2 seconds
546: 741.387: Creating/pre-allocating files
546: 741.387: Starting 1 filereader instances
557: 742.394: Starting 16 filereaderthread threads
546: 745.404: Running...
546: 805.944: Run took 60 seconds...
546: 805.958: Per-Operation Breakdown
closefile4         81ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4          81ops/s   1.5mb/s   0.1ms/op   73us/op-cpu
openfile4          81ops/s   0.0mb/s   0.0ms/op   38us/op-cpu
closefile3         81ops/s   0.0mb/s   0.0ms/op   11us/op-cpu
fsyncfile3         81ops/s   0.0mb/s  27.8ms/op  105us/op-cpu
appendfilerand3    81ops/s   1.3mb/s  28.6ms/op  187us/op-cpu
readfile3          81ops/s   1.4mb/s   0.1ms/op   70us/op-cpu
openfile3          81ops/s   0.0mb/s   0.0ms/op   37us/op-cpu
closefile2         81ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile2         81ops/s   0.0mb/s  29.9ms/op  124us/op-cpu
appendfilerand2    81ops/s   1.3mb/s  23.6ms/op  171us/op-cpu
createfile2        81ops/s   0.0mb/s  38.9ms/op  220us/op-cpu
deletefile1        81ops/s   0.0mb/s  47.4ms/op  109us/op-cpu

546: 805.958: IO Summary: 63661 ops 1051.6 ops/s, (162/162 r/w) 5.4mb/s, 477us cpu/op, 49.1ms latency
546: 805.958: Shutting down processes
filebench>

#### solaris 10 06/06 + patches, server with the same hardware specs #####

##### test # 4

# mount -o noatime /dev/dsk/c3t40d0s0 /mnt
# /opt/filebench/bin/sparcv9/filebench
filebench> load varmail
1384: 3.678: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded
1384: 3.679: Usage: set $dir=<dir>
1384: 3.679:        set $filesize=<size>     defaults to 16384
1384: 3.679:        set $nfiles=<value>      defaults to 1000
1384: 3.679:        set $nthreads=<value>    defaults to 16
1384: 3.679:        set $meaniosize=<value>  defaults to 16384
1384: 3.679:        set $meandirwidth=<size> defaults to 1000000
1384: 3.679:        (sets mean dir width and dir depth is calculated as log (width, nfiles)
1384: 3.679:         dirdepth therefore defaults to dir depth of 1 as in postmark
1384: 3.679:         set $meandir lower to increase depth beyond 1 if desired)
1384: 3.680:
1384: 3.680:        run runtime (e.g. run 60)
1384: 3.680: syntax error, token expected on line 51
filebench> set $dir=/mnt
filebench> run 60
1384: 10.872: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
1384: 11.858: Removed any existing fileset bigfileset in 1 seconds
1384: 11.859: Creating fileset bigfileset...
1384: 14.221: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds
1384: 14.221: Creating/pre-allocating files
1384: 14.221: Starting 1 filereader instances
1387: 15.231: Starting 16 filereaderthread threads
1384: 18.241: Running...
1384: 78.701: Run took 60 seconds...
1384: 78.715: Per-Operation Breakdown
closefile4        500ops/s   0.0mb/s   0.0ms/op    8us/op-cpu
readfile4         500ops/s   8.4mb/s   0.1ms/op   65us/op-cpu
openfile4         500ops/s   0.0mb/s   0.0ms/op   36us/op-cpu
closefile3        500ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile3        500ops/s   0.0mb/s   9.7ms/op  169us/op-cpu
appendfilerand3   500ops/s   7.8mb/s   2.6ms/op  187us/op-cpu
readfile3         500ops/s   8.3mb/s   0.1ms/op   64us/op-cpu
openfile3         500ops/s   0.0mb/s   0.0ms/op   36us/op-cpu
closefile2        500ops/s   0.0mb/s   0.0ms/op   12us/op-cpu
fsyncfile2        500ops/s   0.0mb/s   8.4ms/op  154us/op-cpu
appendfilerand2   500ops/s   7.8mb/s   1.7ms/op  168us/op-cpu
createfile2       500ops/s   0.0mb/s   4.3ms/op  298us/op-cpu
deletefile1       500ops/s   0.0mb/s   3.2ms/op  144us/op-cpu

1384: 78.715: IO Summary: 393167 ops 6503.1 ops/s, (1000/1001 r/w) 32.4mb/s, 405us cpu/op, 7.5ms latency
1384: 78.715: Shutting down processes
filebench> run 60
1384: 94.146: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15
1384: 95.767: Removed any existing fileset bigfileset in 2 seconds
1384: 95.768: Creating fileset bigfileset...
1384: 97.972: Preallocated 786 of 1000 of fileset bigfileset in 3 seconds 1384: 97.973: Creating/pre-allocating files 1384: 97.973: Starting 1 filereader instances 1393: 98.981: Starting 16 filereaderthread threads 1384: 101.991: Running... 1384: 162.491: Run took 60 seconds... 1384: 162.505: Per-Operation Breakdown closefile4 502ops/s 0.0mb/s 0.0ms/op 8us/op-cpu readfile4 502ops/s 8.1mb/s 0.1ms/op 64us/op-cpu openfile4 502ops/s 0.0mb/s 0.0ms/op 37us/op-cpu closefile3 502ops/s 0.0mb/s 0.0ms/op 12us/op-cpu fsyncfile3 502ops/s 0.0mb/s 9.9ms/op 172us/op-cpu appendfilerand3 502ops/s 7.8mb/s 2.7ms/op 189us/op-cpu readfile3 502ops/s 8.2mb/s 0.1ms/op 65us/op-cpu openfile3 502ops/s 0.0mb/s 0.0ms/op 37us/op-cpu closefile2 502ops/s 0.0mb/s 0.0ms/op 12us/op-cpu fsyncfile2 502ops/s 0.0mb/s 8.6ms/op 156us/op-cpu appendfilerand2 502ops/s 7.8mb/s 1.7ms/op 166us/op-cpu createfile2 502ops/s 0.0mb/s 4.4ms/op 301us/op-cpu deletefile1 502ops/s 0.0mb/s 3.2ms/op 148us/op-cpu 1384: 162.506: IO Summary: 394525 ops 6521.2 ops/s, (1003/1003 r/w) 32.0mb/s, 407us cpu/op, 7.7ms latency 1384: 162.506: Shutting down processes filebench> #### test 5 #### these are the same disks as used in test #2 # zpool create zfs_raid5_6disks raidz c2t16d0 c2t17d0 c2t18d0 c2t19d0 c2t20d0 c2t21d0 # zfs set atime=off zfs_raid5_6disks # zfs create zfs_raid5_6disks/t1 # # /opt/filebench/bin/sparcv9/filebench filebench> load varmail 1437: 3.762: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded 1437: 3.762: Usage: set $dir=<dir> 1437: 3.762: set $filesize=<size> defaults to 16384 1437: 3.762: set $nfiles=<value> defaults to 1000 1437: 3.763: set $nthreads=<value> defaults to 16 1437: 3.763: set $meaniosize=<value> defaults to 16384 1437: 3.763: set $meandirwidth=<size> defaults to 1000000 1437: 3.763: (sets mean dir width and dir depth is calculated as log (width, nfiles) 1437: 3.763: dirdepth therefore defaults to dir depth of 1 as in postmark 1437: 3.763: set $meandir lower to increase depth beyond 1 if desired) 1437: 3.763: 1437: 3.763: run runtime (e.g. run 60) 1437: 3.763: syntax error, token expected on line 51 filebench> set $dir=/zfs_raid5_6disks/t1 filebench> run 60 1437: 13.102: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15 1437: 13.102: Creating fileset bigfileset... 1437: 20.092: Preallocated 812 of 1000 of fileset bigfileset in 7 seconds 1437: 20.092: Creating/pre-allocating files 1437: 20.092: Starting 1 filereader instances 1438: 21.095: Starting 16 filereaderthread threads 1437: 24.105: Running... 1437: 84.575: Run took 60 seconds... 
1437: 84.589: Per-Operation Breakdown closefile4 587ops/s 0.0mb/s 0.0ms/op 9us/op-cpu readfile4 587ops/s 9.5mb/s 0.1ms/op 63us/op-cpu openfile4 587ops/s 0.0mb/s 0.1ms/op 63us/op-cpu closefile3 587ops/s 0.0mb/s 0.0ms/op 11us/op-cpu fsyncfile3 587ops/s 0.0mb/s 12.1ms/op 196us/op-cpu appendfilerand3 587ops/s 9.2mb/s 0.1ms/op 123us/op-cpu readfile3 587ops/s 9.5mb/s 0.1ms/op 64us/op-cpu openfile3 587ops/s 0.0mb/s 0.1ms/op 63us/op-cpu closefile2 587ops/s 0.0mb/s 0.0ms/op 11us/op-cpu fsyncfile2 587ops/s 0.0mb/s 12.6ms/op 145us/op-cpu appendfilerand2 588ops/s 9.2mb/s 0.1ms/op 93us/op-cpu createfile2 587ops/s 0.0mb/s 0.2ms/op 166us/op-cpu deletefile1 587ops/s 0.0mb/s 0.1ms/op 90us/op-cpu 1437: 84.589: IO Summary: 461708 ops 7635.5 ops/s, (1175/1175 r/w) 37.4mb/s, 330us cpu/op, 6.4ms latency 1437: 84.589: Shutting down processes filebench> run 60 1437: 136.114: Fileset bigfileset: 1000 files, avg dir = 1000000.0, avg depth = 0.5, mbytes=15 1437: 136.171: Removed any existing fileset bigfileset in 1 seconds 1437: 136.172: Creating fileset bigfileset... 1437: 141.880: Preallocated 786 of 1000 of fileset bigfileset in 6 seconds 1437: 141.880: Creating/pre-allocating files 1437: 141.880: Starting 1 filereader instances 1441: 142.885: Starting 16 filereaderthread threads 1437: 145.895: Running... 1437: 206.415: Run took 60 seconds... 1437: 206.429: Per-Operation Breakdown closefile4 582ops/s 0.0mb/s 0.0ms/op 8us/op-cpu readfile4 582ops/s 9.4mb/s 0.1ms/op 63us/op-cpu openfile4 582ops/s 0.0mb/s 0.1ms/op 62us/op-cpu closefile3 582ops/s 0.0mb/s 0.0ms/op 11us/op-cpu fsyncfile3 582ops/s 0.0mb/s 12.2ms/op 202us/op-cpu appendfilerand3 582ops/s 9.1mb/s 0.1ms/op 122us/op-cpu readfile3 582ops/s 9.4mb/s 0.1ms/op 64us/op-cpu openfile3 582ops/s 0.0mb/s 0.1ms/op 62us/op-cpu closefile2 582ops/s 0.0mb/s 0.0ms/op 11us/op-cpu fsyncfile2 582ops/s 0.0mb/s 12.9ms/op 141us/op-cpu appendfilerand2 582ops/s 9.1mb/s 0.1ms/op 91us/op-cpu createfile2 582ops/s 0.0mb/s 0.2ms/op 157us/op-cpu deletefile1 582ops/s 0.0mb/s 0.1ms/op 89us/op-cpu 1437: 206.429: IO Summary: 457649 ops 7562.1 ops/s, (1163/1164 r/w) 37.0mb/s, 328us cpu/op, 6.5ms latency 1437: 206.429: Shutting down processes filebench> This message posted from opensolaris.org
Luke Lonergan
2006-Aug-08 14:48 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Does snv44 have the ZFS fixes to the I/O scheduler, the ARC and the prefetch logic?

These are great results for random I/O - I wonder how the sequential I/O looks? Of course you'll not get great results for sequential I/O on the 3510 :-)

- Luke

Sent from my GoodLink synchronized handheld (www.good.com)

-----Original Message-----
From: Robert Milkowski [mailto:milek at task.gda.pl]
Sent: Tuesday, August 08, 2006 10:15 AM Eastern Standard Time
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

Hi.

This time some RAID-5/RAID-Z benchmarks. I connected the 3510 head unit with one link to the same server the 3510 JBODs are connected to (using the second link). snv_44 is used, the server is a v440.

I also tried changing the max pending IO requests for the HW RAID-5 LUN and checked with DTrace that the larger value is really used - it is, but it doesn't change the benchmark numbers.

1. ZFS on HW RAID5 with 6 disks, atime=off
   IO Summary: 444386 ops 7341.7 ops/s, (1129/1130 r/w) 36.1mb/s, 297us cpu/op, 6.6ms latency
   IO Summary: 438649 ops 7247.0 ops/s, (1115/1115 r/w) 35.5mb/s, 293us cpu/op, 6.7ms latency

2. ZFS with software RAID-Z with 6 disks, atime=off
   IO Summary: 457505 ops 7567.3 ops/s, (1164/1164 r/w) 37.2mb/s, 340us cpu/op, 6.4ms latency
   IO Summary: 457767 ops 7567.8 ops/s, (1164/1165 r/w) 36.9mb/s, 340us cpu/op, 6.4ms latency

3. UFS on HW RAID5 with 6 disks, noatime
   IO Summary: 62776 ops 1037.3 ops/s, (160/160 r/w) 5.5mb/s, 481us cpu/op, 49.7ms latency
   IO Summary: 63661 ops 1051.6 ops/s, (162/162 r/w) 5.4mb/s, 477us cpu/op, 49.1ms latency

4. UFS on HW RAID5 with 6 disks, noatime, S10U2 + patches (the same filesystem mounted as in 3)
   IO Summary: 393167 ops 6503.1 ops/s, (1000/1001 r/w) 32.4mb/s, 405us cpu/op, 7.5ms latency
   IO Summary: 394525 ops 6521.2 ops/s, (1003/1003 r/w) 32.0mb/s, 407us cpu/op, 7.7ms latency

5. ZFS with software RAID-Z with 6 disks, atime=off, S10U2 + patches (the same disks as in test #2)
   IO Summary: 461708 ops 7635.5 ops/s, (1175/1175 r/w) 37.4mb/s, 330us cpu/op, 6.4ms latency
   IO Summary: 457649 ops 7562.1 ops/s, (1163/1164 r/w) 37.0mb/s, 328us cpu/op, 6.5ms latency

In this benchmark software RAID-5 with ZFS (RAID-Z to be precise) gives a little bit better performance than hardware RAID-5. ZFS is also faster in both cases (HW and SW RAID) than UFS on HW RAID. Something is wrong with UFS on snv_44 - the same UFS filesystem on S10U2 works as expected. ZFS on S10U2 in this benchmark gives the same results as on snv_44.
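Robert mentions verifying with DTrace that the larger pending-I/O value really takes effect. As a rough illustration only - not necessarily the script he used - an io-provider one-liner like the following can watch how many I/Os are outstanding per device (counts for I/Os already in flight when the script starts will be slightly off):

# dtrace -n '
io:::start { pending[args[1]->dev_statname]++; @outstanding[args[1]->dev_statname] = max(pending[args[1]->dev_statname]); }
io:::done  { pending[args[1]->dev_statname]--; }'

On Ctrl-C it prints, per device, the peak number of concurrently outstanding I/Os observed during the run.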
Robert Milkowski
2006-Aug-08 16:11 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Luke,

Tuesday, August 8, 2006, 4:48:38 PM, you wrote:

LL> Does snv44 have the ZFS fixes to the I/O scheduler, the ARC and the prefetch logic?
LL> These are great results for random I/O, I wonder how the sequential I/O looks?
LL> Of course you'll not get great results for sequential I/O on the 3510 :-)

filebench/singlestreamread v440

1. UFS, noatime, HW RAID5 6 disks, S10U2
   70MB/s
2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
   87MB/s
3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
   130MB/s
4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
   133MB/s

ps. With software RAID-Z I first got about 940MB/s :)))) - well, after the files were created they were all cached and ZFS almost didn't touch the disks :) So I changed the filesize to be well over the memory size of the server, and the above results are with that larger filesize.

filebench/singlestreamwrite v440

1. UFS, noatime, HW RAID-5 6 disks, S10U2
   70MB/s
2. ZFS, atime=off, HW RAID-5 6 disks, S10U2 (the same lun as in #1)
   52MB/s
3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
   148MB/s
4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
   147MB/s

So sequential writing with ZFS on HW RAID-5 is actually worse than UFS.

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
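For reference, a minimal sketch of how these single-stream numbers can be gathered, following the same interactive filebench pattern as the varmail runs earlier in the thread. The directory and the 16g file size are assumptions (Robert only says the file size was set well above the server's RAM), as is the availability of a $filesize variable in the singlestream personalities:

# /opt/filebench/bin/sparcv9/filebench
filebench> load singlestreamread
filebench> set $dir=/zfs_raid5_6disks/t1
filebench> set $filesize=16g
filebench> run 60

The write test would presumably look the same with "load singlestreamwrite".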
Luke Lonergan
2006-Aug-08 16:18 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Robert,

On 8/8/06 9:11 AM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:

> 1. UFS, noatime, HW RAID5 6 disks, S10U2
>    70MB/s
> 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
>    87MB/s
> 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
>    130MB/s
> 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
>    133MB/s

Well, the UFS results are miserable, but the ZFS results aren't good - I'd expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from 8kb to 32kb.

Most of my ZFS experiments have been with RAID10, but there were some massive improvements to seq I/O with the fixes I mentioned - I'd expect that this shows that they aren't in snv44.

- Luke
Robert Milkowski
2006-Aug-08 16:32 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Luke,

Tuesday, August 8, 2006, 6:18:39 PM, you wrote:

LL> Robert,
LL> On 8/8/06 9:11 AM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:

>> 1. UFS, noatime, HW RAID5 6 disks, S10U2
>>    70MB/s
>> 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
>>    87MB/s
>> 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
>>    130MB/s
>> 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
>>    133MB/s

LL> Well, the UFS results are miserable, but the ZFS results aren't good - I'd
LL> expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from
LL> 8kb to 32kb.

Well, right now I'm testing with a single 200MB/s FC link, so that's the upper limit in this testing.

LL> Most of my ZFS experiments have been with RAID10, but there were some
LL> massive improvements to seq I/O with the fixes I mentioned - I'd expect that
LL> this shows that they aren't in snv44.

So where did you get those fixes?

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Mark Maybee
2006-Aug-08 16:33 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Luke Lonergan wrote:
> Robert,
>
> On 8/8/06 9:11 AM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:
>
>> 1. UFS, noatime, HW RAID5 6 disks, S10U2
>>    70MB/s
>> 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
>>    87MB/s
>> 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
>>    130MB/s
>> 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
>>    133MB/s
>
> Well, the UFS results are miserable, but the ZFS results aren't good - I'd
> expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from
> 8kb to 32kb.
>
> Most of my ZFS experiments have been with RAID10, but there were some
> massive improvements to seq I/O with the fixes I mentioned - I'd expect that
> this shows that they aren't in snv44.

Those fixes went into snv_45

-Mark
Luke Lonergan
2006-Aug-08 16:52 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Robert,

> LL> Most of my ZFS experiments have been with RAID10, but there were some
> LL> massive improvements to seq I/O with the fixes I mentioned - I'd expect that
> LL> this shows that they aren't in snv44.
>
> So where did you get those fixes?

From the fine people who implemented them!

As Mark said, apparently they're available in snv_45 (yay!)

- Luke
Doug Scott
2006-Aug-08 17:15 UTC
[zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
> Robert,
>
> On 8/8/06 9:11 AM, "Robert Milkowski" <rmilkowski at task.gda.pl> wrote:
>
> > 1. UFS, noatime, HW RAID5 6 disks, S10U2
> >    70MB/s
> > 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
> >    87MB/s
> > 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
> >    130MB/s
> > 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
> >    133MB/s
>
> Well, the UFS results are miserable, but the ZFS results aren't good - I'd
> expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from
> 8kb to 32kb.
>
> Most of my ZFS experiments have been with RAID10, but there were some
> massive improvements to seq I/O with the fixes I mentioned - I'd expect that
> this shows that they aren't in snv44.
>
> - Luke

I don't think there is much chance of achieving anywhere near 350MB/s. That is a hell of a lot of IO/s for 6 disks + raid(5/Z) + shared fibre. While you can always get very good results from a single disk IO, your percentage gain is always decreasing the more disks you add to the equation.

From a single 200MB/s fibre, expect somewhere between 160-180MB/s, at best.

Doug

This message posted from opensolaris.org
Matthew Ahrens
2006-Aug-08 17:25 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
On Tue, Aug 08, 2006 at 06:11:09PM +0200, Robert Milkowski wrote:
> filebench/singlestreamread v440
>
> 1. UFS, noatime, HW RAID5 6 disks, S10U2
>    70MB/s
>
> 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
>    87MB/s
>
> 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
>    130MB/s
>
> 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
>    133MB/s

FYI, streaming read performance is improved considerably by Mark's prefetch fixes which are in build 45. (However, as mentioned, you will soon run into the bandwidth of a single fiber channel connection.)

--matt
Robert Milkowski
2006-Aug-08 17:29 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Matthew,

Tuesday, August 8, 2006, 7:25:17 PM, you wrote:

MA> On Tue, Aug 08, 2006 at 06:11:09PM +0200, Robert Milkowski wrote:

>> filebench/singlestreamread v440
>>
>> 1. UFS, noatime, HW RAID5 6 disks, S10U2
>>    70MB/s
>>
>> 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
>>    87MB/s
>>
>> 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
>>    130MB/s
>>
>> 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
>>    133MB/s

MA> FYI, streaming read performance is improved considerably by Mark's
MA> prefetch fixes which are in build 45. (However, as mentioned you will
MA> soon run into the bandwidth of a single fiber channel connection.)

I will probably re-test with snv_45 (waiting for SX). The FC link is not that big a problem - if I find enough time I will just add more FC cards.

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Torrey McMahon
2006-Aug-09 02:59 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Robert Milkowski wrote:
> Hello Richard,
>
> Monday, August 7, 2006, 6:54:37 PM, you wrote:
>
> RE> Hi Robert, thanks for the data.
> RE> Please clarify one thing for me.
> RE> In the case of the HW raid, was there just one LUN? Or was it 12 LUNs?
>
> Just one lun which was built on the 3510 from 12 disks in raid-1(0).

One 12 disk RAID-1 lun? One R0 lun of 12 drives?
Torrey McMahon
2006-Aug-09 03:39 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
I read through the entire thread, I think, and have some comments.

    * There are still some "granny smith" to "Macintosh" comparisons going on. Different OS revs, it looks like different server types, and I can't tell about the HBAs, links or the LUNs being tested.
    * Before you test with filebench or ZFS, perform a baseline on the LUN(s) itself with a block workload generator. This should tell the raw performance of the device, of which ZFS should be some percentage smaller. Make sure you use lots of threads.
    * Testing ...
          o I'd start with configuring the 3510 RAID for a sequential workload, one large R0 raid pool across all the drives exported as one LUN, ZFS block size at default, and testing from there. This should line the ZFS blocksize and cache blocksize up more than the random setting.
          o If you want to get interesting, try slicing 12 LUNs from the single R0 raid pool in the 3510, export those to the host, and stripe ZFS across them. (I have a feeling it will be faster but that's just a hunch.)
          o If you want to get really interesting, export each drive as a single R0 LUN and stripe ZFS across the 12 LUNs. (Which I think you can do but don't remember ever testing because, well, it would be silly but could show some interesting behaviors.)
    * Some of the results appear to show limitations in something besides the underlying storage but it's hard to tell. Our internal tools - which I'm dying to get out in the public - also capture cpu load and some other stats to note bottlenecks that might come up during testing.

That said this is all great stuff. Keep kicking the tires.
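A rough sketch of the last two layouts Torrey suggests - twelve LUNs presented individually to the host and striped by ZFS - using a hypothetical pool name and device names; the actual controller targets would differ:

# zpool create bigstripe c3t40d0 c3t41d0 c3t42d0 c3t43d0 c3t44d0 c3t45d0 c3t46d0 c3t47d0 c3t48d0 c3t49d0 c3t50d0 c3t51d0

The "12 slices of one R0 pool" variant would use the same zpool command; only the LUNs would come from slicing a single R0 logical drive on the array instead of exporting one LUN per physical disk.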
Luke Lonergan
2006-Aug-09 04:07 UTC
[zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Doug,

On 8/8/06 10:15 AM, "Doug Scott" <dougs at truemail.co.th> wrote:
> I don't think there is much chance of achieving anywhere near 350MB/s.
> That is a hell of a lot of IO/s for 6 disks+raid(5/Z)+shared fibre. While you
> can always get very good results from a single disk IO, your percentage
> gain is always decreasing the more disks you add to the equation.
>
> From a single 200MB/s fibre, expect somewhere between 160-180MB/s,
> at best.

Momentarily forgot about the sucky single FC limit - I've become so used to calculating drive rate, which in this case would be 80MB/s per disk for modern 15K RPM FC or SCSI drives, then multiplying by the 5 data drives in a 6-drive RAID5/Z.

We routinely get 950MB/s from 16 SATA disks on a single server with internal storage. We're getting 2,000 MB/s on 36 disks in an X4500 with ZFS.

- Luke
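Spelled out, that back-of-the-envelope estimate is roughly 5 data drives x 80 MB/s = about 400 MB/s of raw disk bandwidth, so in Robert's single-link setup the ~200 MB/s 2Gb FC connection, not the six drives, is what caps the sequential numbers.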
Robert Milkowski
2006-Aug-09 08:00 UTC
[zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Luke,

Wednesday, August 9, 2006, 6:07:38 AM, you wrote:

LL> We routinely get 950MB/s from 16 SATA disks on a single server with internal
LL> storage. We're getting 2,000 MB/s on 36 disks in an X4500 with ZFS.

Can you share more data? How are these disks configured, what kind of access pattern, etc.?

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
Robert Milkowski
2006-Aug-09 08:07 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Torrey,

Wednesday, August 9, 2006, 4:59:08 AM, you wrote:

TM> Robert Milkowski wrote:
>> Hello Richard,
>>
>> Monday, August 7, 2006, 6:54:37 PM, you wrote:
>>
>> RE> Hi Robert, thanks for the data.
>> RE> Please clarify one thing for me.
>> RE> In the case of the HW raid, was there just one LUN? Or was it 12 LUNs?
>>
>> Just one lun which was built on the 3510 from 12 disks in raid-1(0).

TM> One 12 disk RAID-1 lun? One R0 lun of 12 drives?

If you select RAID-1 on the 3510 and specify more than two disks, it will actually do RAID-10 using pairs of disks. So I gave it all 12 disks in the head unit and got 6 two-way mirrors which are striped.

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
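For comparison, the ZFS-side equivalent of that layout - six striped two-way mirrors - would look something like the following; the pool name and device names are hypothetical:

# zpool create tank mirror c2t16d0 c2t17d0 mirror c2t18d0 c2t19d0 mirror c2t20d0 c2t21d0 mirror c2t22d0 c2t23d0 mirror c2t24d0 c2t25d0 mirror c2t26d0 c2t27d0

ZFS stripes writes across all the mirror vdevs, so the pool behaves like the array's RAID-10 of pairs, with the mirroring done in software instead of on the 3510 controller.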
Robert Milkowski
2006-Aug-09 08:22 UTC
[zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Torrey,

Wednesday, August 9, 2006, 5:39:54 AM, you wrote:

TM> I read through the entire thread, I think, and have some comments.
TM>
TM>     * There are still some "granny smith" to "Macintosh" comparisons
TM>       going on. Different OS revs, it looks like different server types,
TM>       and I can't tell about the HBAs, links or the LUNs being tested.

Hmmmm.... in the first test that's true, I did use different OS revisions, but then I corrected it and the same tests were performed on both OSes. The server hardware is identical on both servers - v440, 4x1.5GHz, 8GB RAM, dual-ported 2Gb FC card based on Qlogic (1077,2312). I also included snv_44 and S10 06/06 to see if there are real differences in ZFS performance in those tests.

I know I haven't included all the details - some are more or less obvious, some not.

TM>     * Before you test with filebench or ZFS, perform a baseline on the
TM>       LUN(s) itself with a block workload generator. This should tell
TM>       the raw performance of the device, of which ZFS should be some
TM>       percentage smaller. Make sure you use lots of threads.

Well, that's why I compared it to UFS. Ok, no SVM+UFS testing, but anyway. I wanted some kind of quick answer to a really simple question a lot of people are going to ask themselves (me included) - in the case of 3510-like arrays, is it better to use HW RAID with UFS? Or maybe HW RAID with ZFS? Or maybe it's actually better to use only 3510 JBODs with ZFS? There are many factors and one of them is performance. As I want to use it as an NFS server, filebench/varmail is a good enough approximation. And I've got an answer - ZFS should be faster right now than UFS, regardless of whether I use it on HW RAID or, in the case of ZFS, make use of software RAID.

TM>     * Testing ...
TM>           o I'd start with configuring the 3510 RAID for a sequential
TM>             workload, one large R0 raid pool across all the drives
TM>             exported as one LUN, ZFS block size at default, and testing
TM>             from there.
TM>           o If you want to get interesting, try slicing 12 LUNs from the
TM>             single R0 raid pool in the 3510, export those to the host,
TM>             and stripe ZFS across them.
TM>           o If you want to get really interesting, export each drive as a
TM>             single R0 LUN and stripe ZFS across the 12 LUNs.

I know - there are more scenarios that are also interesting. I would love to test them and do it in more detail with different workloads, etc., if only I had the time.

TM>     * Some of the results appear to show limitations in something
TM>       besides the underlying storage but it's hard to tell. Our internal
TM>       tools - which I'm dying to get out in the public - also capture
TM>       cpu load and some other stats to note bottlenecks that might come
TM>       up during testing.

It looks like so. I would also like to test bigger configs - like 2-3 additional JBODs and more HW RAID groups - and generate workload concurrently on many filesystems. Then try to do it in ZFS in one pool and in separate pools and see how it behaves. I'll see about it.

TM> That said this is all great stuff. Keep kicking the tires.

:)

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
eric kustarz writes:
> >ES> Second, you may be able to get more performance from the ZFS filesystem
> >ES> on the HW lun by tweaking the max pending # of requests. One thing
> >ES> we've found is that ZFS currently has a hardcoded limit of how many
> >ES> outstanding requests to send to the underlying vdev (35). This works
> >ES> well for most single devices, but large arrays can actually handle more,
> >ES> and we end up leaving some performance on the floor. Currently the only
> >ES> way to tweak this variable is through 'mdb -kw'. Try something like:
> >
> >Well, strange - I did try with values of 1, 60 and 256. And basically I
> >get the same results from varmail tests.
>
> If vdev_reopen() is called then it will reset vq_max_pending to the
> vdev_knob's default value.
>
> So you can set the "global" vq_max_pending in vdev_knob (though this
> affects all pools and all vdevs of each pool):
> # mdb -kw
> > vdev_knob::print
> ....

I think the interlace on the volume was set to 32K, which means that each 128K I/O spreads to 4 disks. So the 35 vq_max_pending turns into 140 disk I/Os, which seems enough, as was found, to drive the 10-20 disk storage. If the interlace had been set to 1M or more then I would expect vq_max_pending to start to make a difference.

What we must try to avoid is ZFS throttling itself on vq_max_pending when some disks have near 0 requests in their pipe.

-r
mario heimel
2006-Aug-09 13:06 UTC
[zfs-discuss] Re: Re[2]: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi.

I am very interested in ZFS compression on vs off tests - maybe you can run another one with the 3510.

I have seen a slight benefit with compression on in the following test (also under high system load):

S10U2, v880 8xcore, 16GB RAM (only six internal disks at this moment, I am waiting for the SAN luns -;)

filebench/varmail test with default parameters, run for 60 seconds

zpool create lzp raidz c0t2d0 c0t3d0 c0t4d0 c0t5d0

ZFS compression on
IO Summary: 284072 ops 4688.6 ops/s, (721/722 r/w) 24.2mb/s, 544us cpu/op, 10.2ms latency
IO Summary: 295985 ops 4887.7 ops/s, (752/752 r/w) 25.2mb/s, 539us cpu/op, 9.8ms latency
IO Summary: 337249 ops 5568.1 ops/s, (857/857 r/w) 28.5mb/s, 529us cpu/op, 8.6ms latency
IO Summary: 306231 ops 5055.1 ops/s, (778/778 r/w) 25.9mb/s, 531us cpu/op, 9.4ms latency

ZFS compression off
IO Summary: 284828 ops 4701.8 ops/s, (723/724 r/w) 24.0mb/s, 553us cpu/op, 10.2ms latency
IO Summary: 276570 ops 4565.5 ops/s, (702/703 r/w) 23.3mb/s, 543us cpu/op, 10.5ms latency
IO Summary: 276570 ops 4565.5 ops/s, (702/703 r/w) 23.3mb/s, 543us cpu/op, 10.5ms latency
IO Summary: 264656 ops 4370.3 ops/s, (672/673 r/w) 22.1mb/s, 546us cpu/op, 11.1ms latency
IO Summary: 264656 ops 4370.3 ops/s, (672/673 r/w) 22.1mb/s, 546us cpu/op, 11.1ms latency

test under heavy avg. system load 9

compression on
IO Summary: 285405 ops 4701.3 ops/s, (723/724 r/w) 22.9mb/s, 5370us cpu/op, 10.1ms latency
IO Summary: 285946 ops 4719.5 ops/s, (726/726 r/w) 23.3mb/s, 5342us cpu/op, 10.0ms latency
IO Summary: 307347 ops 5074.4 ops/s, (781/781 r/w) 24.6mb/s, 4964us cpu/op, 9.3ms latency
IO Summary: 271030 ops 4472.6 ops/s, (688/688 r/w) 22.1mb/s, 5650us cpu/op, 10.5ms latency

compression off
IO Summary: 277434 ops 4579.8 ops/s, (705/705 r/w) 22.6mb/s, 5520us cpu/op, 10.4ms latency
IO Summary: 259470 ops 4283.9 ops/s, (659/659 r/w) 21.2mb/s, 5913us cpu/op, 11.2ms latency
IO Summary: 272979 ops 4506.2 ops/s, (693/693 r/w) 22.0mb/s, 5601us cpu/op, 10.4ms latency
IO Summary: 271089 ops 4475.8 ops/s, (689/689 r/w) 22.2mb/s, 5644us cpu/op, 10.6ms latency

This message posted from opensolaris.org
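mario does not show the property switch itself; presumably the runs were toggled with something like the following, using the dataset name from his zpool create line:

# zfs set compression=on lzp
# zfs set compression=off lzp

Only blocks written after the change are affected, so the fileset has to be recreated between runs - which filebench's varmail workload does on every run anyway.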
Roch
2006-Aug-09 15:36 UTC
[zfs-discuss] Re: Re[2]: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
mario heimel writes:
> i am very interested in ZFS compression on vs off tests maybe you can run another one with the 3510.
> [...]

Beware that filebench creates zero-filled files which compress rather well. YMMV.

-r
Robert Milkowski
2006-Aug-09 16:03 UTC
[zfs-discuss] Re: Re[2]: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Roch,

Wednesday, August 9, 2006, 5:36:39 PM, you wrote:

R> mario heimel writes:
R> [...]
R>
R> Beware that filebench creates zero-filled files which
R> compress rather well. YMMV.

To be completely honest - such blocks aren't actually compressed by ZFS. If a whole block is 0s and compression is on, then no compression is actually run for that block and no data block is written.

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
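A quick way to see this behaviour, sketched with a hypothetical pool/dataset name: with compression=on, a file full of zeros reports its full logical size but occupies essentially no space on disk, because ZFS never writes the all-zero blocks.

# zfs create tank/ztest
# zfs set compression=on tank/ztest
# dd if=/dev/zero of=/tank/ztest/zeros bs=128k count=8192
# ls -l /tank/ztest/zeros
# du -h /tank/ztest/zeros

ls shows a roughly 1GB file, while du shows next to nothing allocated.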
Dave Fisk
2006-Aug-09 22:29 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi,

Note that these are page cache rates, and that if the application pushes
harder and exposes the supporting device rates there is another world of
performance to be observed. This is where ZFS gets to be a challenge, as
the relationship between application-level I/O and pool-level I/O is very
hard to predict. For example, the COW may or may not have to read old data
for a small I/O update operation, and a large portion of the pool vdev
capability can be spent on this kind of overhead. Also, on read, if the
pattern is random, you may or may not receive any benefit from the 32 KB
to 128 KB reads issued to each disk of the pool vdev on behalf of a small
read, say 8 KB, by the application; again, lots of overhead potential.

I am not complaining - ZFS is great, I'm a fan - but you definitely have
your work cut out for you if you want to predict its ability to scale for
any given workload.

Cheers,
Dave (the ORtera man)


This message posted from opensolaris.org
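One way to see the application-to-pool amplification Dave describes is
simply to watch both levels while a benchmark runs; the pool name below is
a placeholder:

bash-3.00# zpool iostat -v tank 5     # pool- and vdev-level operations and bandwidth
bash-3.00# iostat -xnz 5              # per-device rates underneath the pool

Comparing these against the ops/s and mb/s that filebench itself reports
gives a feel for how much extra read and write traffic the pool generates
on the application's behalf.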
Eric Schrock
2006-Aug-09 22:35 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
On Wed, Aug 09, 2006 at 03:29:05PM -0700, Dave Fisk wrote:
> For example the COW may or may not have to read old data for a small
> I/O update operation, and a large portion of the pool vdev capability
> can be spent on this kind of overhead.

This is what the 'recordsize' property is for. If you have a workload
that works on large files in very small sized chunks, setting the
recordsize before creating the files will result in a big improvement.

> Also, on read, if the pattern is random, you may or may not
> receive any benefit from the 32 KB to 128 KB reads on each disk of the
> pool vdev on behalf of a small read, say 8 KB by the application,
> again lots of overhead potential.

We're evaluating the tradeoffs on this one. The original vdev cache has
been around forever, and hasn't really been reevaluated in the context
of the latest improvements. See:

6437054 vdev_cache: wise up or die

The DMU-level prefetch code had to undergo a similar overhaul, and was
fixed up in build 45.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
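For example, for a database-style workload that updates large files in
8 KB chunks, the property would be set before the files are written,
along these lines (the dataset name is illustrative):

bash-3.00# zfs create tank/db
bash-3.00# zfs set recordsize=8k tank/db
bash-3.00# zfs get recordsize tank/db

The setting only helps files created after it is changed; files that
already exist keep the block size they were written with.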
Dave C. Fisk
2006-Aug-09 23:24 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi Eric,

Thanks for the information.

I am aware of the recsize option and its intended use. However, when I
was exploring it to confirm the expected behavior, what I found was the
opposite!

The test case was build 38, Solaris 11, a 2 GB file, initially created
with 1 MB SW, and a recsize of 8 KB, on a pool with two raid-z 5+1
vdevs, accessed with 24 threads of 8 KB RW, for 500,000 ops or 40
seconds, whichever came first. The result at the pool level was that
78% of the operations were RR, all overhead. For the same test with a
128 KB recsize (the default), the pool access was pure SW, beautiful.

I ran this test 5 times. The test results with an 8 KB recsize were
consistent; however, ONE of the 128 KB recsize tests did have 62% RR at
the pool level... this is not exactly a confidence builder for
predictability.

As I understand it, the striping logic is separate from the on-disk
format and can be changed in the future, so I would suggest a variant
of raid-z (raid-z+) that would have a variable stripe width instead of
a variable stripe unit. The worst case would be 1+1, but you would
generally do better than mirroring in terms of the number of drives
used for protection, and you could avoid dividing an 8 KB I/O over say
5, 10 or (god forbid) 47 drives. It would be much less overhead,
something like 200 to 1 in one analysis (if I recall correctly), and
hence much better performance.

I will be happy to post ORtera summary reports for a pair of these
tests if you would like to see the numbers. However, the forum would be
the better place to post the reports.

Regards,
Dave

-- 
Dave Fisk, ORtera Inc.
Phone (562) 433-7078
DFisk at ORtera.com
http://www.ORtera.com
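For reference, a pool shaped like the one in this test (two raid-z 5+1
vdevs with an 8 KB recsize) would be put together roughly as follows;
the disk names are placeholders, not the devices actually used:

bash-3.00# zpool create tank \
               raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
               raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
bash-3.00# zfs set recordsize=8k tank      # the 8 KB recsize variant of the test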
Matthew Ahrens
2006-Aug-10 01:37 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
On Wed, Aug 09, 2006 at 04:24:55PM -0700, Dave C. Fisk wrote:
> Hi Eric,
>
> Thanks for the information.
>
> I am aware of the recsize option and its intended use. However, when I
> was exploring it to confirm the expected behavior, what I found was the
> opposite!
>
> The test case was build 38, Solaris 11, a 2 GB file, initially created
> with 1 MB SW, and a recsize of 8 KB, on a pool with two raid-z 5+1,
> accessed with 24 threads of 8 KB RW, for 500,000 ops or 40 seconds which
> ever came first. The result at the pool level was 78% of the operations
> were RR, all overhead. For the same test, with a 128 KB recsize (the
> default), the pool access was pure SW, beautiful.

I'm not sure what RR means, but you should re-try your tests on build 42
or later. Earlier builds have bug 6424554 "full block re-writes need
not read data in" which will cause a lot more data to be read than is
necessary, when overwriting entire blocks.

--matt
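As a rough way to check for this read inflation on a given build, one can
overwrite whole 128 KB records and watch whether the pool issues reads at
all. The pool, dataset and file names below are illustrative, and
compression is assumed off (the default) so the zero blocks are really
written:

bash-3.00# dd if=/dev/zero of=/tank/fs/bigfile bs=128k count=8192      # lay down a 1 GB file
bash-3.00# zpool iostat tank 5 &                                       # watch pool read/write rates
bash-3.00# dd if=/dev/zero of=/tank/fs/bigfile bs=128k count=8192 conv=notrunc   # full-record overwrite

On a build with the fix, the overwrite pass should show mostly writes and
little read traffic beyond metadata; on an affected build, the old data
blocks are read back in as well.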
Dave C. Fisk
2006-Aug-10 02:03 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi Matthew,

RR is Random Read and RW is Random Write, by the way.

In the case of the 8 KB Random Write to the 128 KB recsize filesystem,
the I/Os were not full block re-writes, yet the expected COW Random
Read (RR) at the pool level is somehow avoided. I suspect it was able
to coalesce enough I/O in the 5 second transaction window to construct
128 KB blocks. This was, after all, 24 threads of I/O to a 2 GB file at
a rate of 140,000 IOPS. However, when using the 8 KB recsize it was not
able to do this; the 8 KB RW to the 8 KB recsize filesystem is where I
generally observed RR at the pool level.

I will check to see if it's fixed in b45. Thanks!

Dave

-- 
Dave Fisk, ORtera Inc.
Phone (562) 433-7078
DFisk at ORtera.com
http://www.ORtera.com
Robert Milkowski
2006-Aug-10 08:51 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Dave,

Thursday, August 10, 2006, 12:29:05 AM, you wrote:

DF> Note that these are page cache rates and that if the application
DF> pushes harder and exposes the supporting device rates there is
DF> another world of performance to be observed. This is where ZFS
DF> gets to be a challenge as the relationship between the application
DF> level I/O and the pool level is very hard to predict. For example
DF> the COW may or may not have to read old data for a small I/O
DF> update operation, and a large portion of the pool vdev capability
DF> can be spent on this kind of overhead. Also, on read, if the
DF> pattern is random, you may or may not receive any benefit from the
DF> 32 KB to 128 KB reads on each disk of the pool vdev on behalf of a
DF> small read, say 8 KB by the application, again lots of overhead
DF> potential. I am not complaining, ZFS is great, I'm a fan, but you
DF> definitely have your work cut out for you if you want to predict
DF> its ability to scale for any given workload.

I know, you have valid concerns. However, in the tests I performed ZFS
behaved better than UFS, and that was the most important thing for me.
Does it mean that it will behave better (performance-wise) than UFS in
production? Well, I don't know - but thanks to these tests (and some
others I haven't posted) I'm more confident that it likely will not
behave worse. And this is only the performance point of view; there are
other aspects that also matter.

ps. However, I'm really concerned about ZFS behavior when a pool is
almost full, there are lots of write transactions to that pool, and the
server is restarted forcibly or panics. I observed that file systems on
that pool mount in 10-30 minutes each during zfs mount -a, with one CPU
completely consumed. This happens during system start-up, so basically
the whole boot waits for it, which means an additional hour of
downtime. This was really unexpected for me, and unfortunately no one
was really interested in my report - I know people are busy. But still,
if it hits other users once their zfs pools are well populated, people
won't be happy. For more details see my post here with subject: "zfs
mount stuck in zil_replay".

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
  The test case was build 38, Solaris 11, a 2 GB file, initially created
  with 1 MB SW, and a recsize of 8 KB, on a pool with two raid-z 5+1,
  accessed with 24 threads of 8 KB RW, for 500,000 ops or 40 seconds which
  ever came first. The result at the pool level was 78% of the operations
  were RR, all overhead.

Hi David,

Could this bug (now fixed) have hit you?

6424554 full block re-writes need not read data in

-r
Neil Perrin
2006-Aug-14 22:59 UTC
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Robert Milkowski wrote:
> ps. however I'm really concerned with ZFS behavior when a pool is
> almost full, there're lot of write transactions to that pool and
> server is restarted forcibly or panics. I observed that file systems
> on that pool will mount in 10-30 minutes each during zfs mount -a, and
> one CPU is completely consumed. It's during system start-up so basically
> whole system boots waits for it. It means additional 1 hour downtime.
> This is something really unexpected for me and unfortunately no one
> was really interested in my report - I know people are busy. But still
> if it hits other users when zfs pools will be already populated people
> won't be happy. For more details see my post here with subject: "zfs
> mount stuck in zil_replay".

That problem must have fallen through the cracks. Yes we are busy, but
we really do care about your experiences and bugs. I have just raised a
bug to cover this issue:

6460107 Extremely slow mounts after panic - searching space maps during replay

Thanks for reporting this and helping make ZFS better.

Neil