Hello list,

I did some bonnie++ benchmarks for different zpool configurations consisting
of one or two 1 TB SATA disks (Hitachi HDS721010CLA332, 512 bytes/sector,
7200 rpm) and got some strange results. Please see the attachments for the
exact numbers and pool configs:

         seq write  factor   seq read  factor
           MB/sec              MB/sec
 single      123       1        135       1
 raid0       114       1        249       2
 mirror       57       0.5      129       1

Each of the disks is capable of about 135 MB/sec sequential reads and about
120 MB/sec sequential writes, and iostat -En shows no defects. The disks are
100% busy in all tests and show normal service times.

This is on OpenSolaris b130; rebooting into an OpenIndiana 151a live CD gives
the same results, and dd tests give the same results, too. The storage
controller is an LSI 1068 using the mpt driver. The pools are newly created
and empty. atime on/off doesn't make a difference.

Is there an explanation why

1) in the raid0 case the write speed is more or less the same as a single
   disk, and

2) in the mirror case the write speed is cut in half and the read speed is
   the same as a single disk?

I'd expect about twice the performance for both reading and writing, maybe a
bit less, but definitely more than measured.

For comparison I did the same tests with two old 2.5" 36 GB SAS 10k disks,
maxing out at about 50-60 MB/sec on the outer tracks.

         seq write  factor   seq read  factor
           MB/sec              MB/sec
 single       38       1         50       1
 raid0        89       2        111       2
 mirror       36       1         92       2

Here we get the expected behaviour: raid0 with about double the performance
for reading and writing, and the mirror with about the same write performance
and double the read speed, compared to a single disk. An old SCSI system with
4x2 mirror pairs also shows these scaling characteristics: about 450-500
MB/sec sequential read and 250 MB/sec write, with each disk capable of 80
MB/sec. I don't care about absolute numbers, I just don't get why the SATA
system is so much slower than expected, especially for a simple mirror. Any
ideas?
Thanks,
Michael

-- 
Michael Hase
http://edition-software.de

-------------- next part --------------
  pool: ptest
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ptest       ONLINE       0     0     0
          c13t4d0   ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssingle       32G    79  98 123866  51 63626  35   255  99 135359  25 530.6  13
Latency               333ms     111ms    5283ms   73791us     465ms    2535ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfssingle           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4536  40 +++++ +++ 14140  50 10382  69 +++++ +++  6260  73
Latency             21655us     154us     206us   24539us      46us     405us
1.96,1.96,zfssingle,1,1342165334,32G,,79,98,123866,51,63626,35,255,99,135359,25,530.6,13,16,,,,,4536,40,+++++,+++,14140,50,10382,69,+++++,+++,6260,73,333ms,111ms,5283ms,73791us,465ms,2535ms,21655us,154us,206us,24539us,46us,405us

###############

  pool: ptest
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ptest       ONLINE       0     0     0
          c13t4d0   ONLINE       0     0     0
          c13t5d0   ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfsstripe       32G    78  98 114243  46 72938  37   192  77 249022  44 815.1  20
Latency               483ms     106ms    5179ms    3613ms     259ms    1567ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfsstripe           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  6474  53 +++++ +++ 15505  47  8562  81 +++++ +++ 10839  65
Latency             21894us     131us     208us   22203us      52us     230us
1.96,1.96,zfsstripe,1,1342172768,32G,,78,98,114243,46,72938,37,192,77,249022,44,815.1,20,16,,,,,6474,53,+++++,+++,15505,47,8562,81,+++++,+++,10839,65,483ms,106ms,5179ms,3613ms,259ms,1567ms,21894us,131us,208us,22203us,52us,230us

################

  pool: ptest
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ptest       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c13t4d0 ONLINE       0     0     0
            c13t5d0 ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfsmirror       32G    77  98 57247  24 39607  22   227  98 129639  25 739.9  17
Latency               520ms   73719us    5408ms   94349us     451ms    1466ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfsmirror           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5790  53 +++++ +++  9871  55  7183  65 +++++ +++  9993  44
Latency             29362us     262us     435us   22629us      25us     202us
1.96,1.96,zfsmirror,1,1342174995,32G,,77,98,57247,24,39607,22,227,98,129639,25,739.9,17,16,,,,,5790,53,+++++,+++,9871,55,7183,65,+++++,+++,9993,44,520ms,73719us,5408ms,94349us,451ms,1466ms,29362us,262us,435us,22629us,25us,202us

-------------- next part --------------
  pool: psas
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        psas        ONLINE       0     0     0
          c2t2d0    ONLINE       0     0     0
          c2t3d0    ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssasstripe    32G   122  99 89086  18 27264   7   325  99 111753  11 522.4  11
Latency             89941us   26949us    3192ms   53126us    2052ms    2528ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfssasstripe        -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5036  23 +++++ +++  7989  23  8044  33 +++++ +++  8853  25
Latency             26568us     133us     135us   15398us     113us     140us
1.96,1.96,zfssasstripe,1,1342171776,32G,,122,99,89086,18,27264,7,325,99,111753,11,522.4,11,16,,,,,5036,23,+++++,+++,7989,23,8044,33,+++++,+++,8853,25,89941us,26949us,3192ms,53126us,2052ms,2528ms,26568us,133us,135us,15398us,113us,140us

####################

  pool: psas
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        psas        ONLINE       0     0     0
          c2t2d0    ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssassingle    32G   121  98 38025   7 12144   3   318  97 50313   5 364.9   6
Latency               165ms    2803ms    4687ms     234ms    2898ms    2923ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfssassingle        -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3320  14 +++++ +++  8438  25  7683  31 +++++ +++  8113  30
Latency             20777us     130us     149us   15352us      54us     151us
1.96,1.96,zfssassingle,1,1342173496,32G,,121,98,38025,7,12144,3,318,97,50313,5,364.9,6,16,,,,,3320,14,+++++,+++,8438,25,7683,31,+++++,+++,8113,30,165ms,2803ms,4687ms,234ms,2898ms,2923ms,20777us,130us,149us,15352us,54us,151us

###################

  pool: psas
 state: ONLINE
  scan: resilvered 610K in 0h0m with 0 errors on Fri Jul 13 14:46:38 2012
config:

        NAME        STATE     READ WRITE CKSUM
        psas        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssasmirror    32G   122  99 36393   7 14645   4   325  99 92238  10 547.0  11
Latency               110ms    3220ms    3883ms   57845us     821ms    1838ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfssasmirror        -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  3526  15 +++++ +++  6846  22  6589  26 +++++ +++  8342  29
Latency             28666us     133us     180us   15383us      38us     133us
1.96,1.96,zfssasmirror,1,1342185390,32G,,122,99,36393,7,14645,4,325,99,92238,10,547.0,11,16,,,,,3526,15,+++++,+++,6846,22,6589,26,+++++,+++,8342,29,110ms,3220ms,3883ms,57845us,821ms,1838ms,28666us,133us,180us,15383us,38us,133us
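(For anyone wanting to reproduce the three SATA configurations above: the
pools in the attachments would typically be created and benchmarked along
these lines. The exact bonnie++ flags were not included in the post, so the
invocation below, with a 32 GB working set run as root, is only an
assumption.)

  # single disk, stripe, and mirror variants of the test pool
  zpool create -f ptest c13t4d0                    # single
  # zpool create -f ptest c13t4d0 c13t5d0          # raid0 (stripe)
  # zpool create -f ptest mirror c13t4d0 c13t5d0   # mirror

  # assumed bonnie++ invocation: 32 GB file size on the test pool
  bonnie++ -d /ptest -s 32g -u root

  # destroy and recreate between runs so each layout starts empty
  zpool destroy ptest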
Richard Elling
2012-Jul-16 15:40 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Jul 16, 2012, at 2:43 AM, Michael Hase wrote:

> Each of the disks is capable of about 135 MB/sec sequential reads and
> about 120 MB/sec sequential writes, and iostat -En shows no defects. The
> disks are 100% busy in all tests and show normal service times.

For 7,200 rpm disks, average service times should be on the order of 10 ms
for writes and 13 ms for reads. If you see averages > 20 ms, then you are
likely running into scheduling issues.
 -- richard

-- 
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
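(A hedged way to check those service times while the benchmark is running;
asvc_t is the average service time in milliseconds, and the device names are
taken from the original post.)

  # watch per-device service times in 5-second intervals during the run
  iostat -xn 5 | egrep 'device|c13t[45]d0'
  # asvc_t around 10-13 ms is healthy for 7,200 rpm disks;
  # sustained values above 20 ms point at queueing/scheduling problems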
> 2) in the mirror case the write speed is cut in half and the read speed is
> the same as a single disk? I'd expect about twice the performance for both
> reading and writing, maybe a bit less, but definitely more than measured.

I wouldn't expect mirrored read to be faster than single-disk read, because
the individual disks would need to read small chunks of data with holes
in-between. Regardless of the holes being read or not, the disk will spin at
the same speed.
Bob Friesenhahn
2012-Jul-16 15:49 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Mon, 16 Jul 2012, Stefan Ring wrote:

> I wouldn't expect mirrored read to be faster than single-disk read, because
> the individual disks would need to read small chunks of data with holes
> in-between. Regardless of the holes being read or not, the disk will spin at
> the same speed.

It is normal for reads from mirrors to be faster than for a single disk
because reads can be scheduled from either disk, with different I/Os being
handled in parallel.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> It is normal for reads from mirrors to be faster than for a single disk
> because reads can be scheduled from either disk, with different I/Os being
> handled in parallel.

That assumes that there *are* outstanding requests to be scheduled in
parallel, which would only happen with multiple readers or a large
read-ahead buffer.
Bob Friesenhahn
2012-Jul-16 16:47 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Mon, 16 Jul 2012, Stefan Ring wrote:

>> It is normal for reads from mirrors to be faster than for a single disk
>> because reads can be scheduled from either disk, with different I/Os being
>> handled in parallel.
>
> That assumes that there *are* outstanding requests to be scheduled in
> parallel, which would only happen with multiple readers or a large
> read-ahead buffer.

That is true. ZFS tries to detect the case of sequential reads and requests
more data than the application has already asked for. In this case the data
may be prefetched from the other disk before the application requests it.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

> That is true. ZFS tries to detect the case of sequential reads and requests
> more data than the application has already asked for. In this case the data
> may be prefetched from the other disk before the application requests it.

This is my understanding of zfs: it should load balance read requests even
for a single sequential reader. zfs_prefetch_disable is at the default 0.
And I can see exactly this scaling behaviour with sas disks and with scsi
disks, just not on this sata pool.

zfs_vdev_max_pending is already tuned down to 3 as recommended for sata
disks. iostat -Mxnz 2 looks something like

    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  507.1    0.0   63.4    0.0  0.0  2.9    0.0    5.8   1  99 c13t5d0
  477.6    0.0   59.7    0.0  0.0  2.8    0.0    5.8   1  94 c13t4d0

when reading from the zfs mirror. The default zfs_vdev_max_pending=10 leads
to much higher service times in the 20-30 msec range; throughput remains
roughly the same.

I can read from the dsk or rdsk devices in parallel at real platter speeds:

  dd if=/dev/dsk/c13t4d0s0 of=/dev/null bs=1024k count=8192 &
  dd if=/dev/dsk/c13t5d0s0 of=/dev/null bs=1024k count=8192 &

                     extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
 2467.5    0.0  134.9    0.0  0.0  0.9    0.0    0.4   1  87 c13t5d0
 2546.5    0.0  139.3    0.0  0.0  0.8    0.0    0.3   1  84 c13t4d0

So I think there is no problem with the disks themselves. Maybe it's a
corner case which doesn't matter in real world applications? The random seek
values in my bonnie output show the expected performance boost when going
from one disk to a mirrored configuration. It's just the sequential
read/write case that's different for sata and sas disks.

Michael
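(For reference, the two tunables mentioned above are usually inspected and
changed along these lines on OpenSolaris-era systems; a hedged sketch, since
the exact syntax can differ between releases.)

  # check current values of the ZFS tunables in the live kernel
  echo "zfs_prefetch_disable/D"    | mdb -k
  echo "zfs_vdev_max_pending/D"    | mdb -k

  # lower the per-vdev queue depth to 3 on the running system
  echo "zfs_vdev_max_pending/W0t3" | mdb -kw

  # or make it persistent across reboots in /etc/system:
  #   set zfs:zfs_vdev_max_pending = 3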
Bob Friesenhahn
2012-Jul-16 18:08 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Mon, 16 Jul 2012, Michael Hase wrote:

> This is my understanding of zfs: it should load balance read requests even
> for a single sequential reader. zfs_prefetch_disable is at the default 0.
> And I can see exactly this scaling behaviour with sas disks and with scsi
> disks, just not on this sata pool.

Is the BIOS configured to use AHCI mode, or is it using IDE mode?

Are the disks 512 bytes/sector or 4K?

> Maybe it's a corner case which doesn't matter in real world applications?
> The random seek values in my bonnie output show the expected performance
> boost when going from one disk to a mirrored configuration. It's just the
> sequential read/write case that's different for sata and sas disks.

I don't have a whole lot of experience with SATA disks, but it is my
impression that you might see this sort of performance if the BIOS was
configured so that the drives were used as IDE disks. If not that, then
there must be a bottleneck in your hardware somewhere.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
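(A quick, hedged way to answer both questions from the running system; the
device path is an example taken from this thread.)

  # logical sector size as presented to the OS (look for bytes/sector)
  prtvtoc /dev/rdsk/c13t4d0s2 | head

  # which driver the disks attach through: mpt (SAS HBA),
  # ahci (SATA/AHCI), or ata/cmdk (legacy IDE mode)
  prtconf -D | egrep -i 'mpt|ahci|ata'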
Edward Ned Harvey
2012-Jul-16 19:50 UTC
[zfs-discuss] zfs sata mirror slower than single disk
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Michael Hase
>
> got some strange results, please see
> attachments for exact numbers and pool config:
>
>          seq write  factor   seq read  factor
>            MB/sec              MB/sec
>  single      123       1        135       1
>  raid0       114       1        249       2
>  mirror       57       0.5      129       1

I agree with you, these look wrong. Here is what you should expect:

           seq W   seq R
  single    1.0     1.0
  stripe    2.0     2.0
  mirror    1.0     2.0

You have three things wrong:
  (a) stripe should write 2x
  (b) mirror should write 1x
  (c) mirror should read 2x

I would have simply said "for some reason your drives are unable to operate
concurrently", but you have the stripe reading at 2x. I cannot think of a
single reason why the stripe should be able to read at 2x and the mirror
only at 1x.
On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

> Is the BIOS configured to use AHCI mode, or is it using IDE mode?

Not relevant here: the disks are connected to an onboard sas hba (lsi 1068,
see first post). The hardware is a Primergy RX330 with two quad-core
Opterons.

> Are the disks 512 bytes/sector or 4K?

512 bytes/sector, HDS721010CLA330.

> I don't have a whole lot of experience with SATA disks, but it is my
> impression that you might see this sort of performance if the BIOS was
> configured so that the drives were used as IDE disks. If not that, then
> there must be a bottleneck in your hardware somewhere.

With early nevada releases I had indeed the IDE/AHCI problem, albeit on
different hardware. Solaris only ran in IDE mode and the disks were 4 times
slower than on linux, see

http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/intel/sol_10_05_08/2999.html

Wouldn't a hardware bottleneck show up on raw dd tests as well? I can stream
more than 130 MB/sec from each of the two disks in parallel. dd reading from
more than these two disks at the same time results in a slight slowdown, but
here we are talking about nearly 400 MB/sec aggregated bandwidth through the
onboard hba, and the box has 6 disk slots:

                     extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   94.5    0.0   94.5    0.0  0.0  1.0    0.0   10.5   0 100 c13t6d0
   94.5    0.0   94.5    0.0  0.0  1.0    0.0   10.6   0 100 c13t1d0
   93.0    0.0   93.0    0.0  0.0  1.0    0.0   10.7   0 100 c13t2d0
   94.5    0.0   94.5    0.0  0.0  1.0    0.0   10.5   0 100 c13t5d0

Don't know why this is a bit slower, maybe some pci-e bottleneck. Or
something with the mpt driver: intrstat shows that only one cpu handles all
mpt interrupts. Or even the slow cpus? These are 1.8 GHz Opterons. During
sequential reads from the zfs mirror I see more than 1000 interrupts/sec on
one cpu. So it could really be a bottleneck somewhere, triggered by the
"smallish" 128k i/o requests from the zfs side.

I think I'll benchmark again on a xeon box with faster cpus; my tests with
sas disks were done on that other box.

Michael
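(To see whether interrupt handling on a single CPU is actually the limiter,
something along these lines can be run while the dd streams are active; a
hedged sketch using standard Solaris observability tools.)

  # interrupt distribution per device and CPU, 5-second samples
  intrstat 5

  # per-CPU utilisation; watch for one CPU pegged in %sys/intr
  mpstat 5

  # interrupt-to-CPU assignments as the kernel sees them
  echo "::interrupts" | mdb -k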
On Mon, 16 Jul 2012, Edward Ned Harvey wrote:

> I agree with you, these look wrong. Here is what you should expect:
>
>            seq W   seq R
>   single    1.0     1.0
>   stripe    2.0     2.0
>   mirror    1.0     2.0
>
> You have three things wrong:
>   (a) stripe should write 2x
>   (b) mirror should write 1x
>   (c) mirror should read 2x
>
> I would have simply said "for some reason your drives are unable to operate
> concurrently", but you have the stripe reading at 2x. I cannot think of a
> single reason why the stripe should be able to read at 2x and the mirror
> only at 1x.

Yes, I think so too. In the meantime I switched the two disks to another box
(hp xw8400, 2 xeon 5150 cpus, 16gb ram). On this machine I did the previous
sas tests. The OS is now OpenIndiana 151a (vs OpenSolaris b130 before), the
mirror pool was upgraded from version 22 to 28, and the raid0 pool was newly
created. The results look quite different:

         seq write  factor   seq read  factor
           MB/sec              MB/sec
 raid0       236       2        330       2.5
 mirror      111       1        128       1

Now the raid0 case shows excellent performance. The 330 MB/sec are a bit on
the optimistic side, maybe some arc cache effects (file size 32gb, 16gb
ram). iostat during the sequential read shows about 115 MB/sec from each
disk, which is great. The (really desired) mirror case still has a problem
with sequential reads. Sequential writes to the mirror are twice as fast as
before and show the expected performance for a single disk.

So only one thing is left: mirror should read 2x.

I suspect the difference is not the hardware; both boxes should have enough
horsepower to easily do sequential reads at way more than 200 MB/sec. In all
tests cpu time (user and system) remained quite low. I think it's an OS
issue: OpenSolaris b130 is over 2 years old, OI 151a dates from 11/2011.

Could someone please send me some bonnie++ results for a 2 disk mirror or a
2x2 disk mirror pool with sata disks?

Michael

-- 
Michael Hase
http://edition-software.de
Bob Friesenhahn
2012-Jul-16 23:09 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Tue, 17 Jul 2012, Michael Hase wrote:

> So only one thing is left: mirror should read 2x.

I don't think that a mirror should necessarily read 2x faster, even though
the potential is there to do so. Last I heard, zfs did not include a special
read scheduler for sequential reads from a mirrored pair. As a result, 50%
of the time a read will be scheduled for a device which already has a read
scheduled. If this is indeed true, the typical performance would be 150%.
There may be some other scheduling factor (e.g. an estimate of busyness)
which might still allow zfs to select the right side and do better than
that.

If you were to add a second vdev (i.e. stripe) then you should see very
close to 200% due to the default round-robin scheduling of the writes.

It is really difficult to measure zfs read performance due to caching
effects. One way to do it is to write a large file (containing random data
such as returned from /dev/urandom) to a zfs filesystem, unmount the
filesystem, remount the filesystem, and then time how long it takes to read
the file once. This works because remounting the filesystem restarts the
filesystem cache.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Edward Ned Harvey
2012-Jul-16 23:40 UTC
[zfs-discuss] zfs sata mirror slower than single disk
> From: Michael Hase [mailto:michael at edition-software.de]
> Sent: Monday, July 16, 2012 6:41 PM
>
> So only one thing is left: mirror should read 2x.

That is still weird - but all your numbers so far are coming from bonnie.
Why don't you do a test like this (below)?

Write a big file to the mirror. Reboot (or something) to clear the cache.
Now time reading the file. Sometimes you'll get a different result with dd
versus cat.

> Could someone please send me some bonnie++ results for a 2 disk mirror or
> a 2x2 disk mirror pool with sata disks?

I don't have bonnie, but I have certainly confirmed mirror performance on
solaris before with sata disks. I've generally done iozone, benchmarking the
N-way mirror and the stripe-of-mirrors. So I know the expectation in this
case is correct.
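(A hedged sketch of the kind of test being suggested here; the file name and
size are arbitrary, and exporting/re-importing the pool is one alternative
to a full reboot for clearing the ARC.)

  # write a big file of incompressible data to the mirror
  dd if=/dev/urandom of=/ptest/bigfile bs=1024k count=32768

  # drop the ARC contents for this pool
  zpool export ptest
  zpool import ptest

  # time the same sequential read two different ways
  time dd if=/ptest/bigfile of=/dev/null bs=1024k
  time cat /ptest/bigfile > /dev/null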
Sorry to insist, but still no real answer...

On Mon, 16 Jul 2012, Bob Friesenhahn wrote:

> If you were to add a second vdev (i.e. stripe) then you should see very
> close to 200% due to the default round-robin scheduling of the writes.

My expectation would be more than 200%, as 4 disks are involved. It may not
be the perfect 4x scaling, but imho it should be (and is for a scsi system)
more than half of the theoretical throughput. This is solaris or a solaris
derivative, not linux ;-)

> It is really difficult to measure zfs read performance due to caching
> effects. One way to do it is to write a large file (containing random data
> such as returned from /dev/urandom) to a zfs filesystem, unmount the
> filesystem, remount the filesystem, and then time how long it takes to read
> the file once. This works because remounting the filesystem restarts the
> filesystem cache.

Ok, I did a zpool export/import cycle between the dd write and read tests.
This really empties the arc; I checked with arc_summary.pl. The test even
uses two processes in parallel (doesn't make a difference). The result is
still the same:

  dd write: 2x 58 MB/sec --> perfect, each disk does > 110 MB/sec
  dd read:  2x 68 MB/sec --> imho too slow, about 68 MB/sec per disk

For writes each disk gets 900 128k io requests/sec with asvc_t in the 8-9
msec range. For reads each disk only gets 500 io requests/sec, asvc_t 18-20
msec with the default zfs_vdev_max_pending=10. When reducing
zfs_vdev_max_pending the asvc_t drops accordingly; the i/o rate remains at
500/sec per disk and throughput stays the same. I think the iostat values
should be reliable here. These high iops numbers make sense as we work on
empty pools, so there aren't very high seek times.

All benchmarks (dd, bonnie, will try iozone) lead to the same result: on the
sata mirror pair, read performance is in the range of a single disk. For the
sas disks (only two available for testing) and for the scsi system there is
quite good throughput scaling. Here for comparison a table for 1-4 36gb 15k
u320 scsi disks on an old sxde box (nevada b130):

            seq write  factor   seq read  factor
              MB/sec              MB/sec
 single         82       1         78       1
 mirror         79       1        137       1.75
 2x mirror     120       1.5      251       3.2

This is exactly what's imho to be expected from mirrors and striped mirrors.
It just doesn't happen for my sata pool. Still have no reference numbers for
other sata pools, just one with the 4k/512 bytes sector problem, which is
even slower than mine. It seems the zfs performance people just use sas
disks and are done with it.
Michael

-------------- next part --------------
old ibm dual opteron intellistation with external hp msa30,
36gb 15k u320 scsi disks

####################

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi1       ONLINE       0     0     0
          c3t4d0    ONLINE       0     0     0

errors: No known data errors

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssingle       16G   137  99 82739  20 39453   9   314  99 78251   7 856.9   8
Latency               160ms    4799ms    5292ms   43210us    3274ms    2069ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfssingle           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  8819  34 +++++ +++ 26318  68 20390  73 +++++ +++ 26846  72
Latency             16413us     108us     231us   12206us      46us     124us
1.96,1.96,zfssingle,1,1342514790,16G,,137,99,82739,20,39453,9,314,99,78251,7,856.9,8,16,,,,,8819,34,+++++,+++,26318,68,20390,73,+++++,+++,26846,72,160ms,4799ms,5292ms,43210us,3274ms,2069ms,16413us,108us,231us,12206us,46us,124us

######################

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0

errors: No known data errors

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfsmirror       16G   110  99 79137  19 50591  12   305  99 137244  13  1065  16
Latency               199ms    4932ms    5101ms   50429us    3885ms    1303ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfsmirror           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11337  41 +++++ +++ 26398  66 19797  70 +++++ +++ 26299  68
Latency             14297us     139us     136us   10732us      48us     116us
1.96,1.96,zfsmirror,1,1342515696,16G,,110,99,79137,19,50591,12,305,99,137244,13,1065,16,16,,,,,11337,41,+++++,+++,26398,66,19797,70,+++++,+++,26299,68,199ms,4932ms,5101ms,50429us,3885ms,1303ms,14297us,139us,136us,10732us,48us,116us

########################

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        scsi1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfsraid10       16G   127  99 120319  30 86902  23   300  99 251493  26  1747  27
Latency               105ms    3078ms    5083ms   43082us    3657ms     360ms
Version  1.96       ------Sequential Create------ --------Random Create--------
zfsraid10           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 12031  46 +++++ +++ 25764  64 21220  75 +++++ +++ 27288  69
Latency             14091us     123us     136us   10823us      49us     117us
1.96,1.96,zfsraid10,1,1342515541,16G,,127,99,120319,30,86902,23,300,99,251493,26,1747,27,16,,,,,12031,46,+++++,+++,25764,64,21220,75,+++++,+++,27288,69,105ms,3078ms,5083ms,43082us,3657ms,360ms,14091us,123us,136us,10823us,49us,117us

####################

dd write
--------
for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=/dev/zero of=$FILE bs=1024k count=8192 &
done

8589934592 bytes (8.6 GB) copied, 108.421 s, 79.2 MB/s
8589934592 bytes (8.6 GB) copied, 112.788 s, 76.2 MB/s

               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
scsi1       14.1G  53.4G      0  1.11K      0   140M
  mirror    7.03G  26.7G      0    571      0  70.2M
    c1t4d0      -      -      0    571      0  70.2M
    c3t4d0      -      -      0    674      0  82.8M
  mirror    7.02G  26.7G      0    567      0  69.9M
    c1t5d0      -      -      0    567      0  69.9M
    c3t5d0      -      -      0    669      0  82.4M
----------  -----  -----  -----  -----  -----  -----

dd read
-------
for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=$FILE of=/dev/null bs=1024k count=8192 &
done

8589934592 bytes (8.6 GB) copied, 62.2953 s, 138 MB/s
8589934592 bytes (8.6 GB) copied, 62.8319 s, 137 MB/s

               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
scsi1       16.0G  51.5G  2.08K      0   261M      0
  mirror    7.99G  25.8G  1.06K      0   133M      0
    c1t4d0      -      -    535      0  66.6M      0
    c3t4d0      -      -    544      0  67.6M      0
  mirror    8.01G  25.7G  1.02K      0   128M      0
    c1t5d0      -      -    518      0  64.3M      0
    c3t5d0      -      -    516      0  64.4M      0
----------  -----  -----  -----  -----  -----  -----

-------------- next part --------------
dd write
--------
for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=/dev/zero of=$FILE bs=1024k count=16384 &
done

17179869184 bytes (17 GB) copied, 294.442 s, 58.3 MB/s
17179869184 bytes (17 GB) copied, 294.28 s, 58.4 MB/s

                capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
ptest        40.6G   887G      0   1000      0   113M
  mirror     40.6G   887G      0   1000      0   113M
    c5t9d0       -      -      0    935      0   111M
    c5t10d0      -      -      0    946      0   113M
-----------  -----  -----  -----  -----  -----  -----

                     extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  907.0    0.0  106.3  0.0  7.7    0.0    8.5   1  79 c5t9d0
    0.0  914.0    0.0  107.3  0.0  7.7    0.0    8.5   1  80 c5t10d0

############

zpool export ptest
zpool import ptest
arc_summary.pl

System Memory:
         Physical RAM:  16375 MB
         Free Memory :  2490 MB
         LotsFree:      255 MB

ZFS Tunables (/etc/system):

ARC Size:
         Current Size:             85 MB (arcsize)
         Target Size (Adaptive):   12690 MB (c)
         Min Size (Hard Limit):    1918 MB (zfs_arc_min)
         Max Size (Hard Limit):    15351 MB (zfs_arc_max)

############

dd read
-------
for FILE in bigfile1 bigfile2
do
  time /usr/gnu/bin/dd if=$FILE of=/dev/null bs=1024k count=16384 &
done

17179869184 bytes (17 GB) copied, 253.017 s, 67.9 MB/s
17179869184 bytes (17 GB) copied, 253.567 s, 67.8 MB/s

                capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
ptest        71.1G   857G   1008      0   125M      0
  mirror     71.1G   857G   1008      0   125M      0
    c5t9d0       -      -    517      0  64.2M      0
    c5t10d0      -      -    491      0  61.0M      0
-----------  -----  -----  -----  -----  -----  -----

                     extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  519.0    1.0   64.0    0.0  0.0 10.0    0.0   19.2   1 100 c5t9d0
  521.5    0.5   64.8    0.0  0.0 10.0    0.0   19.1   1 100 c5t10d0
Bob Friesenhahn
2012-Jul-17 14:01 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Tue, 17 Jul 2012, Michael Hase wrote:

>> If you were to add a second vdev (i.e. stripe) then you should see very
>> close to 200% due to the default round-robin scheduling of the writes.
>
> My expectation would be more than 200%, as 4 disks are involved. It may not
> be the perfect 4x scaling, but imho it should be (and is for a scsi system)
> more than half of the theoretical throughput. This is solaris or a solaris
> derivative, not linux ;-)

Here are some results from my own machine based on the 'virgin mount' test
approach. The results show less boost than is reported by a benchmark tool
like 'iozone', which sees benefits from caching.

I get an initial sequential read speed of 657 MB/s on my new pool, which has
1200 MB/s of raw bandwidth (if mirrors could produce a 100% boost). Reading
the file a second time reports 6.9 GB/s.

The below is with a 2.6 GB test file, but with a 26 GB test file (just add
another zero to 'count' and wait longer) I see an initial read rate of 618
MB/s and a re-read rate of 8.2 GB/s. The raw disk can transfer 150 MB/s.

% zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: scrub repaired 0 in 0h10m with 0 errors on Mon Jul 16 04:30:48 2012
config:

        NAME                          STATE     READ WRITE CKSUM
        tank                          ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            c7t50000393E8CA21FAd0p0   ONLINE       0     0     0
            c11t50000393D8CA34B2d0p0  ONLINE       0     0     0
          mirror-1                    ONLINE       0     0     0
            c8t50000393E8CA2066d0p0   ONLINE       0     0     0
            c12t50000393E8CA2196d0p0  ONLINE       0     0     0
          mirror-2                    ONLINE       0     0     0
            c9t50000393D8CA82A2d0p0   ONLINE       0     0     0
            c13t50000393E8CA2116d0p0  ONLINE       0     0     0
          mirror-3                    ONLINE       0     0     0
            c10t50000393D8CA59C2d0p0  ONLINE       0     0     0
            c14t50000393D8CA828Ed0p0  ONLINE       0     0     0

errors: No known data errors

% pfexec zfs create tank/zfstest
% pfexec zfs create tank/zfstest/defaults
% cd /tank/zfstest/defaults
% pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 36.8133 s, 71.2 MB/s
% cd ..
% pfexec zfs umount tank/zfstest/defaults
% pfexec zfs mount tank/zfstest/defaults
% cd defaults
% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 3.99229 s, 657 MB/s
% pfexec dd if=/dev/rdsk/c7t50000393E8CA21FAd0p0 of=/dev/null bs=128k count=2000
2000+0 records in
2000+0 records out
262144000 bytes (262 MB) copied, 1.74532 s, 150 MB/s
% bc
scale=8
657/150
4.38000000

It is very difficult to benchmark with a cache which works so well:

% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Tue, 17 Jul 2012, Bob Friesenhahn wrote:

> Here are some results from my own machine based on the 'virgin mount' test
> approach. The results show less boost than is reported by a benchmark tool
> like 'iozone', which sees benefits from caching.
>
> I get an initial sequential read speed of 657 MB/s on my new pool, which has
> 1200 MB/s of raw bandwidth (if mirrors could produce a 100% boost). Reading
> the file a second time reports 6.9 GB/s.
>
> The below is with a 2.6 GB test file, but with a 26 GB test file (just add
> another zero to 'count' and wait longer) I see an initial read rate of 618
> MB/s and a re-read rate of 8.2 GB/s. The raw disk can transfer 150 MB/s.

To work around these caching effects, just use a file more than 2 times the
size of ram; iostat then shows the numbers really coming from disk. I always
test like this. A re-read rate of 8.2 GB/s is really just memory bandwidth,
but quite impressive ;-)

> It is very difficult to benchmark with a cache which works so well:
>
> % dd if=random.dat of=/dev/null bs=128k count=20000
> 20000+0 records in
> 20000+0 records out
> 2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s

This is not my point, I'm pretty sure I did not measure any arc effects -
maybe with the one exception of the raid0 test on the scsi array. Don't know
why the arc had this effect, filesize was 2x of ram.

The point is: I'm searching for an explanation for the relative slowness of
a mirror pair of sata disks, or some tuning knobs, or something like "the
disks are plain crap", or maybe: zfs throttles sata disks in general (don't
know the internals).

In the range of more than 600 MB/s other issues may show up (pcie bus
contention, hba contention, cpu load). And performance at this level could
be just good enough, not requiring any further tuning. Could you recheck
with only 4 disks (2 mirror pairs)? If you just get some 350 MB/s it could
be the same problem as with my boxes. All sata disks?

Michael
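(A hedged example of sizing the test file to twice the installed RAM, as
suggested above; the prtconf line is the usual way to read the memory size
on Solaris, and the file path and count are arbitrary.)

  # installed memory, e.g. "Memory size: 16384 Megabytes"
  prtconf | grep 'Memory size'

  # for 16 GB of RAM, use a >= 32 GB test file so the ARC cannot hold it
  dd if=/dev/urandom of=/ptest/bigfile bs=1024k count=32768

  # iostat then reflects real disk traffic during the read-back
  iostat -Mxnz 5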
Bob Friesenhahn
2012-Jul-17 16:41 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Tue, 17 Jul 2012, Michael Hase wrote:

> To work around these caching effects, just use a file more than 2 times the
> size of ram; iostat then shows the numbers really coming from disk. I always
> test like this. A re-read rate of 8.2 GB/s is really just memory bandwidth,
> but quite impressive ;-)

Yes, in the past I have done benchmarking with a file size 2X the size of
memory. This does not necessarily erase all caching, because the ARC is
smart enough not to toss everything.

At the moment I have an iozone benchmark running up from 8 GB to 256 GB file
size. I see that it has started the 256 GB size now. It may be a while.
Maybe a day.

> In the range of more than 600 MB/s other issues may show up (pcie bus
> contention, hba contention, cpu load). And performance at this level could
> be just good enough, not requiring any further tuning. Could you recheck
> with only 4 disks (2 mirror pairs)? If you just get some 350 MB/s it could
> be the same problem as with my boxes. All sata disks?

Unfortunately, I already put my pool into use and cannot conveniently
destroy it now. The disks I am using are SAS (7200 rpm, 1 TB) but return
similar per-disk data rates as the SATA disks I use for the boot pool.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn
2012-Jul-17 20:18 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Tue, 17 Jul 2012, Michael Hase wrote:

> To work around these caching effects, just use a file more than 2 times the
> size of ram; iostat then shows the numbers really coming from disk. I always
> test like this. A re-read rate of 8.2 GB/s is really just memory bandwidth,
> but quite impressive ;-)

Ok, the iozone benchmark finally completed. The results do suggest that
reading from mirrors substantially improves the throughput. This is
interesting, since the results differ from (are better than) my 'virgin
mount' test approach:

        Command line used: iozone -a -i 0 -i 1 -y 64 -q 512 -n 8G -g 256G

             KB  reclen    write   rewrite      read     reread
        8388608      64   572933   1008668   6945355    7509762
        8388608     128  2753805   2388803   6482464    7041942
        8388608     256  2508358   2331419   2969764    3045430
        8388608     512  2407497   2131829   3021579    3086763
       16777216      64   671365    879080   6323844    6608806
       16777216     128  1279401   2286287   6409733    6739226
       16777216     256  2382223   2211097   2957624    3021704
       16777216     512  2237742   2179611   3048039    3085978
       33554432      64   933712    699966   6418428    6604694
       33554432     128   459896    431640   6443848    6546043
       33554432     256   444490    430989   2997615    3026246
       33554432     512   427158    430891   3042620    3100287
       67108864      64   426720    427167   6628750    6738623
       67108864     128   419328    422581   6666153    6743711
       67108864     256   419441    419129   3044352    3056615
       67108864     512   431053    417203   3090652    3112296
      134217728      64   417668     55434    759351     760994
      134217728     128   409383    400433    759161     765120
      134217728     256   408193    405868    763892     766184
      134217728     512   408114    403473    761683     766615
      268435456      64   418910     55239    768042     768498
      268435456     128   408990    399732    763279     766882
      268435456     256   413919    399386    760800     764468
      268435456     512   410246    403019    766627     768739

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
For what it's worth: I had the same problem and found the answer here -
http://forums.freebsd.org/showthread.php?t=27207
Be careful when testing ZFS with iozone. I ran a bunch of stats many years
ago that produced results that did not pass a basic sanity check. There was
*something* about the iozone test data that ZFS either did not like or liked
very much, depending on the specific test. I eventually wrote my own very
crude tool to test exactly what our workload was and started getting results
that matched the reality we saw.

On Jul 17, 2012, at 4:18 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> Ok, the iozone benchmark finally completed. The results do suggest that
> reading from mirrors substantially improves the throughput. This is
> interesting, since the results differ from (are better than) my 'virgin
> mount' test approach.

-- 
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company
Bob Friesenhahn
2013-Feb-27 02:53 UTC
[zfs-discuss] zfs sata mirror slower than single disk
On Tue, 26 Feb 2013, hagai wrote:

> For what it's worth: I had the same problem and found the answer here -
> http://forums.freebsd.org/showthread.php?t=27207

Given enough sequential I/O requests, zfs mirrors behave very much like
RAID-0 for reads. Sequential prefetch is very important in order to avoid
the latencies.

While this script may not work perfectly as-is on FreeBSD, it was very good
at discovering a zfs performance bug (since corrected) and is still an
interesting exercise to see how ZFS ARC caching helps for re-reads. See
"http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh".
The script performs an initial uncached read from the disks, and then a
(hopefully) cached re-read. I think that it serves as a useful benchmark.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
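(The referenced zfs-cache-test.ksh is Bob's own script; purely as an
illustration of the idea, and not the actual script, an uncached-vs-cached
read comparison can be sketched like this. Pool and dataset names are
arbitrary.)

  #!/bin/ksh
  # illustrative only -- not the script referenced above
  POOL=tank ; FS=$POOL/cachetest ; F=/$FS/random.dat

  zfs create $FS
  dd if=/dev/urandom of=$F bs=128k count=20000    # ~2.6 GB of test data

  zfs umount $FS ; zfs mount $FS                  # restart the fs cache
  ptime dd if=$F of=/dev/null bs=128k             # uncached read from disk

  ptime dd if=$F of=/dev/null bs=128k             # cached re-read from ARC
  zfs destroy $FS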