Pascal Vandeputte
2008-Apr-17 09:29 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
Hi everyone,

I've bought some new hardware a couple of weeks ago to replace my home fileserver:

Intel DG33TL motherboard with Intel gigabit NIC and ICH9R
Intel Pentium Dual E2160 (= 1.8GHz Core 2 Duo 64-bit architecture with less cache; cheap, cool and more than fast enough)
2 x 1 GB DDR2 RAM
3 x Seagate 7200.11 750GB SATA drives

Originally I was going to keep running Windows 2003 for a month (to finish migrating some data files to an open-source friendly format) and then move to Solaris, but because the Intel Matrix RAID 5 write speeds were abysmally low no matter which stripe sizes/NTFS allocation unit size I selected, I've already thrown out W2K3 completely in favor of Solaris 10 u5.

I have updated the motherboard with the latest Intel BIOS (0413, 3/6/2008). I have loaded "optimal defaults" and have put the SATA drives in AHCI mode.

At the moment I'm seeing read speeds of 200MB/s on a ZFS raidz filesystem consisting of c1t0d0s3, c1t1d0 and c1t2d0 (I'm booting from a small 700MB slice on the first SATA drive; c1t0d0s3 is about 690 "real" gigabytes large and ZFS just uses the same amount of sectors on the other disks and leaves the rest untouched). As a single drive should top out at about 104MB/s for sequential access on the outer tracks, I'm very pleased with that.

But the write speeds I'm getting are still far below my expectations: about 20MB/s (versus 14MB/s in Windows 2003 with the Intel RAID driver). I was hoping for at least 100MB/s, maybe even more.

I'm doing simple dd read and write tests (with /dev/zero, /dev/null etc.) using block sizes like 16384 and 65536.

Shouldn't write speed be substantially higher? If I monitor using "vmstat 1", I see that CPU usage never exceeds 3% during writes (!), and 10% during reads.

I'm a Solaris newbie (but with the intention of learning a whole lot), so I may have overlooked something. I also don't really know where to start looking for bottlenecks.

Thanks!
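The dd tests referred to here take roughly this form (a sketch only; the mount point, file name and counts are illustrative, not the exact commands used):

    # sequential write into the raidz filesystem
    dd if=/dev/zero of=/tank/ddtest bs=65536 count=32768

    # sequential read back out of it
    dd if=/tank/ddtest of=/dev/null bs=65536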
Bob Friesenhahn
2008-Apr-17 16:47 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
On Thu, 17 Apr 2008, Pascal Vandeputte wrote:

> At the moment I'm seeing read speeds of 200MB/s on a ZFS raidz
> filesystem consisting of c1t0d0s3, c1t1d0 and c1t2d0 (I'm booting
> from a small 700MB slice on the first sata drive; c1t0d0s3 is about
> 690 "real" gigabytes large and ZFS just uses the same amount of
> sectors on the other disks and leaves the rest untouched). As a
> single drive should top out at about 104MB/s for sequential access
> in the outer tracks, I'm very pleased with that.
>
> But the write speeds I'm getting are still far below my
> expectations: about 20MB/s (versus 14MB/s in Windows 2003 with Intel
> RAID driver). I was hoping for at least 100MB/s, maybe even more.

I don't know what you should be expecting. 20MB/s seems pretty poor but 100MB/s seems like a stretch with only three drives.

> I'm a Solaris newbie (but with the intention of learning a whole
> lot), so I may have overlooked something. I also don't really know
> where to start looking for bottlenecks.

There are a couple of things which come to mind.

* Since you are using a slice on the boot drive, this causes ZFS to not enable the disk drive write cache since it does not assume to know what the filesystem on the other partition needs. As a result, writes to that disk will have more latency, and since you are using raidz (which needs to write to all the drives) the extra latency will impact overall write performance. If one of the drives has slower write performance than the others, then the whole raidz will suffer. See "Storage Pools" in http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide.

* Maybe this ICH9R interface has some sort of bottleneck in its design, or there is a driver performance problem. If the ICH9R is sharing resources rather than dedicating a channel for each drive, then raidz's increased write load may be overwhelming it.

If you are looking for really good scalable write performance, perhaps you should be using mirrors instead.

In order to see if you have a slow drive, run 'iostat -x' while writing data. If the svc_t field is much higher for one drive than the others, then that drive is likely slow.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
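A mirrored layout of the sort suggested here would be built roughly like this (a sketch; "tank" is only an example pool name, and c1t3d0/c1t4d0 stand in for hypothetical extra drives):

    zpool create tank mirror c1t1d0 c1t2d0      # two-way mirror of the two whole disks
    zpool add tank mirror c1t3d0 c1t4d0         # stripe in a second pair if more disks are added

Each mirror pair added this way becomes another top-level vdev that ZFS stripes across, which is where the write scaling comes from.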
Tim
2008-Apr-17 17:10 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
On Thu, Apr 17, 2008 at 11:47 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> [earlier part of Bob's reply quoted in full - snipped]
>
> In order to see if you have a slow drive, run 'iostat -x' while
> writing data. If the svc_t field is much higher for one drive than
> the others, then that drive is likely slow.

Along those lines, I'd *strongly* suggest running Jeff's script to pin down whether one drive is the culprit:

#!/bin/ksh

disks=`format </dev/null | grep c.t.d | nawk '{print $2}'`

getspeed1()
{
        ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
            nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

getspeed()
{
        for iter in 1 2 3
        do
                getspeed1 $1
        done | sort -n | tail -2 | head -1
}

for disk in $disks
do
        echo $disk `getspeed $disk` MB/sec
done
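Saved to a file and run as root, usage is simply along these lines (the filename is just an example):

    chmod +x diskspeed.ksh
    ./diskspeed.ksh

The script reads 64 MB from each disk's raw device three times and prints one "<disk> <NN> MB/sec" line per disk, reporting the middle of the three measurements.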
Bob Friesenhahn
2008-Apr-17 17:36 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
On Thu, 17 Apr 2008, Tim wrote:

> Along those lines, I'd *strongly* suggest running Jeff's script to pin down
> whether one drive is the culprit:

But that script only tests read speed, and Pascal's read performance seems fine.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
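A quick write-side check has to go through a filesystem rather than the raw devices (dd onto /dev/rdsk would destroy data); a minimal sketch against a test pool mounted at /tank, watched from a second terminal:

    ptime dd if=/dev/zero of=/tank/writetest bs=64k count=16384    # ~1 GB sequential write
    iostat -xn 1                                                   # run alongside, compare asvc_t per disk

If /dev/zero is suspect (see Bob's later remark about zero-filled blocks), a pre-generated file of random data can be used as the input instead.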
Robert Milkowski
2008-Apr-18 07:48 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
Hello Pascal,

Thursday, April 17, 2008, 10:29:33 AM, you wrote:

PV> [original message quoted in full - snipped]

Check iostat -xn 1

Also try to lower the number of outstanding IOs per device from the default of 35 in zfs to something much lower.

-- 
Best regards,
Robert                          mailto:milek at task.gda.pl
                                http://milek.blogspot.com
Pascal Vandeputte
2008-Apr-18 11:46 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Thanks for all the replies!

Some output from "iostat -x 1" while doing a dd of /dev/zero to a file on a raidz of c1t0d0s3, c1t1d0 and c1t2d0 using bs=1048576:

                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  104.0    0.0 13312.0   4.0  32.0  346.0 100 100
sd1        0.0  104.0    0.0 13312.0   3.0  32.0  336.4 100 100
sd2        0.0  104.0    0.0 13312.0   3.0  32.0  336.4 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  104.0    0.0 13311.5   4.0  32.0  346.0 100 100
sd1        0.0  106.0    0.0 13567.5   3.0  32.0  330.1 100 100
sd2        0.0  106.0    0.0 13567.5   3.0  32.0  330.1 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  135.0    0.0 12619.3   2.6  25.9  211.3  66 100
sd1        0.0  107.0    0.0  8714.6   1.1  16.3  163.3  38  66
sd2        0.0  101.0    0.0  8077.0   1.0  14.5  153.5  32  61
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        1.0   13.0    8.0    14.5   1.7   0.2  139.9  29  22
sd1        0.0    6.0    0.0     4.0   0.0   0.0    0.9   0   0
sd2        0.0    6.0    0.0     4.0   0.0   0.0    0.9   0   0
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0   77.0    0.0  9537.9  19.7   0.6  264.5  63  63
sd1        0.0  122.0    0.0 13833.2   1.7  19.6  174.5  58  63
sd2        0.0  136.0    0.0 15497.6   1.7  19.6  156.8  59  63
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  106.0    0.0 13567.8  34.0   1.0  330.1 100 100
sd1        0.0  103.0    0.0 13183.8   3.0  32.0  339.7 100 100
sd2        0.0   97.0    0.0 12415.8   3.0  32.0  360.7 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  104.0    0.0 13311.7  34.0   1.0  336.4 100 100
sd1        0.0   83.0    0.0 10623.8   3.0  32.0  421.6 100 100
sd2        0.0   76.0    0.0  9727.8   3.0  32.0  460.4 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  104.0    0.0 13312.7  34.0   1.0  336.4 100 100
sd1        0.0  104.0    0.0 13312.7   3.0  32.0  336.4 100 100
sd2        0.0  105.0    0.0 13440.7   3.0  32.0  333.2 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  104.0    0.0 13311.9  34.0   1.0  336.4 100 100
sd1        0.0  106.0    0.0 13567.9   3.0  32.0  330.1 100 100
sd2        0.0  105.0    0.0 13439.9   3.0  32.0  333.2 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  106.0    0.0 13567.6  34.0   1.0  330.1 100 100
sd1        0.0  106.0    0.0 13567.6   3.0  32.0  330.1 100 100
sd2        0.0  104.0    0.0 13311.6   3.0  32.0  336.4 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  120.0    0.0 14086.7  17.0  18.0  291.6 100 100
sd1        0.0  104.0    0.0 13311.7   7.8  27.1  336.4 100 100
sd2        0.0  107.0    0.0 13695.7   7.3  27.7  327.0 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  103.0    0.0 13185.0   3.0  32.0  339.7 100 100
sd1        0.0  104.0    0.0 13313.0   3.0  32.0  336.4 100 100
sd2        0.0  104.0    0.0 13313.0   3.0  32.0  336.4 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  115.0    0.0 12824.4   3.0  32.0  304.3 100 100
sd1        0.0  131.0    0.0 14360.3   3.0  32.0  267.1 100 100
sd2        0.0  125.0    0.0 14104.8   3.0  32.0  279.9 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0   99.0    0.0 12672.9   3.0  32.0  353.4 100 100
sd1        0.0   82.0    0.0 10496.8   3.0  32.0  426.7 100 100
sd2        0.0   95.0    0.0 12160.9   3.0  32.0  368.3 100 100
                 extended device statistics
device     r/s    w/s   kr/s    kw/s  wait  actv  svc_t  %w  %b
sd0        0.0  104.0    0.0 13311.7   3.0  32.0  336.4 100 100
sd1        0.0  103.0    0.0 13183.7   3.0  32.0  339.7 100 100
sd2        0.0  105.0    0.0 13439.7   3.0  32.0  333.2 100 100

Similar output when running "iostat -xn 1":

                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  103.0    0.0 13184.3   4.0  32.0    38.7   310.7 100 100  c1t0d0
    0.0  104.0    0.0 13312.3   3.0  32.0    28.7   307.7 100 100  c1t1d0
    0.0  104.0    0.0 13312.3   3.0  32.0    28.7   307.7 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  106.0    0.0 13567.9   4.0  32.0    37.6   301.9 100 100  c1t0d0
    0.0  123.0    0.0 13592.9   2.9  31.9    23.4   259.2  96 100  c1t1d0
    0.0  122.0    0.0 13467.4   2.7  31.3    22.1   256.3  90 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    1.0   91.0    8.0  6986.7   2.2  12.7    23.8   137.8  45  79  c1t0d0
    0.0   47.0    0.0  3057.1   0.0   3.0     0.0    63.9   0  24  c1t1d0
    0.0   42.0    0.0  2545.1   0.0   1.8     0.0    43.7   0  19  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0   36.7    0.0  2747.0   1.4   0.1    38.4     1.7  14   6  c1t0d0
    0.0   42.7    0.0  4326.3   0.0   1.2     0.6    28.6   1   6  c1t1d0
    0.0   44.7    0.0  4707.1   0.0   1.3     0.6    28.1   1   6  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0   99.7    0.0 12760.7  33.3   1.0   334.4    10.0 100 100  c1t0d0
    0.0  128.9    0.0 15215.3   3.0  32.0    23.2   248.2 100 100  c1t1d0
    0.0  141.0    0.0 16504.8   3.0  32.0    21.2   227.0 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  104.0    0.0 13313.1  34.0   1.0   326.8     9.6 100 100  c1t0d0
    0.0   80.0    0.0 10240.9   3.0  32.0    37.4   400.0 100 100  c1t1d0
    0.0   68.0    0.0  8704.7   3.0  32.0    44.0   470.5 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  104.0    0.0 13311.6  34.0   1.0   326.8     9.6 100 100  c1t0d0
    0.0  106.0    0.0 13567.6   3.0  32.0    28.2   301.9 100 100  c1t1d0
    0.0  105.0    0.0 13439.6   3.0  32.0    28.5   304.8 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  104.0    0.0 13312.5  34.0   1.0   326.8     9.6 100 100  c1t0d0
    0.0  104.0    0.0 13312.5   3.0  32.0    28.7   307.7 100 100  c1t1d0
    0.0  106.0    0.0 13568.5   3.0  32.0    28.2   301.9 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  104.0    0.0 13311.8  34.0   1.0   326.8     9.6 100 100  c1t0d0
    0.0  106.0    0.0 13567.8   3.0  32.0    28.2   301.9 100 100  c1t1d0
    0.0  104.0    0.0 13311.8   3.0  32.0    28.7   307.7 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  106.0    0.0 13567.7  34.0   1.0   320.7     9.4 100 100  c1t0d0
    0.0  106.0    0.0 13567.7   3.0  32.0    28.2   301.9 100 100  c1t1d0
    0.0  104.0    0.0 13311.7   3.0  32.0    28.7   307.7 100 100  c1t2d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  120.0    0.0 14087.1   4.0  31.0    33.0   258.6 100 100  c1t0d0
    0.0  104.0    0.0 13312.1   7.8  27.1    75.5   260.9 100 100  c1t1d0
    0.0  107.0    0.0 13696.1   7.3  27.7    68.4   258.5 100 100  c1t2d0

I mostly get readings like the first two ones. Another run, half an hour later, most often shows this instead:

                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  102.0    0.0 13054.5  34.0   1.0   333.3     9.8 100 100  c1t0d0
    0.0  111.0    0.0 14206.4  34.0   1.0   306.2     9.0 100 100  c1t1d0
    0.0  106.0    0.0 13503.9   3.0  32.0    28.2   301.9 100 100  c1t2d0

It's all a little fishy, and kw/s doesn't differ much between the drives (but this could be explained as drive(s) with longer wait queues holding back the others, I guess?).

According to Jeff's script, read speed seems to differ slightly as well (I repeated it 3 times and always got the same result):

# ./jeff.sh
c1t0d0 100 MB/sec
c1t1d0 112 MB/sec
c1t2d0 112 MB/sec

I have tested dd write speed on the root partition (c1t0d0s0) and I get 27 MB/s there, which I find quite low as well; these people get 87 MB/s average write speed: http://techreport.com/articles.x/13440/13 . I'll double-check using Linux what sequential write speeds I can get out of a single drive on this system.

So 100MB/s is indeed out of the question on a raidz with 3 drives, but I would still expect 50 MB/s to be technically possible (at least on nearly empty disks).

For fun I tried a mirror of c1t1d0 and c1t2d0, so the OS disk is not involved and write caching should work. I still get the same write speed of 13 MB/s per drive:

                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0    0.0    0.0     0.0   0.0   0.0     0.0     0.0   0   0  c1t0d0
    0.0  104.0    0.0 13313.0   3.0  32.0    28.8   307.7 100 100  c1t1d0
    0.0  104.0    0.0 13313.0   3.0  32.0    28.8   307.7 100 100  c1t2d0

And if I do the same with c1t0d0s3 and c1t2d0:

                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  104.0    0.0 13311.7   3.0  32.0    28.8   307.7 100 100  c1t0d0
    0.0    0.0    0.0     0.0   0.0   0.0     0.0     0.0   0   0  c1t1d0
    0.0  106.0    0.0 13567.7   3.0  32.0    28.3   301.9 100 100  c1t2d0

Hmm, doesn't look like one drive holding back another one; all of them seem to be equally slow at writing.

This is the current partition table of the boot drive:

partition> print
Current partition table (original):
Total disk cylinders available: 45597 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 -    45      705.98MB    (45/0/0)      1445850
  1       swap    wu      46 -    78      517.72MB    (33/0/0)      1060290
  2     backup    wm       0 - 45596      698.58GB    (45597/0/0) 1465031610
  3 unassigned    wm      79 - 45596      697.37GB    (45518/0/0) 1462493340
  4 unassigned    wm       0               0          (0/0/0)             0
  5 unassigned    wm       0               0          (0/0/0)             0
  6 unassigned    wm       0               0          (0/0/0)             0
  7 unassigned    wm       0               0          (0/0/0)             0
  8       boot    wu       0 -     0       15.69MB    (1/0/0)         32130
  9 unassigned    wm       0               0          (0/0/0)             0

Note that I have included a small 512MB slice for swap space. The ZFS Best Practices Guide recommends against swap on the same disk as ZFS storage, but being new to Solaris I don't know if it would run fine without any swap space at all. I've got 2GB of memory and don't intend to run anything else than ZFS, Samba and NFS.

Apr 18, 2008 12:48 AM milek wrote:

> Also try to lower the number of outstanding IOs per device from the default
> of 35 in zfs to something much lower.

Thanks a lot for the suggestion. But how do I do that? I've found some information on pending IOs on http://blogs.sun.com/roch/ , but neither the Best Practices Guide, the ZFS Administration Guide, nor Google gives much information if I search for pending/outstanding or 35 etc.

Could the ahci driver be suspect? Maybe I can change my BIOS SATA support to legacy IDE, reinstall and see if anything interesting occurs?

Finally, while recreating the ZFS pool, I got a "raidz contains devices of different sizes" warning (I must have forgotten about using -f to force creation the first time). I do hope this is safe, right? How does ZFS handle block devices of different sizes? I get the same amount of blocks when doing df on a mirror of (whole disk 2 & 3) versus a mirror of (the large slice of disk 1 & whole disk 3)... :-| Which bothers me a lot now! I could always try to install Solaris on a CompactFlash card in an IDE-to-CF adapter.

Many thanks for your help,

Pascal
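The recreate-with-force step mentioned above comes down to something like this (a sketch; "tank" is the pool name used later in the thread):

    zpool destroy tank
    zpool create -f tank raidz c1t0d0s3 c1t1d0 c1t2d0

The -f is needed because the slice and the whole disks differ slightly in size; a raidz or mirror simply sizes itself to its smallest member, so the mismatch costs a little capacity but is not a data-loss risk.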
Bob Friesenhahn
2008-Apr-18 15:28 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Fri, 18 Apr 2008, Pascal Vandeputte wrote:

> Thanks for all the replies!
>
> Some output from "iostat -x 1" while doing a dd of /dev/zero to a
> file on a raidz of c1t0d0s3, c1t1d0 and c1t2d0 using bs=1048576:
[ data removed ]
> It's all a little fishy, and kw/s doesn't differ much between the
> drives (but this could be explained as drive(s) with longer wait
> queues holding back the others I guess?).

Your data does strongly support my hypothesis that using a slice on 'sd0' would slow down writes. It may also be that your boot drive is a different type and vintage from the other drives.

Testing with output from /dev/zero is not very good since zfs treats blocks of zeros specially. I have found 'iozone' (http://www.iozone.org/) to be quite useful for basic filesystem throughput testing.

> Hmm, doesn't look like one drive holding back another one, all of
> them seem to be equally slow at writing.

Note that if drives are paired, or raidz requires a write to all drives, then the write rate is necessarily limited to the speed of the slowest device. I suspect that your c1t1d0 and c1t2d0 drives are similar type and vintage whereas the boot drive was delivered with the computer and has different performance characteristics (double whammy). Usually drives delivered with computers are selected by the computer vendor based on lowest cost in order to decrease the cost of the entire computer.

SATA drives are cheap these days, so perhaps you can find a way to add a fourth drive which is at least as good as the drives you are using for c1t1d0 and c1t2d0.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
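An iozone run for this kind of sequential-throughput check might look roughly like this (a sketch; the flags and sizes are illustrative, not from the thread):

    iozone -e -c -r 128k -s 4g -i 0 -i 1 -f /tank/iozone.tmp

-i 0 and -i 1 select the write/rewrite and read/reread tests, -e and -c include flush and close times, and a 4 GB file (well above the 2 GB of RAM here) keeps the ARC from hiding the disks.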
Pascal Vandeputte
2008-Apr-18 15:59 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Hi,

Thanks for your input. Unfortunately, all 3 drives are identical Seagate 7200.11 drives which I bought separately, and they are attached in no particular order.

Thanks for the /dev/zero remark, I didn't know that.

From what I've seen this afternoon, I'm starting to suspect a hardware/firmware issue as well. Using Linux I cannot extract more than 24.5 MB/s sequential write performance out of a single drive (writing directly to /dev/sdX, no filesystem overhead).

I tried flashing the BIOS to an older version, but that firmware update process fails somehow. Reflashing the newest BIOS still works, however. It's a pity that I didn't benchmark before updating the BIOS & RAID firmware package. Maybe then I would have gotten decent Windows performance as well.

It could even be an issue with the Seagate disks, as there have been problems with SD04 and SD14 firmwares (they reported 0MB cache to the system). Mine are SD15 and should be fine though.

I'm at a loss; I'm thinking about just settling for the 20MB/s write speeds with a 3-drive raidz and enjoying life...

Which leaves me with my other previously asked questions:
- does Solaris require swap space on disk?
- does Solaris run from a CompactFlash card?
- does ZFS handle raidz or mirror pools with block devices of a slightly different size, or am I risking data loss?

Thanks,

Pascal
Bob Friesenhahn
2008-Apr-18 16:24 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Fri, 18 Apr 2008, Pascal Vandeputte wrote:

> - does Solaris require a swap space on disk

No, Solaris does not require a swap space. However, you do not have a lot of memory, so when there is not enough virtual memory available, programs will fail to allocate memory and quit running.

There is an advantage to having a swap area since then Solaris can put rarely used pages in swap to improve overall performance. The memory can then be used for useful caching (e.g. ZFS ARC), or for your applications.

In addition to using a dedicated partition, you can use a file on UFS for swap ('man swap'), and ZFS itself is able to support a swap volume. I don't think that you can put a normal swap file on ZFS, so you would want to use ZFS's built-in support for that.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
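Swapping to a ZFS volume, as mentioned here, is roughly the following (a sketch; the 1 GB size and the pool name "tank" are examples only):

    zfs create -V 1g tank/swapvol
    swap -a /dev/zvol/dsk/tank/swapvol

An /etc/vfstab line of the form "/dev/zvol/dsk/tank/swapvol  -  -  swap  -  no  -" makes it persistent across reboots.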
Brandon High
2008-Apr-18 17:59 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Fri, Apr 18, 2008 at 8:28 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Fri, 18 Apr 2008, Pascal Vandeputte wrote:
>
> > Thanks for all the replies!
> >
> > Some output from "iostat -x 1" while doing a dd of /dev/zero to a
> > file on a raidz of c1t0d0s3, c1t1d0 and c1t2d0 using bs=1048576:
> [ data removed ]
>
> > It's all a little fishy, and kw/s doesn't differ much between the
> > drives (but this could be explained as drive(s) with longer wait
> > queues holding back the others I guess?).
>
> Your data does strongly support my hypothesis that using a slice on
> 'sd0' would slow down writes. It may also be that your boot drive is
> a different type and vintage from the other drives.

ZFS disables (or rather, doesn't enable) the drive's write cache when it's using a slice. It can be manually enabled, however. The swap slice on your first disk is what's limiting your performance.

> SATA drives are cheap these days so perhaps you can find a way to add a
> fourth drive which is at least as good as the drives you are using for
> c1t1d0 and c1t2d0.

Using a separate boot volume with a swap slice on it might be a good idea. You'll be able to upgrade or reinstall your OS without touching the zpool.

-B

-- 
Brandon High bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
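Manually enabling the write cache on the sliced disk goes through format's expert mode; a sketch of the interactive session (menu entries as the sd driver usually presents them, so check with 'display' first):

    # format -e
    (select c1t0d0 from the menu)
    format> cache
    cache> write_cache
    write_cache> display
    write_cache> enable
    write_cache> quit
    cache> quit

The setting applies to the whole disk, so it is only reasonable if everything else on that disk can tolerate a volatile write cache; ZFS issues its own cache flushes, but a UFS root or swap slice does not, which is why ZFS leaves the cache disabled on sliced disks in the first place.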
Richard Elling
2008-Apr-18 22:36 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
comments below...

Pascal Vandeputte wrote:

> Thanks for all the replies!
>
> Some output from "iostat -x 1" while doing a dd of /dev/zero to a file
> on a raidz of c1t0d0s3, c1t1d0 and c1t2d0 using bs=1048576:
>
> [iostat -x and iostat -xn output quoted in full - snipped]
>
> I mostly get readings like the first two ones.
> Another run, half an hour later, most often shows this instead:
>
>                     extended device statistics
>     r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>     0.0  102.0    0.0 13054.5  34.0   1.0   333.3     9.8 100 100  c1t0d0
>     0.0  111.0    0.0 14206.4  34.0   1.0   306.2     9.0 100 100  c1t1d0
>     0.0  106.0    0.0 13503.9   3.0  32.0    28.2   301.9 100 100  c1t2d0

The average service time for all of your disks is about 9-10ms. The difference you see here (9.8 for c1t0d0 vs. 301.9 for c1t2d0) is due to the queue depth at the disk (actv). By default, ZFS will try to queue 35 iops to each vdev, which is why you see 34 + 1 (wait + actv) or 3 + 32, and so on. The takeaway here is that ZFS is sending a bunch of work to the disks; it is just a matter of how quickly the disks finish the work.

An average of 10ms for a 7,200 rpm disk for write workloads is less than I would expect. Just doing the simple math, 10ms = ~100 w/s @ 128kBytes/iop, or about 12.8 MBytes/s. More interestingly, sequential writes should not require seeks, and the 10ms response time implies seeking. Even in the NCQ case (actv > 1), we see the same disk performance, ~10ms/iop.

I would look more closely at the hardware and firmware for clues.
-- richard

> [remainder of Pascal's message quoted in full - snipped]
Marc Bevand
2008-Apr-19 14:51 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Pascal Vandeputte <pascal_vdp <at> hotmail.com> writes:

> I'm at a loss, I'm thinking about just settling for the 20MB/s write
> speeds with a 3-drive raidz and enjoy life...

As Richard Elling pointed out, the ~10ms per IO operation implies seeking, or hardware/firmware problems. The mere fact you observed a low 27 MB/s sequential write throughput on c1t0d0s0 indicates this is not a ZFS problem. Test other disks, another SATA controller, mobo, BIOS/firmware, etc.

As you pointed out, these disks should normally be capable of 80-90 MB/s write throughput. Like you, I would also expect ~100 MB/s writes on a 3-drive raidz pool. As a datapoint, I see 150 MB/s writes on a 4-drive raidz on a similar config (750GB SATA Samsung HD753LJ disks, SB600 AHCI controller, low-end CPU).

-marc
Pascal Vandeputte
2008-Apr-19 15:16 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
I see. I'll only be running a minimal Solaris install with ZFS and Samba on this machine, so I wouldn't expect immediate memory issues with 2 gigabytes of RAM. OTOH I read that ZFS is a real memory hog, so I'll be careful.

I've tested swap on a ZFS volume now; it's really easy, so I'll try running without swap for some quick performance testing and use swap on ZFS after that. This also takes away my fears about using a swap slice on the CompactFlash card I'll be booting from.

Thanks!
Pascal Vandeputte
2008-Apr-19 15:40 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Thanks, I'll try installing Solaris on a 1GB CF card in a CF-to-IDE adapter, so all disks will then be completely available to ZFS. Then I needn't worry about different-size block devices either.

I also find it weird that the boot disk is displayed differently from the other two disks if I run the "format" command... (could be normal though; as I said before, I'm new to Solaris)

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <DEFAULT cyl 45597 alt 2 hd 255 sec 126>
          /pci@0,0/pci8086,5044@1f,2/disk@0,0
       1. c1t1d0 <ATA-ST3750330AS-SD15-698.64GB>
          /pci@0,0/pci8086,5044@1f,2/disk@1,0
       2. c1t2d0 <ATA-ST3750330AS-SD15-698.64GB>
          /pci@0,0/pci8086,5044@1f,2/disk@2,0
Pascal Vandeputte
2008-Apr-19 15:43 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
(the lt and gt symbols are filtered by the forum I guess; replaced with minus signs now)

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 -DEFAULT cyl 45597 alt 2 hd 255 sec 126-
          /pci@0,0/pci8086,5044@1f,2/disk@0,0
       1. c1t1d0 -ATA-ST3750330AS-SD15-698.64GB-
          /pci@0,0/pci8086,5044@1f,2/disk@1,0
       2. c1t2d0 -ATA-ST3750330AS-SD15-698.64GB-
          /pci@0,0/pci8086,5044@1f,2/disk@2,0
Pascal Vandeputte
2008-Apr-19 15:46 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Great, superb write speeds with a similar setup, my motivation is growing again ;-)

It just occurs to me that I have a spare Silicon Image 3124 SATA card lying around. I was postponing testing of these drives on my desktop because it has an Intel ICH9 SATA controller, probably quite similar to the ICH9R (RAID support) in my Solaris box, but that 3124 may give completely different results with the Seagates. Test coming up.

(the forum seems to be having technical difficulties, I hope my replies end up in the right places...)
Pascal Vandeputte
2008-Apr-19 15:53 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Thanks a lot for your input, I understand those numbers a lot better now! I'll look deeper into hardware issues. It's a pity that I can't get older BIOS versions flashed. But I've got some other hardware lying around.

Someone suggested lowering the default of 35 outstanding IOs per device, but I can't find any information anywhere on how to accomplish this (not with Google, not in the ZFS Admin Guide either).

Greetings,

Pascal
Richard Elling
2008-Apr-19 16:50 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Pascal Vandeputte wrote:

> Thanks a lot for your input, I understand those numbers a lot better now!
> I'll look deeper into hardware issues. It's a pity that I can't get older
> BIOS versions flashed. But I've got some other hardware lying around.
>
> Someone suggested lowering the default of 35 outstanding IOs per device,
> but I can't find any information anywhere on how to accomplish this (not
> with Google, not in the ZFS Admin Guide either).

It is in the Evil Tuning Guide. But don't bother, it won't fix your problem. The evidence suggests you get 10ms response even with only 1 iop queued to the device.
-- richard
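The tunable being referred to is zfs_vdev_max_pending (default 35 at this time); lowering it looks roughly like this, following the Evil Tuning Guide (the value 10 is only an example):

    echo zfs_vdev_max_pending/W0t10 | mdb -kw      # on a live system, until reboot

or persistently via a line in /etc/system:

    set zfs:zfs_vdev_max_pending = 10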
Richard Elling
2008-Apr-19 16:59 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Pascal Vandeputte wrote:

> I see. I'll only be running a minimal Solaris install with ZFS and Samba on
> this machine, so I wouldn't expect immediate memory issues with 2 gigabytes
> of RAM. OTOH I read that ZFS is a real memory hog so I'll be careful.

Memory usage is completely dependent on the workload. Unless you are doing a *lot* of writes with a slow back end (hmmm....) then you should be ok with modest RAM.

> I've tested swap on a ZFS volume now, it's really easy so I'll try running
> without swap for some quick performance testing and use swap on ZFS after
> that. This also takes away my fears about using a swap slice on the
> CompactFlash card I'll be booting from.

To save you some grief, please wait for b88 before swapping to ZFS.

Don't worry about swapping on CF. In most cases, you won't be using the swap device for normal operations. You can use the swap -l command to observe the swap device usage. No usage means that you can probably do away with it. If you actually use swap, performance will suck, so buy more RAM.
-- richard
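swap -l output has this general shape (a sketch; the device and numbers are illustrative — a "free" value equal to "blocks" means the device is sitting unused):

    # swap -l
    swapfile                 dev  swaplo   blocks     free
    /dev/dsk/c1t0d0s1       32,1      16  1060272  1060272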
Bob Friesenhahn
2008-Apr-19 17:16 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Sat, 19 Apr 2008, Richard Elling wrote:

> Don't worry about swapping on CF. In most cases, you won't be
> using the swap device for normal operations. You can use the
> swap -l command to observe the swap device usage. No usage
> means that you can probably do away with it. If you actually
> use swap, performance will suck, so buy more RAM.

I don't agree that if swap is used that performance will necessarily suck. If swap is available, Solaris will mount /tmp there, which helps temporary file performance. It is best to look at system paging (hard faults) while programs are running in order to determine if performance sucks due to inadequate RAM. In many runtime environments, only a small bit of the application address space is ever needed.

More RAM definitely improves ZFS repeated read performance due to caching in RAM. ZFS gives otherwise unused memory something useful to do.

Regardless, with a 64-bit CPU and a 64-bit OS it seems like a crying shame to install less than 4GB of RAM. :-)

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Richard Elling
2008-Apr-19 17:28 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Bob Friesenhahn wrote:

> On Sat, 19 Apr 2008, Richard Elling wrote:
>
>> Don't worry about swapping on CF. In most cases, you won't be
>> using the swap device for normal operations. You can use the
>> swap -l command to observe the swap device usage. No usage
>> means that you can probably do away with it. If you actually
>> use swap, performance will suck, so buy more RAM.
>
> I don't agree that if swap is used that performance will necessarily
> suck. If swap is available, Solaris will mount /tmp there, which
> helps temporary file performance. It is best to look at system paging
> (hard faults) while programs are running in order to determine if
> performance sucks due to inadequate RAM. In many runtime
> environments, only a small bit of the application address space is
> ever needed.

swapfs is always there. But, IMHO, it is a misnomer because it just uses the virtual memory system. The prevailing method of determining memory shortfall is to observe the page scanner (scan rate, sr). But just for grins, try swap -l on your systems and see if any pages have been used on the swap device. The answer usually surprises ;-)

> More RAM definitely improves ZFS repeated read performance due to
> caching in RAM. ZFS gives otherwise unused memory something useful to
> do.
>
> Regardless, with a 64-bit CPU and a 64-bit OS it seems like a crying
> shame to install less than 4GB of RAM. :-)

Yep, or if you do OpenGL stuff, like I've been doing lately, much more RAM :-)
-- richard
A Darren Dunham
2008-Apr-20 03:22 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Sat, Apr 19, 2008 at 12:16:11PM -0500, Bob Friesenhahn wrote:

> On Sat, 19 Apr 2008, Richard Elling wrote:
>
>> Don't worry about swapping on CF. In most cases, you won't be
>> using the swap device for normal operations. You can use the
>> swap -l command to observe the swap device usage. No usage
>> means that you can probably do away with it. If you actually
>> use swap, performance will suck, so buy more RAM.
>
> I don't agree that if swap is used that performance will necessarily
> suck. If swap is available, Solaris will mount /tmp there, which
> helps temporary file performance.

I think these paragraphs are referring to two different concepts with "swap": swapfiles or backing store in the first, and virtual memory space in the second.

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
A Darren Dunham
2008-Apr-20 03:42 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Sat, Apr 19, 2008 at 10:28:45AM -0700, Richard Elling wrote:

> Bob Friesenhahn wrote:
> > I don't agree that if swap is used that performance will necessarily
> > suck. If swap is available, Solaris will mount /tmp there, which
> > helps temporary file performance. It is best to look at system paging
> > (hard faults) while programs are running in order to determine if
> > performance sucks due to inadequate RAM. In many runtime
> > environments, only a small bit of the application address space is
> > ever needed.
>
> swapfs is always there. But, IMHO, it is a misnomer because it just uses
> the virtual memory system.

Why a misnomer? "swap" and "virtual memory" are used as identical terms in many places in Solaris. But since /tmp was mentioned, perhaps you're referring to tmpfs instead of swapfs?

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
Bob Friesenhahn
2008-Apr-20 05:19 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Sun, 20 Apr 2008, A Darren Dunham wrote:

> I think these paragraphs are referring to two different concepts with
> "swap". Swapfiles or backing store in the first, and virtual memory
> space in the second.

The "swap" area is mis-named since Solaris never "swaps". Some older operating systems would put an entire program in the swap area when the system ran short on memory and would have to "swap" between programs. Solaris just "pages" (a virtual memory function) and it is very smart about how and when it does it. Only dirty pages which are not write-mapped to a file in the filesystem need to go in the swap area, and only when the system runs short on RAM.

Solaris is a quite intensely memory-mapped system. The memory mapping allows a huge amount of sharing of shared library files, program text images, and unmodified pages shared after fork(). The end result is a very memory-efficient OS. Now if we could just get the ZFS ARC and the Gnome Desktop to not use any memory, we would be in nirvana. :-)

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
michael schuster
2008-Apr-20 05:29 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
Bob Friesenhahn wrote:

> On Sun, 20 Apr 2008, A Darren Dunham wrote:
>> I think these paragraphs are referring to two different concepts with
>> "swap". Swapfiles or backing store in the first, and virtual memory
>> space in the second.
>
> The "swap" area is mis-named since Solaris never "swaps". Some older
> operating systems would put an entire program in the swap area when
> the system ran short on memory and would have to "swap" between
> programs. Solaris just "pages" (a virtual memory function) and it is
> very smart about how and when it does it. Only dirty pages which are
> not write-mapped to a file in the filesystem need to go in the swap
> area, and only when the system runs short on RAM.

that's true most of the time ... unless free memory gets *really* low, then Solaris *does* start to swap (ie page out pages by process). IIRC, the threshold for swapping is minfree (measured in pages), and the value that needs to fall below this threshold is freemem.

HTH
Michael
-- 
Michael Schuster              http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
Bob Friesenhahn
2008-Apr-20 05:35 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R
On Sat, 19 Apr 2008, michael schuster wrote:

> that's true most of the time ... unless free memory gets *really* low, then
> Solaris *does* start to swap (ie page out pages by process). IIRC, the
> threshold for swapping is minfree (measured in pages), and the value that
> needs to fall below this threshold is freemem.

Most people here are likely too young to know what "swapping" really is. Swapping is not the same as the paging that Solaris does. With swapping, the kernel knows that this address region belongs to this process and we are short of RAM, so it block-copies the process to the swap area and only remembers that it exists via the process table.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Pascal Vandeputte
2008-Apr-20 15:47 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
Hi,

First of all, my apologies for some of my posts appearing 2 or even 3 times here; the forum seems to be acting up, and although I received a Java exception for those double postings and they never appeared yesterday, apparently they still made it through eventually.

Back on topic: I fruitlessly tried to extract higher write speeds from the Seagate drives using an Addonics Silicon Image 3124 based SATA controller. I got exactly the same 21 MB/s for each drive (booted from a Knoppix CD).

I was planning on contacting Seagate support about this, but in the mean time I absolutely had to start using this system, even if it meant low write speeds. So I installed Solaris on a 1GB CF card and wanted to start configuring ZFS. I noticed that the first SATA disk was still shown with a different label by the "format" command (see my other post somewhere here). I tried to get rid of all disk labels (unsuccessfully), so I decided to boot Knoppix again and zero out the start and end sectors manually (erasing all GPT data).

Back to Solaris. I ran "zpool create tank raidz c1t0d0 c1t1d0 c1t2d0" and tried a dd while monitoring with iostat -xn 1 to see the effect of not having a slice as part of the zpool (write cache etc). I was seeing write speeds in excess of 50MB/s per drive! Whoa!

I didn't understand this at all, because 5 minutes earlier I couldn't get more than 21MB/s in Linux using block sizes up to 1048576 bytes. How could this be? I decided to destroy the zpool and try to dd from Linux once more. This is when my jaw dropped to the floor:

root@Knoppix:~# dd if=/dev/zero of=/dev/sda bs=4096
250916+0 records in
250915+0 records out
1027747840 bytes (1.0 GB) copied, 10.0172 s, 103 MB/s

Finally, the write speed one should expect from these drives, according to various reviews around the web. I still get a healthy 52MB/s at the end of the disk:

# dd if=/dev/zero of=/dev/sda bs=4096 seek=183000000
dd: writing `/dev/sda': No space left on device
143647+0 records in
143646+0 records out
588374016 bytes (588 MB) copied, 11.2223 s, 52.4 MB/s

But how is it possible that I didn't get these speeds earlier? This may be part of the explanation:

root@Knoppix:~# dd if=/dev/zero of=/dev/sda bs=2048
101909+0 records in
101909+0 records out
208709632 bytes (209 MB) copied, 9.32228 s, 22.4 MB/s

Could it be that the firmware in these drives has issues with write requests of 2048 bytes and smaller? There must be more to it though, because I'm absolutely sure that I used larger block sizes when testing with Linux earlier (like 16384, 65536 and 1048576). It's impossible to tell, but maybe there was something fishy going on which was fixed by zeroing parts of the drives. I absolutely cannot explain it otherwise.

Anyway, I'm still not seeing much more than 50MB/s per drive from ZFS, but I suspect the 2048 vs 4096 byte write block size effect may be influencing this. Having a slice as part of the pool earlier perhaps magnified this behavior as well. Caching or swap problems are certainly no issues now.

Any thoughts? I certainly want to thank everyone once more for your co-operation!

Greetings,

Pascal
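The "zero out the start and end sectors" step comes down to something like this from Knoppix (a sketch only; the 34-sector count and the device name are illustrative, and the commands are destructive, so the target disk must be triple-checked):

    dd if=/dev/zero of=/dev/sda bs=512 count=34
    dd if=/dev/zero of=/dev/sda bs=512 seek=$(( $(blockdev --getsz /dev/sda) - 34 )) count=34

The second command clears the backup GPT that is kept in the last few sectors of the disk.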
Shaky
2008-Aug-13 15:53 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
Did you ever figure this out?

I have the same hardware, an Intel DG33TL motherboard with Intel gigabit NIC and ICH9R, but with Hitachi 1TB drives.

I'm getting 2MB/s write speeds. I've tried the zeroing-out trick. No luck. Network is fine. Disks are fine; they write at around 50MB/s when formatted with ext3 under Linux.
Shaky
2008-Aug-13 15:58 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
Oh, Jeff's write script gives around 60MB/s IIRC.
Shaky
2008-Aug-13 16:05 UTC
[zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (AHCI)
I used BCwipe to zero the drives.

How do you "boot Knoppix again and zero out the start and end sectors manually (erasing all GPT data)"?

thanks