Hi all, I just set up this test box on OI. It has a couple of 80GB X25-Ms and eight 2TB drives, two of them Hitachi Deskstar 7k2 drives and the other six WD Greens. I have run some tests on this box with mirrors to compare performance, and those tests suggest the Hitachi drives are 25% or so faster. Now, after installing a Bacula storage agent on the OI box, iostat -xd makes it look like the WD Green drives are running circles around the Deskstars. I really can't see why. Does anyone have an idea of why this should be happening? See below for the iostat output.

2-second snapshot:

                    extended device statistics
device       r/s     w/s     kr/s      kw/s  wait  actv  svc_t  %w  %b
cmdk0        0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
cmdk1        0.0   163.6      0.0   20603.7   1.6   0.5   12.9  24  24
fd0          0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
sd0          0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
sd1          0.5   140.3      0.3    2426.3   0.0   1.0    7.2   0  14
sd2          0.0   138.3      0.0    2476.3   0.0   1.5   10.6   0  18
sd3          0.0   303.9      0.0    2633.8   0.0   0.4    1.3   0   7
sd4          0.5   306.9      0.3    2555.8   0.0   0.4    1.2   0   7
sd5          1.0   308.5      0.5    2579.7   0.0   0.3    1.0   0   7
sd6          1.0   304.9      0.5    2352.1   0.0   0.3    1.1   1   7
sd7          1.0   298.9      0.5    2764.5   0.0   0.6    2.0   0  13
sd8          1.0   304.9      0.5    2400.8   0.0   0.3    0.9   0   6

iostat -xd (with collected stats):

                    extended device statistics
device       r/s     w/s     kr/s      kw/s  wait  actv  svc_t  %w  %b
cmdk0        0.4     1.2     24.2      21.4   0.0   0.0    1.9   0   0
cmdk1        0.1     5.7      3.7     709.6   0.1   0.0   12.5   1   1
fd0          0.0     0.0      0.0       0.0   0.0   0.0  982.8   0   0
sd0          0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
sd1          0.5     6.5     46.8     248.2   0.0   0.0    6.0   0   1
sd2          0.5     6.5     46.5     248.2   0.0   0.0    6.0   0   1
sd3          0.5     9.5     46.6     248.0   0.0   0.0    2.5   0   1
sd4          0.5     9.5     46.6     248.0   0.0   0.0    2.5   0   1
sd5          0.5     9.5     46.6     248.0   0.0   0.0    2.6   0   1
sd6          0.5     9.5     46.5     248.0   0.0   0.0    2.5   0   1
sd7          0.5     9.5     46.6     248.0   0.0   0.0    2.6   0   1
sd8          0.5     9.5     46.6     248.0   0.0   0.0    2.5   0   1

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all educators to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Is this a sector size issue? I see two of the disks each doing the same amount of work in roughly half the I/O operations, with each operation taking about twice as long, compared to each of the remaining six drives. I know nothing about either drive, but I wonder if one type of drive has twice the sector size of the other?
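If it helps narrow this down, iostat -En on the OI box will print vendor, product and capacity for each device, which at least maps sd1..sd8 onto Hitachi vs WD. A quick sketch; note that drives of this generation with 4KB physical sectors commonly still advertise 512-byte logical sectors, so the reported geometry alone may not settle the sector-size question.

# map sd instances to drive vendor/model/capacity
iostat -En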
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
>
>                     extended device statistics
> device       r/s     w/s     kr/s     kw/s  wait  actv  svc_t  %w  %b
> sd1          0.5   140.3      0.3   2426.3   0.0   1.0    7.2   0  14
> sd2          0.0   138.3      0.0   2476.3   0.0   1.5   10.6   0  18
> sd3          0.0   303.9      0.0   2633.8   0.0   0.4    1.3   0   7
> sd4          0.5   306.9      0.3   2555.8   0.0   0.4    1.2   0   7
> sd5          1.0   308.5      0.5   2579.7   0.0   0.3    1.0   0   7
> sd6          1.0   304.9      0.5   2352.1   0.0   0.3    1.1   1   7
> sd7          1.0   298.9      0.5   2764.5   0.0   0.6    2.0   0  13
> sd8          1.0   304.9      0.5   2400.8   0.0   0.3    0.9   0   6

Unless I'm misunderstanding this output... it looks like all disks are doing approximately the same data throughput, while sd1 & sd2 are doing half the IOPS. So sd1 & sd2 must be doing larger chunks.

How are these drives configured? One vdev of raidz2? No cache/log devices, etc.?

It would be easy to explain if you're striping mirrors. Difficult (at least for me) to explain if you're using raidzN.
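For reference, the average write size per device falls straight out of those two columns (kw/s divided by w/s); a rough back-of-the-envelope on the figures quoted above:

sd1: 2426.3 / 140.3  ~= 17.3 KB per write
sd2: 2476.3 / 138.3  ~= 17.9 KB per write
sd3: 2633.8 / 303.9  ~=  8.7 KB per write
sd8: 2400.8 / 304.9  ~=  7.9 KB per write

So the first two disks are being handed roughly twice as much data per I/O as the other six.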
> >                     extended device statistics
> > device       r/s     w/s     kr/s     kw/s  wait  actv  svc_t  %w  %b
> > sd1          0.5   140.3      0.3   2426.3   0.0   1.0    7.2   0  14
> > sd2          0.0   138.3      0.0   2476.3   0.0   1.5   10.6   0  18
> > sd3          0.0   303.9      0.0   2633.8   0.0   0.4    1.3   0   7
> > sd4          0.5   306.9      0.3   2555.8   0.0   0.4    1.2   0   7
> > sd5          1.0   308.5      0.5   2579.7   0.0   0.3    1.0   0   7
> > sd6          1.0   304.9      0.5   2352.1   0.0   0.3    1.1   1   7
> > sd7          1.0   298.9      0.5   2764.5   0.0   0.6    2.0   0  13
> > sd8          1.0   304.9      0.5   2400.8   0.0   0.3    0.9   0   6
>
> Unless I'm misunderstanding this output... it looks like all disks are
> doing approximately the same data throughput, while sd1 & sd2 are doing
> half the IOPS. So sd1 & sd2 must be doing larger chunks.
>
> How are these drives configured? One vdev of raidz2? No cache/log
> devices, etc.?
>
> It would be easy to explain if you're striping mirrors. Difficult (at
> least for me) to explain if you're using raidzN.

It's a raidz2 pool with eight drives; the first two are Hitachi 7k2 Deskstar drives, the other six are WD Green drives. They are all 2TB, and there is a separate device used for L2ARC, an 80GB X25-M.

roy at mime:/home/roy$ /usr/sbin/zpool status
  pool: mimedata
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mimedata    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
        cache
          c7d0      ONLINE       0     0     0

errors: No known data errors

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
Hi, the question is which WD Green drives you are using: WDxxEADS or WDxxEARS. The WDxxEARS have a 4KB physical sector size instead of 512B. You need some special trickery to get the maximum performance out of them, probably even more so in a raidz configuration. See http://www.solarismen.de/archives/4-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-1.html and the following parts. (A quick way to check what ZFS itself decided is sketched just below.)

Regards, Christian
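As a related check, sketched here against the pool name shown elsewhere in this thread (mimedata): zdb can show the ashift ZFS chose for the vdev when the pool was created. ashift: 9 means ZFS is laying data out on 512-byte boundaries (the usual outcome when the drive advertises 512B logical sectors), ashift: 12 means 4KB alignment.

# print the cached pool config and pick out the vdev's ashift
zdb -C mimedata | grep ashift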
I have both EVDS and EARS 2TB Green drives, and I have to say they are not good for building storage servers. The EVDS has a compatibility issue with my Supermicro appliance: it hangs when doing huge data sends or copies, and from iostat I can see the data throughput getting stuck on the green disks with extremely high wait times. The EARS drives are OK running OpenSolaris but have very poor performance handling small files. I'm doing SVN over an NFS share, and checking out a repository takes triple the time compared to a NetApp FAS960, but that issue could be resolved by adding an SSD as a log device. Note that the Green disks' seek time is 15ms, while a normal 7200RPM disk is around 8.5ms. I'll try the link provided by Christian to see if it helps performance. Anyway, I've decided to use Seagate 7200.11 drives, which are big enough and fast.
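For anyone wanting to try the same workaround, adding a separate log device is a one-liner; the pool and device names below are placeholders, and a mirrored log is the safer variant if you care about the data in flight.

# add a single SSD as a dedicated log (ZIL) device - names are placeholders
zpool add tank log c7d1

# or, safer, a mirrored log
zpool add tank log mirror c7d1 c8d1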
> I have both EVDS and EARS 2TB Green drives, and I have to say they are
> not good for building storage servers.

I think both have native 4K sectors; as such, they balk or perform slowly when a smaller I/O or an unaligned IOP hits them.

How are they formatted? Specifically, Solaris slices must be aligned on a 4K boundary or performance will stink. (One way to eyeball the alignment is sketched below.)

Casper
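A hedged way to check that from the OI box (the device name is a placeholder): prtvtoc prints the slice table, and on a drive with 4KB physical sectors every slice's first sector should be divisible by 8 (8 x 512B = 4KB). When ZFS is given the whole disk it writes an EFI label whose data slice normally starts well-aligned, but hand-made SMI slices are easy to get wrong. Note that the "bytes/sector" line will usually still say 512, since that is the logical size the drive reports.

# check that the "First Sector" of each slice is a multiple of 8
prtvtoc /dev/rdsk/c4t2d0s0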
Roy Sigurd Karlsbakk wrote:

> device       r/s     w/s     kr/s      kw/s  wait  actv  svc_t  %w  %b
> cmdk0        0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
> cmdk1        0.0   163.6      0.0   20603.7   1.6   0.5   12.9  24  24
> fd0          0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
> sd0          0.0     0.0      0.0       0.0   0.0   0.0    0.0   0   0
> sd1          0.5   140.3      0.3    2426.3   0.0   1.0    7.2   0  14
> sd2          0.0   138.3      0.0    2476.3   0.0   1.5   10.6   0  18
> sd3          0.0   303.9      0.0    2633.8   0.0   0.4    1.3   0   7
> sd4          0.5   306.9      0.3    2555.8   0.0   0.4    1.2   0   7
> sd5          1.0   308.5      0.5    2579.7   0.0   0.3    1.0   0   7
> sd6          1.0   304.9      0.5    2352.1   0.0   0.3    1.1   1   7
> sd7          1.0   298.9      0.5    2764.5   0.0   0.6    2.0   0  13
> sd8          1.0   304.9      0.5    2400.8   0.0   0.3    0.9   0   6

Something is going on with how these writes are ganged together. The first two drives average 17KB per write and the other six 8.7KB per write. The aggregate statistics listed show less of a disparity, but one still exists. I have to wonder if there is some "max transfer length" type of setting on each drive which is different, allowing the Hitachi drives to accept larger transfers, resulting in fewer I/O operations, each having a longer service time.

Just to avoid confusion, the svc_t field is "service time," not "seek time." The service time is the total time to service a request, including seek time, controller overhead, time for the data to transit the SATA bus, and time to write the data. If the requests are larger, all else being equal, the service time will ALWAYS be higher, but that does NOT imply the drive is slower. On the contrary, it often implies a faster drive which can service more data per request.

At any rate, there is a reason the Hitachi drives are handling larger requests than the WD drives. I glanced at the code for a while but could not figure out where the max transfer size is determined or used.
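For what it's worth, two of the knobs that bound how large a single request can get can be peeked at on a live system with mdb. This is only a sketch, using the tunable names as they exist in OpenSolaris-era ZFS, so treat the names as assumptions rather than gospel.

# system-wide cap on a single physical transfer, in bytes
echo "maxphys/D" | mdb -k

# limit ZFS uses when aggregating adjacent writes into one vdev I/O, in bytes
echo "zfs_vdev_aggregation_limit/D" | mdb -k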
Regarding vdevs and mixing WD Green drives with other drives, you might find it interesting that WD itself does not recommend them for 'business critical' RAID use - this is quoted from the WD20EARS page here (http://www.wdc.com/en/products/Products.asp?DriveID=773):

    Desktop / Consumer RAID Environments - WD Caviar Green Hard Drives are
    tested and recommended for use in consumer-type RAID applications
    (i.e., Intel Matrix RAID technology).*

    *Business Critical RAID Environments - WD Caviar Green Hard Drives are
    not recommended for and are not warranted for use in RAID environments
    utilizing Enterprise HBAs and/or expanders and in multi-bay chassis, as
    they are not designed for, nor tested in, these specific types of RAID
    applications. For all Business Critical RAID applications, please
    consider WD's Enterprise Hard Drives that are specifically designed
    with RAID-specific, time-limited error recovery (TLER), are tested
    extensively in 24x7 RAID applications, and include features like
    enhanced RAFF technology and thermal extended burn-in testing.

Further reading:

http://breden.org.uk/2009/05/01/home-fileserver-a-year-in-zfs/#drives
http://opensolaris.org/jive/thread.jspa?threadID=121871&tstart=0
http://jmlittle.blogspot.com/2010/03/wd-caviar-green-drives-and-zfs.html (mixing WD Green & Hitachi)
----- Original Message -----
> Regarding vdevs and mixing WD Green drives with other drives, you
> might find it interesting that WD itself does not recommend them for
> 'business critical' RAID use - this is quoted from the WD20EARS page here
> (http://www.wdc.com/en/products/Products.asp?DriveID=773):

With TLER easily enabled on WD Black drives, I guess that's where we'll go. Anyway, this is a test box used for its purpose (testing), but it's still interesting to see these differences. Thanks for your input!

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
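On the TLER point: where the drive firmware exposes SCT Error Recovery Control, recent smartmontools can query and set it without any vendor DOS tool; whether a given WD Black still honours this is exactly what's in question, so take the following as a sketch rather than a recipe (the device path is a placeholder, and some controllers may need an extra -d option).

# show the current read/write error-recovery timeouts (in tenths of a second)
smartctl -l scterc /dev/rdsk/c4t0d0

# set both to 7 seconds, the usual RAID-friendly value
smartctl -l scterc,70,70 /dev/rdsk/c4t0d0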
>>>>> "sb" == Simon Breden <sbreden at gmail.com> writes:sb> WD itself does not recommend them for ''business critical'' RAID sb> use The described problems with WD aren''t okay for non-critical development/backup/home use either. The statement from WD is nothing but an attempt to upsell you, to differentiate the market so they can tap into the demand curve at multiple points, and to overload you with information so the question becomes ``which WD drive should I buy'''' instead of ``which manufactuer''s drive should I buy.'''' Don''t let this stuff get a foothold inside your brain. ``mixing'''' drives within a stripe is a good idea because it protects you from bad batches and bad models/firmwares, which are not rare in recent experience! I always mix drives and included WD in that mix up until this latest rash of problems. ``mixing'''' is only bad (for WD) because it makes it easier for you, the customer, to characterize the green performance deficit and notice the firmware bugs that are unique to the WD drives. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100928/67ca55c9/attachment.bin>
IIRC, the currently available WD Caviar Black models no longer allow TLER to be set. For WD drives, to have TLER capability you will need to buy their enterprise models, like the REx models, which cost mucho $$$.
> The described problems with WD aren't okay for non-critical
> development/backup/home use either.

Indeed. I don't use WD drives for RAID any longer.

> The statement from WD is nothing but an attempt to upsell you, to
> differentiate the market so they can tap into the demand curve at
> multiple points...

Yes, I'm quite aware of this.

> Don't let this stuff get a foothold inside your brain.

OK, thanks, I'll try to ensure that never happens :P

> "Mixing" drives within a stripe is a good idea because it protects
> you from bad batches and bad models/firmwares, which are not rare in
> recent experience!

Yep, that's one way, although you also multiply the risk of at least one type of drive being a lemon. Another is to research good drives & firmwares and stick with those. Twice out of two drive choosing/buying occasions, this latter choice has served me well. Zero read/write/checksum errors so far in almost 3 years. I must be lucky, very lucky :)

> I always mix drives and included WD in that mix up until this latest
> rash of problems.

I avoided WD (for RAID) as soon as these problems showed up and bought another manufacturer's drives. I still buy their Caviar Black drives as scratch video editing drives though, as they're pretty good.