Mohammed Naser
2012-Jan-31 17:52 UTC
[zfs-discuss] Bad performance (Seagate drive related?)
Hi list!

I have seen less-than-stellar ZFS performance on a setup of one main head connected to a JBOD (using SAS, but the drives are SATA). There are 16 drives (8 mirrors) in this pool, but I'm getting 180-ish MB/s sequential writes (using dd -- I know it's not precise, but those numbers should be higher).

With some help on IRC, it seems that part of the reason I'm slowing down is that some drives appear to be slower than the others. Initially, I had some drives running in 1.5 Gb/s mode instead of 3.0 Gb/s -- they are all running at 3.0 Gb/s now. While running the following dd command, the iostat output shows a much higher %b on some drives, which seems to say that those drives are slower (but could they really be slowing everything else down that much, or am I looking at the wrong spot here?). The pool configuration is also included below.

    dd if=/dev/zero of=4g bs=1M count=4000

                        extended device statistics
    r/s    w/s  kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.0    0.0   8.0      0.0  0.0  0.0    0.0    0.2   0   0 c1
    1.0    0.0   8.0      0.0  0.0  0.0    0.0    0.2   0   0 c1t2d0
    8.0 3857.8  64.0 337868.8  0.0 64.5    0.0   16.7   0 704 c5
    0.0  259.0   0.0  26386.2  0.0  3.6    0.0   14.0   0  37 c5t50014EE0ACE4AEEFd0
    1.0  266.0   8.0  27139.2  0.0  3.6    0.0   13.5   0  37 c5t50014EE056EB0356d0
    2.0  276.0  16.0  19315.1  0.0  3.7    0.0   13.3   0  40 c5t50014EE00239C976d0
    0.0  279.0   0.0  19699.0  0.0  3.6    0.0   13.0   0  37 c5t50014EE0577C459Cd0
    1.0  232.0   8.0  23061.9  0.0  3.6    0.0   15.4   0  37 c5t50014EE0578F60F5d0
    0.0  227.0   0.0  22677.9  0.0  3.6    0.0   15.8   0  37 c5t50014EE0AC407BAEd0
    0.0  205.0   0.0  24870.2  0.0  3.4    0.0   16.6   0  35 c5t50014EE0AC408605d0
    0.0  205.0   0.0  24870.2  0.0  3.4    0.0   16.6   0  35 c5t50014EE056EB0B94d0
    1.0  210.0   8.0  15954.2  0.0  4.4    0.0   20.8   0  68 c5t5000C50010C77647d0
    0.0  212.0   0.0  16082.2  0.0  4.1    0.0   19.2   0  42 c5t5000C50010C865DEd0
    0.0  207.0   0.0  20093.9  0.0  4.2    0.0   20.3   0  45 c5t5000C50010C77679d0
    0.0  208.0   0.0  19689.5  0.0  4.1    0.0   19.8   0  44 c5t5000C50010C7672Dd0
    0.0  259.0   0.0  14013.7  0.0  5.1    0.0   19.7   0  53 c5t5000C5000A11B600d0
    2.0  320.0  16.0  19942.9  0.0  6.9    0.0   21.5   0  84 c5t5000C50008315CE5d0
    1.0  259.0   8.0  23380.2  0.0  3.6    0.0   13.9   0  37 c5t50014EE001407113d0
    0.0  234.0   0.0  20692.4  0.0  3.6    0.0   15.4   0  38 c5t50014EE00194FB1Bd0

      pool: tank
     state: ONLINE
      scan: scrub canceled on Mon Jan 30 11:07:02 2012
    config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c5t50014EE0ACE4AEEFd0  ONLINE       0     0     0
            c5t50014EE056EB0356d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c5t50014EE00239C976d0  ONLINE       0     0     0
            c5t50014EE0577C459Cd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c5t50014EE0578F60F5d0  ONLINE       0     0     0
            c5t50014EE0AC407BAEd0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c5t50014EE056EB0B94d0  ONLINE       0     0     0
            c5t50014EE0AC408605d0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c5t5000C50010C77647d0  ONLINE       0     0     0
            c5t5000C50010C865DEd0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c5t5000C50010C7672Dd0  ONLINE       0     0     0
            c5t5000C50010C77679d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c5t50014EE001407113d0  ONLINE       0     0     0
            c5t50014EE00194FB1Bd0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c5t5000C50008315CE5d0  ONLINE       0     0     0
            c5t5000C5000A11B600d0  ONLINE       0     0     0
        cache
          c1t2d0                   ONLINE       0     0     0
          c1t3d0                   ONLINE       0     0     0
        spares
          c5t5000C5000D46F13Dd0    AVAIL

From c5t5000C50010C77647d0 to c5t5000C50008315CE5d0 are the 6 Seagate drives: 2 ST31000340AS and 4 ST31000340NS. The rest of the drives are all WD RE3 (WD1002FBYS).

Could those Seagates really be slowing down the array that much, or is there something else here that I should be looking at? I did the same dd on the main OS pool (2 mirrors) and got 63 MB/s... times 8 mirrors should give me 504 MB/s of writes?
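(For reference, a minimal way to reproduce this measurement and watch per-vdev throughput while it runs -- the /tank path and the 5-second interval below are illustrative, not taken from the output above:)

    dd if=/dev/zero of=/tank/4g bs=1M count=4000

    # in a second terminal while the dd runs:
    zpool iostat -v tank 5      # per-mirror breakdown of the write load
    iostat -xn 5                # per-device view, like the output above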
tl;dr: My tank of 8 mirrors is giving 180 MB/s writes, how do I fix it?!

--
Mohammed Naser, vexxhost
Nathan Kroenert
2012-Feb-05 05:27 UTC
[zfs-discuss] Bad performance (Seagate drive related?)
Hey there,

A few things:

- Using /dev/zero is not necessarily a great test. I typically use /dev/urandom to create an initial block-o-stuff - something like a gig or so worth, in /tmp - then use dd to push that to my zpool. (/dev/zero will return dramatically different results depending on pool/dataset settings for compression etc.)
- Indeed - getting a total aggregate of 180 MB/s seems pretty low on the face of it for the setup you have. What's the controller you are using? Any details on the driver, backplane, expander, array or other you might be using?
- Have you tried your dd on individual spindles? You might find that they behave differently.
- Does your controller have DRAM on it? Can you put it in passthrough mode rather than cache?
- I have done some testing trying to find odd behaviour like this before, and found on different occasions a number of different things:
  - Drives: things like the WD 'green' drives getting in my way
  - Alignment for non-EFI labeled disks (hm - maybe even on EFI... that one was a while ago), particularly for 4K 'advanced format' (ha!) disks
  - The controller was unable to keep up. (In one case, I ended up tossing an HP P400 (IIRC) and using the on-motherboard chipset, as it was considerably faster when running four disks.)
  - Disks with wildly different performance characteristics were also bad (eg: enterprise SATA mixed with 5400 RPM disks ;)

I'd suggest that you spend a little time validating the basic assumptions around:
- speed of individual disks,
- speed of individual buses,
- whether you are being limited by CPU (ie: if you have compression or dedupe turned on) - view with mpstat and friends.

I'll also note that you are looking close to the number of IOPS I'd expect a consumer disk to supply, assuming a somewhat random distribution of IOPS.

Consider that your 180 MB/s is actually 360 (well - not quite - but it's a lot more than 180). Remember - in a mirror, you literally need to write the data twice.

    8.0 3857.8  64.0 337868.8  0.0 64.5    0.0   16.7   0 704 c5

(Note the above is your c5 controller - running at around 337 MB/s.) Incidentally - this seems awfully close to 3Gb/s... How did you say all of your external drives were attached? If I didn't know better, I'd be asking serious questions about how many lanes of a SAS connection the SATA-attached drives were able to use... Actually - I don't know better, so I'd ask anyway... ;)

I think this will likely go a long way to helping understand where the holdup is. There is also a heap of great stuff on solarisinternals.com which I'd highly recommend taking a look at after you have validated the basics...

Were this one of my systems (and especially if it's new, and you don't love your data and can re-create the pool), I'd be tempted to do something like a very destructive...

    for i in <all your disks>
    do
        dd if=/tmp/randomdata.file.I.created.earlier of=/dev/rdsk/${i} &
    done

and see how much you can stuff down the pipe. Remember - this will kill whatever is on the disks, do think twice before you do it. ;)

If you can't get at least 80-100 MB/s on the outside of the platter, I'd suggest you should be looking at layers below ZFS. If you *can*, then you start looking further up the stack.

Hope this helps somewhat. Let us know how you go.

Cheers!
Nathan.
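P.S. A rough sketch of those basic checks in command form -- the file names, the 1 GB size, and the /tank mountpoint below are placeholders, so adjust for your layout:

    # stage ~1 GB of incompressible test data in /tmp (tmpfs)
    dd if=/dev/urandom of=/tmp/randomdata bs=1M count=1024

    # rule out compression/dedup skewing the dd numbers
    zfs get compression,dedup tank

    # push the random data at the pool...
    dd if=/tmp/randomdata of=/tank/ddtest bs=1M

    # ...and watch CPU in another terminal (mpstat and friends)
    mpstat 5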