I figure this group will know better than any other I have contact
with: is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
badged Seagate ST31000N in a J4400)? I have a resilver running and am
seeing about 700-800 writes/sec. on the hot spare as it resilvers.
There is no other I/O activity on this box, as this is a remote
replication target for production data. I have the replication
disabled until the resilver completes.

Solaris 10U9
zpool version 22
Server is a T2000

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On 01 June, 2011 - Paul Kraus sent me these 0,9K bytes:

> I figure this group will know better than any other I have contact
> with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400) ? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.

700-800 sequential ones, perhaps; for random, you can divide by 10.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
> I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.

IIRC resilver works in block birth order (write order), which is
commonly more-or-less sequential unless the fs is fragmented. So it
might or might not be. I think you cannot get that kind of performance
for a fully random load, more like 100 IOPS or so.

--
- Tuomas
On Wed, Jun 1, 2011 at 1:16 PM, Tuomas Leikola <tuomas.leikola at gmail.com> wrote:

>> I have a resilver running and am
>> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
>
> IIRC resilver works in block birth order (write order) which is
> commonly more-or-less sequential unless the fs is fragmented. So it
> might or might not be. I think you cannot get that kind of performance
> for a fully random load, more like 100 IOPS or so.

Since this zpool only receives zfs send streams from the far end, I
would expect the data to be relatively sequential (minus the holes
from deleted snapshots).

--
Paul Kraus
On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:

> I figure this group will know better than any other I have contact
> with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400) ? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.
>
> Solaris 10U9
> zpool version 22
> Server is a T2000

Here's how you calculate (on average) how long a random IOP takes:

    seek time + ((60 / RPM) / 2)

where (60 / RPM) is the time for one full revolution, so half of it is
the average rotational latency. A truly sequential IOP (one per
revolution) takes:

    60 / RPM

For that series of drives, seek time averages 8.5 ms (per Seagate),
and one revolution at 7200 RPM takes 8.33 ms.

So, you get:

1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
about 79 IOPS.

1 sequential IOP takes 8.33 ms, which gives 120 IOPS.

Note that due to averaging, the above numbers may be slightly higher or
lower for any actual workload.

In your case, since ZFS does write aggregation (turning multiple write
requests into a single larger one), you might see what appears to be
more than the above number from something like 'iostat', which is
measuring not the *actual* writes to physical disk, but the *requested*
write operations.

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
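Erik's arithmetic can be checked with a short sketch. The 8.5 ms seek
figure is the Seagate average from his message, not a measured value,
and this is only the rough service-time model he describes, ignoring
queueing and caching:

```python
# Rough per-IO service-time model for a spinning disk (averages only).
# 7200 RPM and 8.5 ms average seek are taken from the thread.

def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60.0 / rpm * 1000.0

def random_iops(rpm, seek_ms):
    """Average seek plus average rotational latency (half a revolution)."""
    service_ms = seek_ms + rotation_ms(rpm) / 2.0
    return 1000.0 / service_ms

def sequential_iops(rpm):
    """One I/O per revolution, rotational delay only (no seek)."""
    return 1000.0 / rotation_ms(rpm)

print(round(random_iops(7200, 8.5)))   # ~79 random IOPS
print(round(sequential_iops(7200)))    # 120 sequential IOPS
```

Plugging in a 15k RPM drive (60/15000 = 4 ms per revolution, ~3.5 ms
seek) shows why faster spindles help random loads roughly twice as much.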
On Wed, Jun 1, 2011 at 9:17 PM, Erik Trimble <erik.trimble at oracle.com> wrote:

> Here's how you calculate (on average) how long a random IOP takes:
>
> seek time + ((60 / RPM) / 2)
>
> A truly sequential IOP (one per revolution) takes:
>
> 60 / RPM
>
> For that series of drives, seek time averages 8.5 ms (per Seagate).
>
> So, you get:
>
> 1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
> about 79 IOPS.
>
> 1 sequential IOP takes 8.33 ms, which gives 120 IOPS.

Thank you. I had found the seek specification, but did not know how to
convert it to anything approaching a useful I/Ops limit.

> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.
>
> In your case, since ZFS does write aggregation (turning multiple write
> requests into a single larger one), you might see what appears to be
> more than the above number from something like 'iostat', which is
> measuring not the *actual* writes to physical disk, but the *requested*
> write operations.

Hurmmm, I don't think that really explains what I am seeing.
iostat output for the two drives that are resilvering (yes, we had a
second failure before Oracle could get us a replacement drive; the
hoops first-line support is making us hop through are amazing, in a
bad way):

iostat -xn c6t5000C5001A452C72d0 c6t5000C5001A406415d0 1

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  359.7    0.3 1181.1  0.0  1.8    0.0    5.1   0  28 c6t5000C5001A406415d0
    0.1  573.3    6.2 1846.8  0.0  3.0    0.0    5.2   0  45 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  629.3    0.0 1859.7  0.0  3.0    0.0    4.7   0  53 c6t5000C5001A406415d0
    0.0  581.1    0.0 1780.8  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  855.0    0.0 3595.7  0.0  4.9    0.0    5.7   0  70 c6t5000C5001A406415d0
    0.0  785.9    0.0 3487.1  0.0  5.2    0.0    6.7   0  70 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  842.3    0.0 2709.8  0.0  4.2    0.0    5.0   0  71 c6t5000C5001A406415d0
    0.0  811.3    0.0 2607.3  0.0  4.1    0.0    5.0   0  68 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  567.0    0.0 1946.0  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A406415d0
    0.0  549.0    0.0 1897.0  0.0  2.7    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  803.8    0.0 2860.6  0.0  4.7    0.0    5.8   0  72 c6t5000C5001A406415d0
    0.0  798.8    0.0 2756.4  0.0  4.3    0.0    5.4   0  70 c6t5000C5001A452C72d0

and the zpool configuration:

> zpool status
  pool: zpool-53
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas
        exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 16h29m, 19.17% done, 69h28m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        alb-ed-01                  DEGRADED     0     0     0
          raidz2-0                 ONLINE       0     0     0
            c6t5000C5001A67E217d0  ONLINE       0     0     0
            c6t5000C5001A67AF9Dd0  ONLINE       0     0     0
            c6t5000C5001A67AADBd0  ONLINE       0     0     0
            c6t5000C5001A67A539d0  ONLINE       0     0     0
            c6t5000C5001A67A099d0  ONLINE       0     0     0
            c6t5000C5001A679F0Dd0  ONLINE       0     0     0
            c6t5000C5001A679C5Dd0  ONLINE       0     0     0
            c6t5000C5001A679B46d0  ONLINE       0     0     0
            c6t5000C5001A679A09d0  ONLINE       0     0     0
            c6t5000C5001A67104Ed0  ONLINE       0     0     0
            c6t5000C5001A670DBEd0  ONLINE       0     0     0
            c6t5000C5001A66E3DAd0  ONLINE       0     0     0
            c6t5000C5001A66411Ad0  ONLINE       0     0     0
            c6t5000C5001A663D19d0  ONLINE       0     0     0
            c6t5000C5001A663783d0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c6t5000C5001A663474d0  ONLINE       0     0     0
            c6t5000C5001A65EF79d0  ONLINE       0     0     0
            c6t5000C5001A65D7C0d0  ONLINE       0     0     0
            c6t5000C5001A65D50Ed0  ONLINE       0     0     0
            c6t5000C5001A65D000d0  ONLINE       0     0     0
            c6t5000C5001A65BBD8d0  ONLINE       0     0     0
            c6t5000C5001A65BA57d0  ONLINE       0     0     0
            c6t5000C5001A65A0E5d0  ONLINE       0     0     0
            c6t5000C5001A659F98d0  ONLINE       0     0     0
            c6t5000C5001A659D5Bd0  ONLINE       0     0     0
            c6t5000C5001A658E6Bd0  ONLINE       0     0     0
            c6t5000C5001A657CDCd0  ONLINE       0     0     0
            c6t5000C5001A657CBBd0  ONLINE       0     0     0
            c6t5000C5001A6573B0d0  ONLINE       0     0     0
            c6t5000C5001A6572E8d0  ONLINE       0     0     0
          raidz2-2                 DEGRADED     0     0     0
            c6t5000C5001A656F98d0  ONLINE       0     0     0
            c6t5000C5001A655093d0  ONLINE       0     0     0
            spare-2                DEGRADED     0     0     2
              4335891315817043040    UNAVAIL    0     0     0  was /dev/dsk/c6t5000C5001A654417d0s0
              c6t5000C5001A452C72d0  ONLINE     0     0     0  90.9G resilvered
            c6t5000C5001A653C3Dd0  ONLINE       0     0     0
            c6t5000C5001A652ABEd0  ONLINE       0     0     0
            c6t5000C5001A652A1Ed0  ONLINE       0     0     0
            c6t5000C5001A64D042d0  ONLINE       0     0     0
            c6t5000C5001A5C1D37d0  ONLINE       0     0     0
            c6t5000C5001A59722Fd0  ONLINE       0     0     0
            replacing-9            DEGRADED     0     0   195
              c6t5000C5001A59595Ed0  FAULTED  232   437     0  too many errors
              c6t5000C5001A406415d0  ONLINE     0     0     0  90.8G resilvered
            c6t5000C5001A57F715d0  ONLINE       0     0     0
            c6t5000C5001A54DB0Cd0  ONLINE       0     0     0
            c6t5000C5001A4836CDd0  ONLINE       0     0     0
            c6t5000C5001A4737D8d0  ONLINE       0     0     0
            c6t5000C5001A455E70d0  ONLINE       0     0     0
        spares
          c6t5000C5001A452C72d0    INUSE     currently in use
          c6t5000C5001A42D792d0    AVAIL

errors: No known data errors

I know that a 15-disk raidz2 is not the best recipe for performance,
but we needed the capacity, and this is for storing the remote backup
copy of the data, so most I/O is sequential (zfs recv). I am trying to
understand what I am seeing and relate it to real-world activity (time
to resilver a failed drive, for example).

--
Paul Kraus
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Erik Trimble
>
> Here's how you calculate (on average) how long a random IOP takes:
>
> seek time + ((60 / RPM) / 2)
>
> 1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
> about 79 IOPS.

While this is true, all drives nowadays use things like native command
queueing and other hardware optimization techniques. So even when you
instruct the drive to do a bunch of random I/O, the drive will make it
less random in the controller before it instructs the arm to move
about, and so on. Generally speaking, these techniques will
approximately double the random IOPS, because with a random
distribution of I/O requests, on average the drive will be able to
halve the randomness.

Consider your nit picked. ;-)
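The "halve the randomness" effect can be illustrated with a toy
Monte-Carlo sketch: when several requests are queued, serving the
nearest one first shortens the average arm travel. This is only an
illustration under simplified assumptions (uniform request offsets,
seek cost proportional to distance), not a model of any real firmware:

```python
# Toy simulation: average seek distance (as a fraction of full stroke)
# for FIFO service (queue depth 1) vs. nearest-request-first scheduling
# at a deeper queue. Positions are uniform in [0, 1).
import random

def avg_seek(queue_depth, n=20000):
    total, head = 0.0, 0.5
    pending = [random.random() for _ in range(queue_depth)]
    for _ in range(n):
        # Serve the queued request closest to the current head position,
        # then refill the queue with a fresh random request.
        nxt = min(pending, key=lambda pos: abs(pos - head))
        total += abs(nxt - head)
        pending.remove(nxt)
        pending.append(random.random())
        head = nxt
    return total / n

random.seed(1)
print(f"depth 1 (FIFO):  {avg_seek(1):.3f}")  # ~1/3 of full stroke
print(f"depth 8 (queued): {avg_seek(8):.3f}")
```

With depth 1 the result converges to the classic 1/3-stroke average;
deeper queues cut the average travel substantially, which is the
mechanism behind the roughly-doubled random IOPS claimed above.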
On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:

> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
>
> Here's how you calculate (on average) how long a random IOP takes:
> seek time + ((60 / RPM) / 2)
>
> A truly sequential IOP (one per revolution) takes:
> 60 / RPM
>
> For that series of drives, seek time averages 8.5 ms (per Seagate).
> So, you get:
>
> 1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
> about 79 IOPS.
> 1 sequential IOP takes 8.33 ms, which gives 120 IOPS.
>
> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.

Nahh, shouldn't it read "numbers may be _significantly_ higher or
lower" ...? ;-)

Regards,
jel.
--
Otto-von-Guericke University    http://www.cs.uni-magdeburg.de/
Department of Computer Science  Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany        Tel: +49 391 67 12768
On 6/2/2011 5:12 PM, Jens Elkner wrote:

> On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
>> Note that due to averaging, the above numbers may be slightly higher or
>> lower for any actual workload.
>
> Nahh, shouldn't it read "numbers may be _significantly_ higher or
> lower" ...? ;-)

Nope. In terms of actual, obtainable IOPS, a 7200 RPM drive isn't
going to be able to do more than 200 under ideal conditions, and
should be able to manage 50 under anything other than the pedantically
worst-case situation. That's only about a 50% deviation, not like an
order of magnitude or so.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Thu, Jun 2, 2011 at 11:49 PM, Erik Trimble <erik.trimble at oracle.com> wrote:

> Nope. In terms of actual, obtainable IOPS, a 7200 RPM drive isn't
> going to be able to do more than 200 under ideal conditions, and
> should be able to manage 50 under anything other than the pedantically
> worst-case situation. That's only about a 50% deviation, not like an
> order of magnitude or so.

So is there a way to read these real I/Ops numbers?

iostat is reporting 600-800 I/Ops peak (1 second sample) for these
7200 RPM SATA drives. If the drives are doing aggregation, then how to
tell what is really going on?

--
Paul Kraus
On Fri, Jun 3, 2011 at 11:22 AM, Paul Kraus <paul at kraus-haus.org> wrote:

> So is there a way to read these real I/Ops numbers?
>
> iostat is reporting 600-800 I/Ops peak (1 second sample) for these
> 7200 RPM SATA drives. If the drives are doing aggregation, then how to
> tell what is really going on?

I've always assumed that crazy high IOPS numbers on 7.2k drives means
I'm seeing the individual drive caches absorbing those writes. That's
the first place those writes will "land" when coming in from the disk
controller. As other posters have said, after that the drive may
internally reorder and/or aggregate those writes before sending them
to the platter.

Eric
On Thu, Jun 2 at 20:49, Erik Trimble wrote:

> Nope. In terms of actual, obtainable IOPS, a 7200 RPM drive isn't
> going to be able to do more than 200 under ideal conditions, and
> should be able to manage 50 under anything other than the
> pedantically worst-case situation. That's only about a 50% deviation,
> not like an order of magnitude or so.

Most cache-enabled 7200 RPM drives can do 20K+ sequential IOPS at
small block sizes, up close to their peak transfer rate.

For random I/O, I typically see 80 IOPS for unqueued reads, 120 for
queued reads/writes with cache disabled, and maybe 150-200 for
cache-enabled writes.

The above are all full-stroke, so the average seek is 1/3 stroke
(unqueued). On a smaller data set where the drive dwarfs the data set,
average seek distance is much shorter and the resulting IOPS can be
quite a bit higher.

--eric

--
Eric D. Mudama
edmudama at bounceswoosh.org
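The 1/3-stroke rule of thumb follows from the expected distance
between two independent uniformly distributed positions on the stroke,
E|a - b| = 1/3. A quick numerical check:

```python
# Expected |a - b| for a, b uniform on [0, 1) is 1/3, which is why a
# random seek over the full stroke averages one third of the stroke.
import random

random.seed(0)
n = 200000
mean = sum(abs(random.random() - random.random()) for _ in range(n)) / n
print(f"mean seek fraction: {mean:.3f}")  # close to 0.333
```

The same integral explains why a data set confined to a small slice of
the platter sees proportionally shorter average seeks, as noted above.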