I figure this group will know better than any other I have contact
with: is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
badged Seagate ST31000N in a J4400)? I have a resilver running and am
seeing about 700-800 writes/sec. on the hot spare as it resilvers.
There is no other I/O activity on this box, as this is a remote
replication target for production data. I have the replication
disabled until the resilver completes.

Solaris 10U9
zpool version 22
Server is a T2000

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On 01 June, 2011 - Paul Kraus sent me these 0,9K bytes:

> I figure this group will know better than any other I have contact
> with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400) ? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.

700-800 sequential ones, perhaps; for random, you can divide by 10.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
> I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.

IIRC resilver works in block birth order (write order), which is
commonly more-or-less sequential unless the fs is fragmented. So it
might or might not be. I think you cannot get that kind of performance
for a fully random load, more like 100 IOPS or so.

--
- Tuomas
On Wed, Jun 1, 2011 at 1:16 PM, Tuomas Leikola <tuomas.leikola at gmail.com> wrote:

>> I have a resilver running and am
>> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
>
> IIRC resilver works in block birth order (write order) which is
> commonly more-or-less sequential unless the fs is fragmented. So it
> might or might not be. I think you cannot get that kind of performance
> for a fully random load, more like 100 IOPS or so.

Since this zpool only receives zfs send streams from the far end, I
would expect the data to be relatively sequential (minus the holes
from deleted snapshots).

--
Paul Kraus
On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:

> I figure this group will know better than any other I have contact
> with, is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400) ? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.
>
> Solaris 10U9
> zpool version 22
> Server is a T2000

Here's how you calculate (on average) how long a random IOP takes:

    seek time + ((60 / RPM) / 2)

where (60 / RPM) is the time for one full revolution, so half of it is
the average rotational latency. A truly sequential IOP (one per
revolution) takes:

    60 / RPM

For that series of drives, seek time averages 8.5 ms (per Seagate),
and one revolution at 7200 RPM takes 8.33 ms.

So, you get:

1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
about 79 IOPS.

1 sequential IOP takes 8.33 ms, which gives 120 IOPS.

Note that due to averaging, the above numbers may be slightly higher or
lower for any actual workload.

In your case, since ZFS does write aggregation (turning multiple write
requests into a single larger one), you might see what appears to be
more than the above number from something like 'iostat', which is
measuring not the *actual* writes to physical disk, but the *requested*
write operations.

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
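Erik's arithmetic can be checked with a short sketch. The 8.5 ms seek
figure is the Seagate average from his message, not a measured value,
and this is only the rough service-time model he describes, ignoring
queueing and caching:

```python
# Rough per-IO service-time model for a spinning disk (averages only).
# 7200 RPM and 8.5 ms average seek are taken from the thread.

def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60.0 / rpm * 1000.0

def random_iops(rpm, seek_ms):
    """Average seek plus average rotational latency (half a revolution)."""
    service_ms = seek_ms + rotation_ms(rpm) / 2.0
    return 1000.0 / service_ms

def sequential_iops(rpm):
    """One I/O per revolution, rotational delay only (no seek)."""
    return 1000.0 / rotation_ms(rpm)

print(round(random_iops(7200, 8.5)))   # ~79 random IOPS
print(round(sequential_iops(7200)))    # 120 sequential IOPS
```

Plugging in a 15k RPM drive (60/15000 = 4 ms per revolution, ~3.5 ms
seek) shows why faster spindles help random loads roughly twice as much.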
On Wed, Jun 1, 2011 at 9:17 PM, Erik Trimble <erik.trimble at oracle.com> wrote:

> Here's how you calculate (on average) how long a random IOP takes:
>
> seek time + ((60 / RPM) / 2)
>
> A truly sequential IOP (one per revolution) takes:
>
> 60 / RPM
>
> For that series of drives, seek time averages 8.5 ms (per Seagate).
>
> So, you get:
>
> 1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
> about 79 IOPS.
>
> 1 sequential IOP takes 8.33 ms, which gives 120 IOPS.

Thank you. I had found the seek specification, but did not know how to
convert it to anything approaching a useful I/Ops limit.

> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.
>
> In your case, since ZFS does write aggregation (turning multiple write
> requests into a single larger one), you might see what appears to be
> more than the above number from something like 'iostat', which is
> measuring not the *actual* writes to physical disk, but the *requested*
> write operations.

Hurmmm, I don't think that really explains what I am seeing.
iostat output for the two drives that are resilvering (yes, we had a
second failure before Oracle could get us a replacement drive; the
hoops first-line support is making us hop through are amazing, in a
bad way):

iostat -xn c6t5000C5001A452C72d0 c6t5000C5001A406415d0 1

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  359.7    0.3 1181.1  0.0  1.8    0.0    5.1   0  28 c6t5000C5001A406415d0
    0.1  573.3    6.2 1846.8  0.0  3.0    0.0    5.2   0  45 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  629.3    0.0 1859.7  0.0  3.0    0.0    4.7   0  53 c6t5000C5001A406415d0
    0.0  581.1    0.0 1780.8  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  855.0    0.0 3595.7  0.0  4.9    0.0    5.7   0  70 c6t5000C5001A406415d0
    0.0  785.9    0.0 3487.1  0.0  5.2    0.0    6.7   0  70 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  842.3    0.0 2709.8  0.0  4.2    0.0    5.0   0  71 c6t5000C5001A406415d0
    0.0  811.3    0.0 2607.3  0.0  4.1    0.0    5.0   0  68 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  567.0    0.0 1946.0  0.0  2.8    0.0    4.9   0  48 c6t5000C5001A406415d0
    0.0  549.0    0.0 1897.0  0.0  2.7    0.0    4.9   0  48 c6t5000C5001A452C72d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  803.8    0.0 2860.6  0.0  4.7    0.0    5.8   0  72 c6t5000C5001A406415d0
    0.0  798.8    0.0 2756.4  0.0  4.3    0.0    5.4   0  70 c6t5000C5001A452C72d0

and the zpool configuration:

> zpool status
  pool: zpool-53
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas
        exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 16h29m, 19.17% done, 69h28m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        alb-ed-01                  DEGRADED     0     0     0
          raidz2-0                 ONLINE       0     0     0
            c6t5000C5001A67E217d0  ONLINE       0     0     0
            c6t5000C5001A67AF9Dd0  ONLINE       0     0     0
            c6t5000C5001A67AADBd0  ONLINE       0     0     0
            c6t5000C5001A67A539d0  ONLINE       0     0     0
            c6t5000C5001A67A099d0  ONLINE       0     0     0
            c6t5000C5001A679F0Dd0  ONLINE       0     0     0
            c6t5000C5001A679C5Dd0  ONLINE       0     0     0
            c6t5000C5001A679B46d0  ONLINE       0     0     0
            c6t5000C5001A679A09d0  ONLINE       0     0     0
            c6t5000C5001A67104Ed0  ONLINE       0     0     0
            c6t5000C5001A670DBEd0  ONLINE       0     0     0
            c6t5000C5001A66E3DAd0  ONLINE       0     0     0
            c6t5000C5001A66411Ad0  ONLINE       0     0     0
            c6t5000C5001A663D19d0  ONLINE       0     0     0
            c6t5000C5001A663783d0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c6t5000C5001A663474d0  ONLINE       0     0     0
            c6t5000C5001A65EF79d0  ONLINE       0     0     0
            c6t5000C5001A65D7C0d0  ONLINE       0     0     0
            c6t5000C5001A65D50Ed0  ONLINE       0     0     0
            c6t5000C5001A65D000d0  ONLINE       0     0     0
            c6t5000C5001A65BBD8d0  ONLINE       0     0     0
            c6t5000C5001A65BA57d0  ONLINE       0     0     0
            c6t5000C5001A65A0E5d0  ONLINE       0     0     0
            c6t5000C5001A659F98d0  ONLINE       0     0     0
            c6t5000C5001A659D5Bd0  ONLINE       0     0     0
            c6t5000C5001A658E6Bd0  ONLINE       0     0     0
            c6t5000C5001A657CDCd0  ONLINE       0     0     0
            c6t5000C5001A657CBBd0  ONLINE       0     0     0
            c6t5000C5001A6573B0d0  ONLINE       0     0     0
            c6t5000C5001A6572E8d0  ONLINE       0     0     0
          raidz2-2                 DEGRADED     0     0     0
            c6t5000C5001A656F98d0  ONLINE       0     0     0
            c6t5000C5001A655093d0  ONLINE       0     0     0
            spare-2                DEGRADED     0     0     2
              4335891315817043040    UNAVAIL    0     0     0  was /dev/dsk/c6t5000C5001A654417d0s0
              c6t5000C5001A452C72d0  ONLINE     0     0     0  90.9G resilvered
            c6t5000C5001A653C3Dd0  ONLINE       0     0     0
            c6t5000C5001A652ABEd0  ONLINE       0     0     0
            c6t5000C5001A652A1Ed0  ONLINE       0     0     0
            c6t5000C5001A64D042d0  ONLINE       0     0     0
            c6t5000C5001A5C1D37d0  ONLINE       0     0     0
            c6t5000C5001A59722Fd0  ONLINE       0     0     0
            replacing-9            DEGRADED     0     0   195
              c6t5000C5001A59595Ed0  FAULTED  232   437     0  too many errors
              c6t5000C5001A406415d0  ONLINE     0     0     0  90.8G resilvered
            c6t5000C5001A57F715d0  ONLINE       0     0     0
            c6t5000C5001A54DB0Cd0  ONLINE       0     0     0
            c6t5000C5001A4836CDd0  ONLINE       0     0     0
            c6t5000C5001A4737D8d0  ONLINE       0     0     0
            c6t5000C5001A455E70d0  ONLINE       0     0     0
        spares
          c6t5000C5001A452C72d0    INUSE     currently in use
          c6t5000C5001A42D792d0    AVAIL

errors: No known data errors

I know that a 15-disk raidz2 is not the best recipe for performance,
but we needed the capacity, and this is for storing the remote backup
copy of the data, so most I/O is sequential (zfs recv). I am trying to
understand what I am seeing and relate it to real-world activity (time
to resilver a failed drive, for example).

--
Paul Kraus
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Erik Trimble
>
> Here's how you calculate (on average) how long a random IOP takes:
>
> seek time + ((60 / RPM) / 2)
>
> 1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
> about 79 IOPS.

While this is true, all drives nowadays use things like native command
queueing and other hardware optimization techniques. So even when you
instruct the drive to do a bunch of random I/O, the drive will make it
less random in the controller before it instructs the arm to move
about, and so on. Generally speaking, these techniques will
approximately double the random IOPS, because with a random
distribution of I/O requests, on average the drive will be able to
halve the randomness.

Consider your nit picked. ;-)
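The "halve the randomness" effect can be illustrated with a toy
Monte-Carlo sketch: when several requests are queued, serving the
nearest one first shortens the average arm travel. This is only an
illustration under simplified assumptions (uniform request offsets,
seek cost proportional to distance), not a model of any real firmware:

```python
# Toy simulation: average seek distance (as a fraction of full stroke)
# for FIFO service (queue depth 1) vs. nearest-request-first scheduling
# at a deeper queue. Positions are uniform in [0, 1).
import random

def avg_seek(queue_depth, n=20000):
    total, head = 0.0, 0.5
    pending = [random.random() for _ in range(queue_depth)]
    for _ in range(n):
        # Serve the queued request closest to the current head position,
        # then refill the queue with a fresh random request.
        nxt = min(pending, key=lambda pos: abs(pos - head))
        total += abs(nxt - head)
        pending.remove(nxt)
        pending.append(random.random())
        head = nxt
    return total / n

random.seed(1)
print(f"depth 1 (FIFO):  {avg_seek(1):.3f}")  # ~1/3 of full stroke
print(f"depth 8 (queued): {avg_seek(8):.3f}")
```

With depth 1 the result converges to the classic 1/3-stroke average;
deeper queues cut the average travel substantially, which is the
mechanism behind the roughly-doubled random IOPS claimed above.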
On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:

> On Wed, 2011-06-01 at 12:54 -0400, Paul Kraus wrote:
>
> Here's how you calculate (on average) how long a random IOP takes:
> seek time + ((60 / RPM) / 2)
>
> A truly sequential IOP (one per revolution) takes:
> 60 / RPM
>
> For that series of drives, seek time averages 8.5 ms (per Seagate).
> So, you get:
>
> 1 random IOP takes [8.5 ms + 4.17 ms] = 12.67 ms, which translates to
> about 79 IOPS.
> 1 sequential IOP takes 8.33 ms, which gives 120 IOPS.
>
> Note that due to averaging, the above numbers may be slightly higher or
> lower for any actual workload.

Nahh, shouldn't it read "numbers may be _significantly_ higher or
lower" ...? ;-)

Regards,
jel.
--
Otto-von-Guericke University    http://www.cs.uni-magdeburg.de/
Department of Computer Science  Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany        Tel: +49 391 67 12768
On 6/2/2011 5:12 PM, Jens Elkner wrote:

> On Wed, Jun 01, 2011 at 06:17:08PM -0700, Erik Trimble wrote:
>> Note that due to averaging, the above numbers may be slightly higher or
>> lower for any actual workload.
>
> Nahh, shouldn't it read "numbers may be _significantly_ higher or
> lower" ...? ;-)

Nope. In terms of actual, obtainable IOPS, a 7200 RPM drive isn't
going to be able to do more than 200 under ideal conditions, and
should be able to manage 50 under anything other than the pedantically
worst-case situation. That's only about a 50% deviation, not like an
order of magnitude or so.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Thu, Jun 2, 2011 at 11:49 PM, Erik Trimble <erik.trimble at oracle.com> wrote:

> Nope. In terms of actual, obtainable IOPS, a 7200 RPM drive isn't
> going to be able to do more than 200 under ideal conditions, and
> should be able to manage 50 under anything other than the pedantically
> worst-case situation. That's only about a 50% deviation, not like an
> order of magnitude or so.

So is there a way to read these real I/Ops numbers?

iostat is reporting 600-800 I/Ops peak (1 second sample) for these
7200 RPM SATA drives. If the drives are doing aggregation, then how to
tell what is really going on?

--
Paul Kraus
On Fri, Jun 3, 2011 at 11:22 AM, Paul Kraus <paul at kraus-haus.org> wrote:

> So is there a way to read these real I/Ops numbers?
>
> iostat is reporting 600-800 I/Ops peak (1 second sample) for these
> 7200 RPM SATA drives. If the drives are doing aggregation, then how to
> tell what is really going on?

I've always assumed that crazy high IOPS numbers on 7.2k drives means
I'm seeing the individual drive caches absorbing those writes. That's
the first place those writes will "land" when coming in from the disk
controller. As other posters have said, after that the drive may
internally reorder and/or aggregate those writes before sending them
to the platter.

Eric
On Thu, Jun 2 at 20:49, Erik Trimble wrote:

> Nope. In terms of actual, obtainable IOPS, a 7200 RPM drive isn't
> going to be able to do more than 200 under ideal conditions, and
> should be able to manage 50 under anything other than the
> pedantically worst-case situation. That's only about a 50% deviation,
> not like an order of magnitude or so.

Most cache-enabled 7200 RPM drives can do 20K+ sequential IOPS at
small block sizes, up close to their peak transfer rate.

For random I/O, I typically see 80 IOPS for unqueued reads, 120 for
queued reads/writes with cache disabled, and maybe 150-200 for
cache-enabled writes.

The above are all full-stroke, so the average seek is 1/3 stroke
(unqueued). On a smaller data set where the drive dwarfs the data set,
average seek distance is much shorter and the resulting IOPS can be
quite a bit higher.

--eric

--
Eric D. Mudama
edmudama at bounceswoosh.org
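The 1/3-stroke rule of thumb follows from the expected distance
between two independent uniformly distributed positions on the stroke,
E|a - b| = 1/3. A quick numerical check:

```python
# Expected |a - b| for a, b uniform on [0, 1) is 1/3, which is why a
# random seek over the full stroke averages one third of the stroke.
import random

random.seed(0)
n = 200000
mean = sum(abs(random.random() - random.random()) for _ in range(n)) / n
print(f"mean seek fraction: {mean:.3f}")  # close to 0.333
```

The same integral explains why a data set confined to a small slice of
the platter sees proportionally shorter average seeks, as noted above.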