Richard's blog analyzes MTTDL as a function of N+P+S:
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

But to understand how to best utilize an array with a fixed number of
drives, I add the following constraints:
- N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
- all sets in an array should be configured similarly
- the MTTDL for S sets is equal to (MTTDL for one set)/S

I got the following results by varying the NUM_BAYS parameter in the
source code below:

*_4 bays w/ 300 GB drives having MTBF=4 years_*
- can have 1 (2+1) w/ 1 spares providing 600 GB with MTTDL of 5840.00 years
- can have 1 (2+2) w/ 0 spares providing 600 GB with MTTDL of 799350.00 years
- can have 0 (4+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (4+2) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 4 spares providing 0 GB with MTTDL of Inf years

*_8 bays w/ 300 GB drives having MTBF=4 years_*
- can have 2 (2+1) w/ 2 spares providing 1200 GB with MTTDL of 2920.00 years
- can have 2 (2+2) w/ 0 spares providing 1200 GB with MTTDL of 399675.00 years
- can have 1 (4+1) w/ 3 spares providing 1200 GB with MTTDL of 1752.00 years
- can have 1 (4+2) w/ 2 spares providing 1200 GB with MTTDL of 2557920.00 years
- can have 0 (8+1) w/ 8 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 8 spares providing 0 GB with MTTDL of Inf years

*_12 bays w/ 300 GB drives having MTBF=4 years_*
- can have 4 (2+1) w/ 0 spares providing 2400 GB with MTTDL of 365.00 years
- can have 3 (2+2) w/ 0 spares providing 1800 GB with MTTDL of 266450.00 years
- can have 2 (4+1) w/ 2 spares providing 2400 GB with MTTDL of 876.00 years
- can have 2 (4+2) w/ 0 spares providing 2400 GB with MTTDL of 79935.00 years
- can have 1 (8+1) w/ 3 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 2 spares providing 2400 GB with MTTDL of 426320.00 years

*_16 bays w/ 300 GB drives having MTBF=4 years_*
- can have 5 (2+1) w/ 1 spares providing 3000 GB with MTTDL of 1168.00 years
- can have 4 (2+2) w/ 0 spares providing 2400 GB with MTTDL of 199837.50 years
- can have 3 (4+1) w/ 1 spares providing 3600 GB with MTTDL of 584.00 years
- can have 2 (4+2) w/ 4 spares providing 2400 GB with MTTDL of 1278960.00 years
- can have 1 (8+1) w/ 7 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 6 spares providing 2400 GB with MTTDL of 426320.00 years

*_20 bays w/ 300 GB drives having MTBF=4 years_*
- can have 6 (2+1) w/ 2 spares providing 3600 GB with MTTDL of 973.33 years
- can have 5 (2+2) w/ 0 spares providing 3000 GB with MTTDL of 159870.00 years
- can have 4 (4+1) w/ 0 spares providing 4800 GB with MTTDL of 109.50 years
- can have 3 (4+2) w/ 2 spares providing 3600 GB with MTTDL of 852640.00 years
- can have 2 (8+1) w/ 2 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 0 spares providing 4800 GB with MTTDL of 13322.50 years

*_24 bays w/ 300 GB drives having MTBF=4 years_*
- can have 8 (2+1) w/ 0 spares providing 4800 GB with MTTDL of 182.50 years
- can have 6 (2+2) w/ 0 spares providing 3600 GB with MTTDL of 133225.00 years
- can have 4 (4+1) w/ 4 spares providing 4800 GB with MTTDL of 438.00 years
- can have 4 (4+2) w/ 0 spares providing 4800 GB with MTTDL of 39967.50 years
- can have 2 (8+1) w/ 6 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 4 spares providing 4800 GB with MTTDL of 213160.00 years

While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
/any/ RAIDZ configuration will outlive me, and so I conclude that RAIDZ2
is unnecessary in a practical sense... This conclusion surprises me
given the amount of attention people give to double-parity solutions -
what am I overlooking?

Thanks,
Kent

_*Source Code*_ (compile with: cc -std=c99 -lm <filename>) [it's more
than 80 columns - sorry!]

#include <stdio.h>
#include <math.h>

#define NUM_BAYS 24
#define DRIVE_SIZE_GB 300
#define MTBF_YEARS 4
#define MTTR_HOURS_NO_SPARE 16
#define MTTR_HOURS_SPARE 4

int main() {
    printf("\n");
    printf("%u bays w/ %u GB drives having MTBF=%u years\n",
           NUM_BAYS, DRIVE_SIZE_GB, MTBF_YEARS);
    for (int num_drives = 2; num_drives <= 8; num_drives *= 2) {
        for (int num_parity = 1; num_parity <= 2; num_parity++) {
            double mttdl;

            int mtbf_hours = MTBF_YEARS * 365 * 24;
            int total_num_drives = num_drives + num_parity;
            int num_instances = NUM_BAYS / total_num_drives;
            int num_spares = NUM_BAYS % total_num_drives;
            double mttr = num_spares == 0 ?
                MTTR_HOURS_NO_SPARE : MTTR_HOURS_SPARE;
            int total_capacity = num_drives * num_instances * DRIVE_SIZE_GB;

            if (num_parity == 1) {
                mttdl = pow(mtbf_hours, 2.0) /
                    (total_num_drives * (total_num_drives-1) * mttr);
            } else if (num_parity == 2) {
                mttdl = pow(mtbf_hours, 3.0) /
                    (total_num_drives * (total_num_drives-1) *
                     (total_num_drives-2) * pow(mttr, 2.0));
            }

            printf(" - can have %u (%u+%u) w/ %u spares providing %u GB with MTTDL of %.2f years\n",
                   num_instances, num_drives, num_parity,
                   num_spares, total_capacity,
                   mttdl/24/365/num_instances);
        }
    }
}
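[For reference, the MTTDL expressions the program computes are the
standard single- and double-parity approximations, written out here from
the code, with T = total drives per set (data + parity):

    MTTDL(P=1) = MTBF^2 / (T * (T-1) * MTTR)
    MTTDL(P=2) = MTBF^3 / (T * (T-1) * (T-2) * MTTR^2)

and the per-array figure divides by the number of sets, per the third
constraint above.]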
> While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
> /any/ RAIDZ configuration will outlive me and so I conclude that RAIDZ2
> is unnecessary in a practical sense... This conclusion surprises me
> given the amount of attention people give to double-parity solutions -
> what am I overlooking?

When talking to Netapp, some of their folks have mentioned that their DP
solution wasn't necessarily so useful for handling near-simultaneous
disk loss (although it does do that). But when a disk fails, it is not
uncommon for reconstruction to be unable to read some data off the
remaining disks (perhaps a bad sector, or bad data that fails checksum).
With 1P, you have to shut down the volume or leave a hole in the
filesystem. With 2P, you reconstruct that one read and continue.

-- 
Darren Dunham                  ddunham at taos.com
Senior Technical Consultant    TAOS    http://www.taos.com/
Got some Dr Pepper?            San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Darren Dunham wrote:
> When talking to Netapp, some of their folks have mentioned that their DP
> solution wasn't necessarily so useful for handling near-simultaneous
> disk loss (although it does do that).
>
> But when a disk fails, it is not uncommon for reconstruction
> to be unable to read some data off the remaining disks (perhaps a bad
> sector, or bad data that fails checksum). With 1P, you have to shut down
> the volume or leave a hole in the filesystem. With 2P, you reconstruct
> that one read and continue.

Are Netapp using some kind of block checksumming? That seems to be one
of the big wins of ZFS compared to ordinary filesystems -- I have higher
confidence that data I haven't accessed recently is still good. If
Netapp doesn't do something like that, that would explain why there's
frequently trouble reconstructing, and point up a major ZFS advantage.

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/dd-b
Pics: http://dd-b.net/dd-b/SnapshotAlbum, http://dd-b.net/photography/gallery
Dragaera: http://dragaera.info
> Are Netapp using some kind of block checksumming?

They provide an option for it; I'm not sure how often it's used.

> If Netapp doesn't do something like [ZFS checksums], that would
> explain why there's frequently trouble reconstructing, and point up a
> major ZFS advantage.

Actually, the real problem is uncorrectable errors on drives. On a 1 TB
SATA drive, there's a good chance (over 1%) that at least one block will
be unreadable once written. Scrubbing tries to catch these, but if an
error develops between the last scrub and the need to read the data as
part of reconstruction, you're out of luck. This is the big advantage of
RAID-6 / RAIDZ2: the combination of a drive failure and a single-block
failure on a second drive won't lead to data loss.
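[A back-of-the-envelope sketch of where that kind of number comes from.
The 1-in-1e14 unrecoverable-bit-error rate below is an assumption - a
spec commonly quoted for SATA drives of this era, not a figure from this
thread - and actual drives vary. With it, a full 1 TB read has roughly
an 8% chance of hitting at least one unreadable bit, comfortably "over
1%":

/* URE probability sketch; compile with: cc -std=c99 -lm <filename> */
#include <stdio.h>
#include <math.h>

int main(void) {
    double p_bit = 1e-14;       /* ASSUMED unrecoverable errors per bit */
    double bits  = 1e12 * 8.0;  /* bits in a 1 TB read */
    /* P(at least one URE) = 1 - (1 - p)^bits, well approximated by
     * 1 - exp(-bits * p) when p is tiny. */
    double p_fail = 1.0 - exp(-bits * p_bit);
    printf("P(>=1 unrecoverable error reading 1 TB) = %.1f%%\n",
           100.0 * p_fail);
    return 0;
}

This prints about 7.7%, which is why a rebuild that must read every
surviving drive end-to-end is much riskier than it first appears.]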
#define DRIVE_SIZE_GB 300
#define MTBF_YEARS 2
#define MTTR_HOURS_NO_SPARE 48
#define MTTR_HOURS_SPARE 8

#define NUM_BAYS 10
- can have 3 (2+1) w/ 1 spares providing 1800 GB with MTTDL of 243.33 years
- can have 2 (4+1) w/ 0 spares providing 2400 GB with MTTDL of 18.25 years
- can have 1 (8+1) w/ 1 spares providing 2400 GB with MTTDL of 60.83 years
- can have 1 (8+2) w/ 0 spares providing 2400 GB with MTTDL of 370.07 years

#define NUM_BAYS 11
- can have 2 (4+1) w/ 1 spares providing 2400 GB with MTTDL of 109.50 years
- can have 1 (8+2) w/ 1 spares providing 2400 GB with MTTDL of 13322.50 years

Wow - from 370.07 years to 13322.50 years just by adding a spare?
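[A check against the model in the posted code: that jump is exactly what
the formula predicts. For double parity the code divides by MTTR^2, and
having a spare drops the assumed MTTR from 48 hours to 8 hours, which
multiplies MTTDL by (48/8)^2 = 36; and 370.07 years * 36 = 13322.5
years - precisely the change shown.]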
> Are Netapp using some kind of block checksumming? That seems to be one
> of the big wins of ZFS compared to ordinary filesystems -- I have higher
> confidence that data I haven't accessed recently is still good. If
> Netapp doesn't do something like that, that would explain why there's
> frequently trouble reconstructing, and point up a major ZFS advantage.

Yes, they do. I don't think it's identical to ZFS, but it is similar.
Here's a page from their website that describes it (without much
detail). Given the comments, I doubt that the author was aware of ZFS
when this was written.

http://www.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html

The "trouble reconstructing" needs to be detected, and that's something
that both WAFL and ZFS can do. Without the checksums, you might not even
know there was a problem.

-- 
Darren Dunham                  ddunham at taos.com
Senior Technical Consultant    TAOS    http://www.taos.com/
Got some Dr Pepper?            San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Cool. Comments below...

Kent Watsen wrote:
> Richard's blog analyzes MTTDL as a function of N+P+S:
> http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl
>
> But to understand how to best utilize an array with a fixed number of
> drives, I add the following constraints:
> - N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
> - all sets in an array should be configured similarly
> - the MTTDL for S sets is equal to (MTTDL for one set)/S

Yes, these are reasonable and will reduce the problem space somewhat.

> I got the following results by varying the NUM_BAYS parameter in the
> source code below:
>
> [results tables snipped - see the original message above]
>
> While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
> /any/ RAIDZ configuration will outlive me and so I conclude that RAIDZ2
> is unnecessary in a practical sense... This conclusion surprises me
> given the amount of attention people give to double-parity solutions -
> what am I overlooking?

You are overlooking statistics :-). As I discuss in
http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent
the MTBF (F == death) of children aged 5-14 in the US is 4,807 years, but
clearly no child will live anywhere close to 4,807 years. The number
itself is not really that important, but it does provide a way to
compare designs. In other words, the numbers are important in a relative
sense.

Another observation is that the MTBF does change over time, but the math
to consider that case is much more difficult. It is also difficult to
find any real data or data sheets which would show that number. There
are other techniques to model this, but they won't change the relative
improvement of raidz2 over raidz, so what you have is reasonable.

I had the fortune to hear Dave Patterson speak a few years ago. He said
that (anecdotally) people would come up to him upset because they had
lost data with RAID-5 systems. He said it would have been better if he
had done RAID-6 instead of RAID-5... hindsight is always 20/20 :-)
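[To spell out the statistics point - this is a restatement of the
standard constant-failure-rate math, not anything new from the thread:
MTBF is the reciprocal of the failure rate, MTBF = 1/lambda. The
children example corresponds to an annual mortality rate of 1/4807, i.e.
about 21 deaths per 100,000 children per year. The figure is a statement
about a population rate, not about any individual's lifetime, because
the hazard is not actually constant over a lifetime; a drive's quoted
MTBF should be read the same way.]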
> _*Source Code*_ (compile with: cc -std=c99 -lm <filename>) [it's more
> than 80 columns - sorry!]

This is relatively easy to implement in a spreadsheet, too. But as you
begin to notice, there are hundreds or thousands of possible
combinations as you add disk drives.

> #define NUM_BAYS 24
> #define DRIVE_SIZE_GB 300
> #define MTBF_YEARS 4

I think this is pessimistic :-)

> #define MTTR_HOURS_NO_SPARE 16

I think this is optimistic :-)

> #define MTTR_HOURS_SPARE 4

I also think this is optimistic and pessimistic :-) With ZFS, you only
recover the data used. This is an advantage over other LVMs, which try
to reconstruct the whole space. In the best case, this number is almost
zero, and in the worst case it is approximately the same as an LVM with
aggressive reconstruction. There is a possibility that it is worse for
some use cases, but that has not been characterized yet.

Since you restrict the types to raidz and raidz2, you can simplify the
analysis a bit, which helps.

> [rest of the source code snipped]

There are many more facets of looking at these sorts of analysis, which
is why I wrote RAIDoptimizer. Attached is a similar output from
RAIDoptimizer in a spreadsheet (raidz-raidz2-MTTDL-example.ods) so you
can sort or plot the data as you'd like. The algorithms are described in
various blog entries at: http://blogs.sun.com/relling

I'll note that RAIDoptimizer doesn't currently let me set an MTBF <
100,000 hours, so I'll take that as an RFE.
-- richard
>> But to understand how to best utilize an array with a fixed number of
>> drives, I add the following constraints:
>> - N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
>> - all sets in an array should be configured similarly
>> - the MTTDL for S sets is equal to (MTTDL for one set)/S
>
> Yes, these are reasonable and will reduce the problem space somewhat.

Actually, I wish I could get more insight into why N can only be 2, 4,
or 8. In contemplating a 16-bay array, I many times think that 3 (3+2)
+ 1 spare would be perfect, but I have no understanding of what N=3
implicates...

>> While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
>> /any/ RAIDZ configuration will outlive me and so I conclude that
>> RAIDZ2 is unnecessary in a practical sense... This conclusion
>> surprises me given the amount of attention people give to
>> double-parity solutions - what am I overlooking?
>
> You are overlooking statistics :-). As I discuss in
> http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent
> the MTBF (F == death) of children aged 5-14 in the US is 4,807 years,
> but clearly no child will live anywhere close to 4,807 years.

Thanks - I hadn't seen that blog entry yet...

>> #define MTTR_HOURS_NO_SPARE 16
>
> I think this is optimistic :-)

Not really for me, as the array is in my basement - so I assume that
I'll swap in a drive when I get home from work ;)

> There are many more facets of looking at these sorts of analysis,
> which is why I wrote RAIDoptimizer.

Is RAIDoptimizer the name of a spreadsheet you developed - is it
publicly available?

Thanks,
Kent
Kent Watsen wrote:
>> #define MTTR_HOURS_NO_SPARE 16
>>
>> I think this is optimistic :-)
>
> Not really for me, as the array is in my basement - so I assume that
> I'll swap in a drive when I get home from work ;)

Yes, it's interesting how the parameters for home setups differ from
"professional" ones (not meaning to denigrate the professionalism of
anybody's home network, of course). We can run to the store and buy
something rather quicker than lots of professional outfits seem to be
able to get spares in hand.

But what if you're away on business that week?

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/dd-b
Pics: http://dd-b.net/dd-b/SnapshotAlbum, http://dd-b.net/photography/gallery
Dragaera: http://dragaera.info
Kent Watsen wrote:
> Actually, I wish I could get more insight into why N can only be 2, 4,
> or 8. In contemplating a 16-bay array, I many times think that 3 (3+2)
> + 1 spare would be perfect, but I have no understanding of what N=3
> implicates...

There was a discussion a while back which centered around this topic. I
don't recall the details, and I think it needs to be revisited, but
there was consensus that, for the time being, best performance was thus
achieved. I'd like to revisit this, since I think best performance is
more difficult to predict, due to the dynamic nature of ZFS making it
particularly sensitive to the workload.

>> #define MTTR_HOURS_NO_SPARE 16
>>
>> I think this is optimistic :-)
>
> Not really for me, as the array is in my basement - so I assume that
> I'll swap in a drive when I get home from work ;)

It is an average value, so you have some leeway there. I work from home,
so in theory I should have a fast response time :-)

> Is RAIDoptimizer the name of a spreadsheet you developed - is it
> publicly available?

It is a Java application. I plan to open-source it, but that may take a
while to get through the process. I'll check to see if there is a way to
make it available as a webstart client (which is how I deploy it).
-- richard
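[Background on the N={2,4,8} question, offered as the rationale usually
cited rather than anything stated in this thread: a raidz stripe splits
each record across the N data disks, and ZFS records are powers of two
up to 128 KiB. With N=4 data disks, a 128 KiB record puts an aligned
32 KiB on each disk; with N=3, each disk would get 128/3 ~= 42.7 KiB,
which is not a whole multiple of the sector size, so columns get rounded
up and some space and bandwidth is wasted. Nothing breaks with N=3 - it
is a performance and space-efficiency consideration, not a correctness
one.]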
David Dyer-Bennet wrote:
> Yes, it's interesting how the parameters for home setups differ from
> "professional" ones (not meaning to denigrate the professionalism of
> anybody's home network, of course). We can run to the store and buy
> something rather quicker than lots of professional outfits seem to be
> able to get spares in hand.
>
> But what if you're away on business that week?

Then you put it in a datacenter with remote monitoring. ZFS can't solve
all problems or issues. You'll still have drives go bad or have errors
outside of their listed tolerance (bad firmware, anyone?). You'll still
have operations staff that press the wrong button. All the typical stuff
that is usually listed as the reason for data loss.
On 11-Jul-07, at 3:16 PM, David Dyer-Bennet wrote:
> Yes, it's interesting how the parameters for home setups differ from
> "professional" ones. We can run to the store and buy something rather
> quicker than lots of professional outfits seem to be able to get
> spares in hand.

Ugh. Spares are not optional in either case. The store is always out of
stock, or your car won't start.

> But what if you're away on business that week?

I think he intends a -mean- time to repair?

--Toby