Richard's blog analyzes MTTDL as a function of N+P+S:
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

But to understand how to best utilize an array with a fixed number of
drives, I add the following constraints:
- N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
- all sets in an array should be configured similarly
- the MTTDL for S sets is equal to (MTTDL for one set)/S

I got the following results by varying the NUM_BAYS parameter in the
source code below:

*_4 bays w/ 300 GB drives having MTBF=4 years_*
- can have 1 (2+1) w/ 1 spares providing 600 GB with MTTDL of 5840.00 years
- can have 1 (2+2) w/ 0 spares providing 600 GB with MTTDL of 799350.00 years
- can have 0 (4+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (4+2) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 4 spares providing 0 GB with MTTDL of Inf years

*_8 bays w/ 300 GB drives having MTBF=4 years_*
- can have 2 (2+1) w/ 2 spares providing 1200 GB with MTTDL of 2920.00 years
- can have 2 (2+2) w/ 0 spares providing 1200 GB with MTTDL of 399675.00 years
- can have 1 (4+1) w/ 3 spares providing 1200 GB with MTTDL of 1752.00 years
- can have 1 (4+2) w/ 2 spares providing 1200 GB with MTTDL of 2557920.00 years
- can have 0 (8+1) w/ 8 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 8 spares providing 0 GB with MTTDL of Inf years

*_12 bays w/ 300 GB drives having MTBF=4 years_*
- can have 4 (2+1) w/ 0 spares providing 2400 GB with MTTDL of 365.00 years
- can have 3 (2+2) w/ 0 spares providing 1800 GB with MTTDL of 266450.00 years
- can have 2 (4+1) w/ 2 spares providing 2400 GB with MTTDL of 876.00 years
- can have 2 (4+2) w/ 0 spares providing 2400 GB with MTTDL of 79935.00 years
- can have 1 (8+1) w/ 3 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 2 spares providing 2400 GB with MTTDL of 426320.00 years

*_16 bays w/ 300 GB drives having MTBF=4 years_*
- can have 5 (2+1) w/ 1 spares providing 3000 GB with MTTDL of 1168.00 years
- can have 4 (2+2) w/ 0 spares providing 2400 GB with MTTDL of 199837.50 years
- can have 3 (4+1) w/ 1 spares providing 3600 GB with MTTDL of 584.00 years
- can have 2 (4+2) w/ 4 spares providing 2400 GB with MTTDL of 1278960.00 years
- can have 1 (8+1) w/ 7 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 6 spares providing 2400 GB with MTTDL of 426320.00 years

*_20 bays w/ 300 GB drives having MTBF=4 years_*
- can have 6 (2+1) w/ 2 spares providing 3600 GB with MTTDL of 973.33 years
- can have 5 (2+2) w/ 0 spares providing 3000 GB with MTTDL of 159870.00 years
- can have 4 (4+1) w/ 0 spares providing 4800 GB with MTTDL of 109.50 years
- can have 3 (4+2) w/ 2 spares providing 3600 GB with MTTDL of 852640.00 years
- can have 2 (8+1) w/ 2 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 0 spares providing 4800 GB with MTTDL of 13322.50 years

*_24 bays w/ 300 GB drives having MTBF=4 years_*
- can have 8 (2+1) w/ 0 spares providing 4800 GB with MTTDL of 182.50 years
- can have 6 (2+2) w/ 0 spares providing 3600 GB with MTTDL of 133225.00 years
- can have 4 (4+1) w/ 4 spares providing 4800 GB with MTTDL of 438.00 years
- can have 4 (4+2) w/ 0 spares providing 4800 GB with MTTDL of 39967.50 years
- can have 2 (8+1) w/ 6 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 4 spares providing 4800 GB with MTTDL of 213160.00 years

While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
/any/ RAIDZ configuration will outlive me, and so I conclude that RAIDZ2
is unnecessary in a practical sense... This conclusion surprises me
given the amount of attention people give to double-parity solutions -
what am I overlooking?

Thanks,
Kent

_*Source Code*_ (compile with: cc -std=c99 -lm <filename>) [it's more
than 80 columns - sorry!]

#include <stdio.h>
#include <math.h>

#define NUM_BAYS 24
#define DRIVE_SIZE_GB 300
#define MTBF_YEARS 4
#define MTTR_HOURS_NO_SPARE 16
#define MTTR_HOURS_SPARE 4

int main() {
    printf("\n");
    printf("%u bays w/ %u GB drives having MTBF=%u years\n",
           NUM_BAYS, DRIVE_SIZE_GB, MTBF_YEARS);
    for (int num_drives = 2; num_drives <= 8; num_drives *= 2) {
        for (int num_parity = 1; num_parity <= 2; num_parity++) {
            double mttdl;

            int mtbf_hours = MTBF_YEARS * 365 * 24;
            int total_num_drives = num_drives + num_parity;
            int num_instances = NUM_BAYS / total_num_drives;
            int num_spares = NUM_BAYS % total_num_drives;
            double mttr = num_spares == 0 ?
                MTTR_HOURS_NO_SPARE : MTTR_HOURS_SPARE;
            int total_capacity = num_drives * num_instances * DRIVE_SIZE_GB;

            if (num_parity == 1) {
                mttdl = pow(mtbf_hours, 2.0) /
                    (total_num_drives * (total_num_drives-1) * mttr);
            } else if (num_parity == 2) {
                mttdl = pow(mtbf_hours, 3.0) /
                    (total_num_drives * (total_num_drives-1) *
                     (total_num_drives-2) * pow(mttr, 2.0));
            }

            printf(" - can have %u (%u+%u) w/ %u spares providing %u GB with MTTDL of %.2f years\n",
                   num_instances, num_drives, num_parity,
                   num_spares, total_capacity,
                   mttdl/24/365/num_instances);
        }
    }
}
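[For reference, the MTTDL expressions the program computes are the
standard single- and double-parity approximations, written out here from
the code, with T = total drives per set (data + parity):

    MTTDL(P=1) = MTBF^2 / (T * (T-1) * MTTR)
    MTTDL(P=2) = MTBF^3 / (T * (T-1) * (T-2) * MTTR^2)

and the per-array figure divides by the number of sets, per the third
constraint above.]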
> While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
> /any/ RAIDZ configuration will outlive me and so I conclude that RAIDZ2
> is unnecessary in a practical sense... This conclusion surprises me
> given the amount of attention people give to double-parity solutions -
> what am I overlooking?

When talking to Netapp, some of their folks have mentioned that their DP
solution wasn't necessarily so useful for handling near-simultaneous
disk loss (although it does do that). But when a disk fails, it is not
uncommon for reconstruction to be unable to read some data off the
remaining disks (perhaps a bad sector, or bad data that fails checksum).
With 1P, you have to shut down the volume or leave a hole in the
filesystem. With 2P, you reconstruct that one read and continue.

-- 
Darren Dunham                  ddunham at taos.com
Senior Technical Consultant    TAOS    http://www.taos.com/
Got some Dr Pepper?            San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Darren Dunham wrote:
> When talking to Netapp, some of their folks have mentioned that their DP
> solution wasn't necessarily so useful for handling near-simultaneous
> disk loss (although it does do that).
>
> But when a disk fails, it is not uncommon for reconstruction
> to be unable to read some data off the remaining disks (perhaps a bad
> sector, or bad data that fails checksum). With 1P, you have to shut down
> the volume or leave a hole in the filesystem. With 2P, you reconstruct
> that one read and continue.

Are Netapp using some kind of block checksumming? That seems to be one
of the big wins of ZFS compared to ordinary filesystems -- I have higher
confidence that data I haven't accessed recently is still good. If
Netapp doesn't do something like that, that would explain why there's
frequently trouble reconstructing, and point up a major ZFS advantage.

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/dd-b
Pics: http://dd-b.net/dd-b/SnapshotAlbum, http://dd-b.net/photography/gallery
Dragaera: http://dragaera.info
> Are Netapp using some kind of block checksumming?

They provide an option for it; I'm not sure how often it's used.

> If Netapp doesn't do something like [ZFS checksums], that would
> explain why there's frequently trouble reconstructing, and point up a
> major ZFS advantage.

Actually, the real problem is uncorrectable errors on drives. On a 1 TB
SATA drive, there's a good chance (over 1%) that at least one block will
be unreadable once written. Scrubbing tries to catch these, but if an
error develops between the last scrub and the need to read the data as
part of reconstruction, you're out of luck. This is the big advantage of
RAID-6 / RAIDZ2: the combination of a drive failure and a single-block
failure on a second drive won't lead to data loss.
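[A back-of-the-envelope sketch of where that kind of number comes from.
The 1-in-1e14 unrecoverable-bit-error rate below is an assumption - a
spec commonly quoted for SATA drives of this era, not a figure from this
thread - and actual drives vary. With it, a full 1 TB read has roughly
an 8% chance of hitting at least one unreadable bit, comfortably "over
1%":

/* URE probability sketch; compile with: cc -std=c99 -lm <filename> */
#include <stdio.h>
#include <math.h>

int main(void) {
    double p_bit = 1e-14;       /* ASSUMED unrecoverable errors per bit */
    double bits  = 1e12 * 8.0;  /* bits in a 1 TB read */
    /* P(at least one URE) = 1 - (1 - p)^bits, well approximated by
     * 1 - exp(-bits * p) when p is tiny. */
    double p_fail = 1.0 - exp(-bits * p_bit);
    printf("P(>=1 unrecoverable error reading 1 TB) = %.1f%%\n",
           100.0 * p_fail);
    return 0;
}

This prints about 7.7%, which is why a rebuild that must read every
surviving drive end-to-end is much riskier than it first appears.]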
#define DRIVE_SIZE_GB 300
#define MTBF_YEARS 2
#define MTTR_HOURS_NO_SPARE 48
#define MTTR_HOURS_SPARE 8

#define NUM_BAYS 10
- can have 3 (2+1) w/ 1 spares providing 1800 GB with MTTDL of 243.33 years
- can have 2 (4+1) w/ 0 spares providing 2400 GB with MTTDL of 18.25 years
- can have 1 (8+1) w/ 1 spares providing 2400 GB with MTTDL of 60.83 years
- can have 1 (8+2) w/ 0 spares providing 2400 GB with MTTDL of 370.07 years

#define NUM_BAYS 11
- can have 2 (4+1) w/ 1 spares providing 2400 GB with MTTDL of 109.50 years
- can have 1 (8+2) w/ 1 spares providing 2400 GB with MTTDL of 13322.50 years

Wow - from 370.07 years to 13322.50 years just by adding a spare?
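[A check against the model in the posted code: that jump is exactly what
the formula predicts. For double parity the code divides by MTTR^2, and
having a spare drops the assumed MTTR from 48 hours to 8 hours, which
multiplies MTTDL by (48/8)^2 = 36; and 370.07 years * 36 = 13322.5
years - precisely the change shown.]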
> Are Netapp using some kind of block checksumming? That seems to be one
> of the big wins of ZFS compared to ordinary filesystems -- I have higher
> confidence that data I haven't accessed recently is still good. If
> Netapp doesn't do something like that, that would explain why there's
> frequently trouble reconstructing, and point up a major ZFS advantage.

Yes, they do. I don't think it's identical to ZFS, but it is similar.
Here's a page from their website that describes it (without much
detail). Given the comments, I doubt that the author was aware of ZFS
when this was written.

http://www.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html

The "trouble reconstructing" needs to be detected, and that's something
that both WAFL and ZFS can do. Without the checksums, you might not even
know there was a problem.

-- 
Darren Dunham                  ddunham at taos.com
Senior Technical Consultant    TAOS    http://www.taos.com/
Got some Dr Pepper?            San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Cool. Comments below...

Kent Watsen wrote:
> Richard's blog analyzes MTTDL as a function of N+P+S:
> http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl
>
> But to understand how to best utilize an array with a fixed number of
> drives, I add the following constraints:
> - N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
> - all sets in an array should be configured similarly
> - the MTTDL for S sets is equal to (MTTDL for one set)/S

Yes, these are reasonable and will reduce the problem space somewhat.

> I got the following results by varying the NUM_BAYS parameter in the
> source code below:
>
> [results tables snipped - see the original message above]
>
> While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
> /any/ RAIDZ configuration will outlive me and so I conclude that RAIDZ2
> is unnecessary in a practical sense... This conclusion surprises me
> given the amount of attention people give to double-parity solutions -
> what am I overlooking?

You are overlooking statistics :-). As I discuss in
http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent
the MTBF (F == death) of children aged 5-14 in the US is 4,807 years, but
clearly no child will live anywhere close to 4,807 years. The number
itself is not really that important, but it does provide a way to
compare designs. In other words, the numbers are important in a relative
sense.

Another observation is that the MTBF does change over time, but the math
to consider that case is much more difficult. It is also difficult to
find any real data or data sheets which would show that number. There
are other techniques to model this, but they won't change the relative
improvement of raidz2 over raidz, so what you have is reasonable.

I had the fortune to hear Dave Patterson speak a few years ago. He said
that (anecdotally) people would come up to him upset because they had
lost data with RAID-5 systems. He said it would have been better if he
had done RAID-6 instead of RAID-5... hindsight is always 20/20 :-)
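[To spell out the statistics point - this is a restatement of the
standard constant-failure-rate math, not anything new from the thread:
MTBF is the reciprocal of the failure rate, MTBF = 1/lambda. The
children example corresponds to an annual mortality rate of 1/4807, i.e.
about 21 deaths per 100,000 children per year. The figure is a statement
about a population rate, not about any individual's lifetime, because
the hazard is not actually constant over a lifetime; a drive's quoted
MTBF should be read the same way.]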
> _*Source Code*_ (compile with: cc -std=c99 -lm <filename>) [it's more
> than 80 columns - sorry!]

This is relatively easy to implement in a spreadsheet, too. But as you
begin to notice, there are hundreds or thousands of possible
combinations as you add disk drives.

> #define NUM_BAYS 24
> #define DRIVE_SIZE_GB 300
> #define MTBF_YEARS 4

I think this is pessimistic :-)

> #define MTTR_HOURS_NO_SPARE 16

I think this is optimistic :-)

> #define MTTR_HOURS_SPARE 4

I also think this is optimistic and pessimistic :-) With ZFS, you only
recover the data used. This is an advantage over other LVMs, which try
to reconstruct the whole space. In the best case, this number is almost
zero, and in the worst case it is approximately the same as an LVM with
aggressive reconstruction. There is a possibility that it is worse for
some use cases, but that has not been characterized yet.

Since you restrict the types to raidz and raidz2, you can simplify the
analysis a bit, which helps.

> [rest of the source code snipped]

There are many more facets of looking at these sorts of analysis, which
is why I wrote RAIDoptimizer. Attached is a similar output from
RAIDoptimizer in a spreadsheet (raidz-raidz2-MTTDL-example.ods) so you
can sort or plot the data as you'd like. The algorithms are described in
various blog entries at: http://blogs.sun.com/relling

I'll note that RAIDoptimizer doesn't currently let me set an MTBF <
100,000 hours, so I'll take that as an RFE.
-- richard
>> But to understand how to best utilize an array with a fixed number of
>> drives, I add the following constraints:
>> - N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
>> - all sets in an array should be configured similarly
>> - the MTTDL for S sets is equal to (MTTDL for one set)/S
>
> Yes, these are reasonable and will reduce the problem space somewhat.

Actually, I wish I could get more insight into why N can only be 2, 4,
or 8. In contemplating a 16-bay array, I many times think that 3 (3+2)
+ 1 spare would be perfect, but I have no understanding of what N=3
implicates...

>> While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that
>> /any/ RAIDZ configuration will outlive me and so I conclude that
>> RAIDZ2 is unnecessary in a practical sense... This conclusion
>> surprises me given the amount of attention people give to
>> double-parity solutions - what am I overlooking?
>
> You are overlooking statistics :-). As I discuss in
> http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent
> the MTBF (F == death) of children aged 5-14 in the US is 4,807 years,
> but clearly no child will live anywhere close to 4,807 years.

Thanks - I hadn't seen that blog entry yet...

>> #define MTTR_HOURS_NO_SPARE 16
>
> I think this is optimistic :-)

Not really for me, as the array is in my basement - so I assume that
I'll swap in a drive when I get home from work ;)

> There are many more facets of looking at these sorts of analysis,
> which is why I wrote RAIDoptimizer.

Is RAIDoptimizer the name of a spreadsheet you developed - is it
publicly available?

Thanks,
Kent
Kent Watsen wrote:
>> #define MTTR_HOURS_NO_SPARE 16
>>
>> I think this is optimistic :-)
>
> Not really for me, as the array is in my basement - so I assume that
> I'll swap in a drive when I get home from work ;)

Yes, it's interesting how the parameters for home setups differ from
"professional" ones (not meaning to denigrate the professionalism of
anybody's home network, of course). We can run to the store and buy
something rather quicker than lots of professional outfits seem to be
able to get spares in hand.

But what if you're away on business that week?

-- 
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/dd-b
Pics: http://dd-b.net/dd-b/SnapshotAlbum, http://dd-b.net/photography/gallery
Dragaera: http://dragaera.info
Kent Watsen wrote:
> Actually, I wish I could get more insight into why N can only be 2, 4,
> or 8. In contemplating a 16-bay array, I many times think that 3 (3+2)
> + 1 spare would be perfect, but I have no understanding of what N=3
> implicates...

There was a discussion a while back which centered around this topic. I
don't recall the details, and I think it needs to be revisited, but
there was consensus that, for the time being, best performance was thus
achieved. I'd like to revisit this, since I think best performance is
more difficult to predict, due to the dynamic nature of ZFS making it
particularly sensitive to the workload.

>> #define MTTR_HOURS_NO_SPARE 16
>>
>> I think this is optimistic :-)
>
> Not really for me, as the array is in my basement - so I assume that
> I'll swap in a drive when I get home from work ;)

It is an average value, so you have some leeway there. I work from home,
so in theory I should have a fast response time :-)

> Is RAIDoptimizer the name of a spreadsheet you developed - is it
> publicly available?

It is a Java application. I plan to open-source it, but that may take a
while to get through the process. I'll check to see if there is a way to
make it available as a webstart client (which is how I deploy it).
-- richard
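[Background on the N={2,4,8} question, offered as the rationale usually
cited rather than anything stated in this thread: a raidz stripe splits
each record across the N data disks, and ZFS records are powers of two
up to 128 KiB. With N=4 data disks, a 128 KiB record puts an aligned
32 KiB on each disk; with N=3, each disk would get 128/3 ~= 42.7 KiB,
which is not a whole multiple of the sector size, so columns get rounded
up and some space and bandwidth is wasted. Nothing breaks with N=3 - it
is a performance and space-efficiency consideration, not a correctness
one.]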
David Dyer-Bennet wrote:
> Yes, it's interesting how the parameters for home setups differ from
> "professional" ones (not meaning to denigrate the professionalism of
> anybody's home network, of course). We can run to the store and buy
> something rather quicker than lots of professional outfits seem to be
> able to get spares in hand.
>
> But what if you're away on business that week?

Then you put it in a datacenter with remote monitoring. ZFS can't solve
all problems or issues. You'll still have drives go bad or have errors
outside of their listed tolerance (bad firmware, anyone?). You'll still
have operations staff that press the wrong button. All the typical stuff
that is usually listed as the reason for data loss.
On 11-Jul-07, at 3:16 PM, David Dyer-Bennet wrote:
> Yes, it's interesting how the parameters for home setups differ from
> "professional" ones. We can run to the store and buy something rather
> quicker than lots of professional outfits seem to be able to get
> spares in hand.

Ugh. Spares are not optional in either case. The store is always out of
stock, or your car won't start.

> But what if you're away on business that week?

I think he intends a -mean- time to repair?

--Toby