zfsmonk
2008-Jun-14 15:09 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Mentioned on http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide is the following:

"ZFS works well with storage based protected LUNs (RAID-5 or mirrored LUNs from intelligent storage arrays). However, ZFS cannot heal corrupted blocks that are detected by ZFS checksums."

Based on that, if we already have RAID-5 LUNs being served from intelligent storage arrays, is there any benefit to creating the zpool as a mirror if ZFS can't heal any corrupted blocks? Or would we just be wasting disk space?
Tomas Ögren
2008-Jun-14 15:25 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On 14 June, 2008 - zfsmonk sent me these 0,7K bytes:

> Mentioned on
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> is the following: "ZFS works well with storage based protected LUNs
> (RAID-5 or mirrored LUNs from intelligent storage arrays). However,
> ZFS cannot heal corrupted blocks that are detected by ZFS checksums."
>
> based upon that, if we have LUNs already in RAID5 being served from
> intelligent storage arrays, is it any benefit to create the zpool in a
> mirror if zfs can't heal any corrupted blocks? Or would we just be
> wasting disk space?

Let's say you have a RAID box called A. If you use that as ZFS storage and ZFS detects bit errors in it, there's not much ZFS can do other than say "your storage sucks".

If you have another RAID box called B and you mirror A and B through ZFS, then when A comes along and flips some bits again, ZFS checks B, sees that it's still correct, and fixes A.

A might be intelligent storage that can cope with a disk dying, but if A delivers bit errors up to ZFS, then ZFS can't fix them. If A is actually dumb storage and you leave the RAID part to ZFS, then ZFS can fix them.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
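A minimal sketch of the layout Tomas describes, mirroring one RAID-5 LUN from each array at the ZFS level so ZFS has a second copy to heal from; the device names are only placeholders for whatever LUNs your arrays present:

  # one RAID-5 LUN from array A, one from array B, mirrored by ZFS
  zpool create tank mirror c4t0d0 c5t0d0

  # confirm the mirror layout and health
  zpool status tank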
Bob Friesenhahn
2008-Jun-14 16:11 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008, zfsmonk wrote:

> Mentioned on
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> is the following: "ZFS works well with storage based protected LUNs
> (RAID-5 or mirrored LUNs from intelligent storage arrays). However,
> ZFS cannot heal corrupted blocks that are detected by ZFS checksums."

This basically means that the checksum itself is not sufficient to accomplish correction. However, if ZFS-level RAID is used, the correct block can be obtained from a redundant copy.

> based upon that, if we have LUNs already in RAID5 being served from
> intelligent storage arrays, is it any benefit to create the zpool in
> a mirror if zfs can't heal any corrupted blocks? Or would we just be
> wasting disk space?

This is a matter of opinion. If ZFS does not have access to redundancy then it cannot correct any problems that it encounters, and could even panic the system, or the entire pool could be lost. However, if the storage array and all associated drivers, adaptors, memory, and links are working correctly, then this risk may be acceptable (to you).

ZFS experts at Sun say that even the best storage arrays may not detect and correct some problems, and that complex systems can produce errors even though all of their components seem to be working correctly. This is in spite of Sun also making a living by selling such products. The storage array is only able to correct errors it detects due to the hardware reporting an unrecoverable error condition or by double-checking using data on a different drive. Since storage arrays want to be fast, they are likely to engage additional validity checks/correction only after a problem has already been reported (or during a scrub/resilver) rather than as a matter of course.

A problem which may occur is that your storage array may say that the data is good while ZFS says that there is bad data. Under these conditions there might not be a reasonable way to correct the problem other than to lose the data. If the ZFS pool requires the failed data in order to operate, then the entire pool could be lost.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Brian Wilson
2008-Jun-14 17:11 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
> On Sat, 14 Jun 2008, zfsmonk wrote:
> [...]
> This is a matter of opinion. If ZFS does not have access to
> redundancy then it cannot correct any problems that it encounters,
> and could even panic the system, or the entire pool could be lost.
> [...]
> A problem which may occur is that your storage array may say that the
> data is good while ZFS says that there is bad data. Under these
> conditions there might not be a reasonable way to correct the problem
> other than to lose the data. If the ZFS pool requires the failed data
> in order to operate, then the entire pool could be lost.

Couple of questions on this topic -

What's the percentage of data in a zpool that, if it gets one of these bit-corruption errors, will actually cause the zpool to fail? Is it a higher or lower percentage than what it would take to fatally and irrevocably corrupt UFS or VxFS to the point where a restore is required?

Given that today's storage arrays catch a good percentage of errors and correct them (for the intelligent arrays I have in mind, anyway), are we talking about the nasty, silent corruption I've been reading about that occurs in huge datasets, where the RAID thinks the data is good but it's actually garbage? From what I remember reading, that's a low occurrence rate and only became noticeable because we're dealing in such large amounts of data these days. Am I wrong here?

So, looking at making operational decisions in the short term, I have to ask specifically: is it more or less likely that a zpool will die and have to be restored than UFS or VxFS filesystems on a VxVM volume?

My opinions and questions are my own, and do not necessarily represent those of my employer. (or my coworkers, or anyone else)

cheers,
Brian
Brian Wilson
2008-Jun-14 17:21 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
----- Original Message -----
From: Brian Wilson <bfwilson at doit.wisc.edu>
Date: Saturday, June 14, 2008 12:12 pm
Subject: Re: [zfs-discuss] zpool with RAID-5 from intelligent storage arrays
To: Bob Friesenhahn <bfriesen at simple.dallas.tx.us>
Cc: zfs-discuss at opensolaris.org

> So, looking at making operational decisions in the short term, I have
> to ask specifically: is it more or less likely that a zpool will die
> and have to be restored than UFS or VxFS filesystems on a VxVM volume?

To put it specifically - I currently have a volume (a bunch of them, really) on one intelligent array, on UFS or VxFS on VxVM volumes. If I use ZFS, I am not intending at this point to mirror it to another array (which may or may not exist) and double my use of expensive disk. So that puts me at the risk described here, where the zpool could go poof.

What are the odds, in that configuration of zpool (no mirroring, just using the intelligent disk as concatenated LUNs in the zpool), that if we have this silent corruption, the whole zpool dies? If anyone knows, what are the comparative odds of the VxVM volume, UFS or VxFS filesystem similarly dying in the same scenario?

Thanks!
Brian

> My opinions and questions are my own, and do not necessarily represent
> those of my employer. (or my coworkers, or anyone else)
>
> cheers,
> Brian
Bob Friesenhahn
2008-Jun-14 19:19 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008, Brian Wilson wrote:

> What are the odds, in that configuration of zpool (no mirroring,
> just using the intelligent disk as concatenated LUNs in the zpool),
> that if we have this silent corruption, the whole zpool dies? If
> anyone knows, what are the comparative odds of the VxVM volume, UFS or
> VxFS filesystem similarly dying in the same scenario?

I don't know the answer to that. Probably nobody knows the answer, since there is no formal research project to analyze it and no automatic collection agent to report the data. You can scan the list archives to find the ZFS horror stories. Most of the "whole pool died" horror stories are not due to data loss on a properly maintained RAID array.

ZFS does not come with fsck. That is both good and bad. With fsck you can simply say 'yes' to the obscure questions and (if you are lucky) after a few hours (or a day) there will be something left, and perhaps that critical file is still to be found in the lost+found directory, if you can figure out which one it is among all the files which were previously deleted and are now resurrected.

With ZFS you can scrub the pool at the system level. This allows you to discover many issues early, before they become nightmares.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
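The scrub workflow Bob describes is a one-liner plus a status check; the pool name below is just a placeholder:

  # read and verify every block in the pool, in the background
  zpool scrub tank

  # check scrub progress and results; "-x" summarizes only unhealthy pools
  zpool status tank
  zpool status -x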
dick hoogendijk
2008-Jun-14 19:32 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008 14:19:05 -0500 (CDT)
Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> With ZFS you can scrub the pool at the system level. This allows you
> to discover many issues early, before they become nightmares.

# zpool status
  scrub: none requested

My question is really: do I wait till a scrub is requested, or am I supposed to scrub on a regular basis myself?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
++ http://nagual.nl/ + SunOS sxce snv90 ++
Bob Friesenhahn
2008-Jun-14 19:51 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, 14 Jun 2008, dick hoogendijk wrote:

>> With ZFS you can scrub the pool at the system level. This allows you
>> to discover many issues early, before they become nightmares.
>
> # zpool status
>   scrub: none requested
>
> My question is really: do I wait till a scrub is requested, or am I
> supposed to scrub on a regular basis myself?

I think that "none requested" likely means that the administrator has never issued a request to scrub the pool.

How often to scrub depends on how much you care about your data, how invasive the scrub is to other activities (I/O bandwidth consumption, snapshots, acoustic noise, electricity consumption), and how long the scrub takes. My pool is set to be scrubbed every night via a cron job:

# Scrub the pool for errors
20 4 * * * /usr/sbin/zpool scrub MyPool

Scrub helps find and correct residual problems before they cause serious trouble, such as when the data is actually used, or during resilvering. The statistical chances of problems during disk resilvering are surely significantly reduced if scrub is executed often, since scrub and resilvering both access the same data. This is based on the "Does the sun come up every day?" principle.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
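For systems with more than one pool, a variant of that cron entry can loop over whatever pools exist; this is only a sketch, assuming the stock zpool(1M) in /usr/sbin:

  #!/bin/sh
  # scrub every imported pool; run nightly from cron
  for pool in `/usr/sbin/zpool list -H -o name`; do
      /usr/sbin/zpool scrub "$pool"
  done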
Al Hopper
2008-Jun-15 00:09 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, Jun 14, 2008 at 12:11 PM, Brian Wilson <bfwilson at doit.wisc.edu> wrote:

> Given that today's storage arrays catch a good percentage of errors
> and correct them (for the intelligent arrays I have in mind, anyway),
> are we talking about the nasty, silent corruption I've been reading
> about that occurs in huge datasets, where the RAID thinks the data is
> good but it's actually garbage? From what I remember reading, that's
> a low occurrence rate and only became noticeable because we're dealing
> in such large amounts of data these days. Am I wrong here?

Yes - you're "wrong" - but not because you're unintelligent or saying something "wrong", but because you can be let down by a bad FC (Fibre Channel) port on a switch (random noise), or by a bad optical component in the optical path between the host system that is writing the data and the final destination (read "expensive FC hardware SAN box"), or a bad optical connection, or a "flaky" data comm link. Or ... a firmware bug (in your high-dollar SAN box) after the last firmware upgrade you performed on it.

There are already well documented cases where an OP mailed the ZFS list and said "my SAN box has been working correctly for X years, and when I used ZFS to store data on it, ZFS 'said' that the data is 'bad'. ZFS is 'broken' (technical term (TM)) and not ready for prime time." In *all* cases, it turned out that ZFS was *not* broken and that there was a problem somewhere in the data path, or with the SAN hardware/firmware.

Also - look at the legacy posts and see where an OpenSolaris developer discovered that the errors being reported by ZFS were caused by a flaky/noisy power supply in his desktop box - despite the fact that the particular desktop was very popular with other (OpenSolaris) kernel developers and was widely regarded as "fool-proof".

It's probably true to state that ZFS is the first filesystem that allowed those high-dollar hardware SAN vendors to actually verify that their complex hardware/firmware chain was behaving as designed, end-to-end. Where "end-to-end" is defined as: the data that the host system writes is actually the data that can be retrieved N years after it has been written.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Brian Hechinger
2008-Jun-15 04:10 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, Jun 14, 2008 at 02:19:05PM -0500, Bob Friesenhahn wrote:

> I don't know the answer to that. Probably nobody knows the answer,
> since there is no formal research project to analyze it and no
> automatic collection agent to report the data. You can scan the list
> archives to find the ZFS horror stories. Most of the "whole pool
> died" horror stories are not due to data loss on a properly maintained
> RAID array.

ZFS uses ditto blocks for metadata. I think it would be really hard for silent corruption to render a ZFS pool unusable (even on a single disk with no RAID or mirroring) unless luck just wasn't on your side and both copies of your metadata got corrupted.

That being said, you can increase the number of ditto copies that are made (I think 2 is the default for metadata and 1 is the default for data) and improve your chances of survival on a single-disk system.

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
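The knob Brian is referring to is the per-dataset copies property; a minimal sketch, with the pool and dataset names as placeholders:

  # keep two copies of user data (metadata already gets extra ditto copies)
  zfs set copies=2 tank/important

  # note: only data written after the property is set gets the extra copy
  zfs get copies tank/important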
Brian Hechinger
2008-Jun-15 04:12 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sat, Jun 14, 2008 at 02:51:31PM -0500, Bob Friesenhahn wrote:

> I think that "none requested" likely means that the administrator has
> never issued a request to scrub the pool.

Or the system. That status line shows the last scrub/resilver to have taken place. "None requested" means that no scrub/resilver has happened.

> How often to scrub depends on how much you care about your data, how
> invasive the scrub is to other activities (I/O bandwidth consumption,
> snapshots, acoustic noise, electricity consumption), and how long the
> scrub takes. My pool is set to be scrubbed every night via a cron job:

And like all other things of this nature, the more often you do it, the less invasive it will be, as there is less to do. That being said, I still wouldn't recommend hourly scrubs. ;)

> The statistical chances of problems during disk resilvering are surely
> significantly reduced if scrub is executed often, since scrub and
> resilvering both access the same data. This is based on the "Does the
> sun come up every day?" principle.

Also I would think that this would, in the worst-case scenario, reduce the amount of time to resilver, but I could be wrong.

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
Bob Friesenhahn
2008-Jun-15 05:16 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Sun, 15 Jun 2008, Brian Hechinger wrote:

>> how long the scrub takes. My pool is set to be scrubbed every night
>> via a cron job:
>
> And like all other things of this nature, the more often you do it, the
> less invasive it will be, as there is less to do. That being said, I still
> wouldn't recommend hourly scrubs. ;)

Unless things are quite broken, why would there be less to do? It seems that the amount of work to do depends on the amount of data stored (and the data transfer rate), since the scrub's task is to read all of the data and make sure that it is consistent.

If my math is right, scrub on my drive array proceeds at about 15.2 GB/minute (259 MB/second). It would be interesting to see what typical scrub rates are for various scenarios.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Erik Trimble
2008-Jun-16 08:45 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
One thing I should mention on this is that I've had _very_ bad experiences with using single-LUN ZFS pools over FC. That is, using an external SAN box to create a single LUN, exporting that LUN to an FC-connected host, then creating a pool as follows:

zpool create tank <LUN_ID>

It works fine, up until something bad happens to the array or the FC connection (like, say, losing power to the whole system) and the host computer cannot talk to the LUN. This will corrupt the zpool permanently, and there is no way to fix the pool (and, without some magic in /etc/system, it will leave the host in a permanent kernel panic loop). This is a known bug, and the fix isn't looking to be available anytime soon.

This problem doesn't seem to manifest itself if the zpool has redundant members, even if they are on the same array (and thus the host loses contact with both LUNs at the same time).

So, for FC or iSCSI targets, I would HIGHLY recommend that ZFS _ALWAYS_ be configured in a redundant setup.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
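Erik doesn't spell out the "/etc/system magic"; the tunables usually cited for getting past that kind of panic loop are the ones sketched below, but treat this as an assumption about the workaround rather than a documented fix, verify it against your own Solaris release, and use it only as a last-resort recovery aid:

  # /etc/system (assumed recovery tunables; only to get past a panic loop
  # long enough to salvage data, not for normal operation)
  set zfs:zfs_recover=1
  set aok=1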
Vincent Fox
2008-Jun-16 17:53 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
I'm not sure why people obsess over this issue so much. Disk is cheap.

We have a fair number of 3510 and 2540 arrays on our SAN. They make RAID-5 LUNs available to various servers. On the servers we take RAID-5 LUNs from different arrays and ZFS-mirror them. So if any array goes away we are still operational. VERY ROBUST!

If you are trying to be cheap, then you could:

1) Use copies=2 to make sure data is duplicated
2) Advertise individual disks as LUNs and build RAIDZ2 on them.

The advantage of an intelligent array is that I have low-level control of matching a hot spare in array #1 to the LUN in array #1. ZFS does not have this fine-grained hot-spare capability yet, so I just don't use ZFS sparing. Also the array has SAN connectivity and caching and dual controllers that just don't exist in the JBOD world.

I am hosting mailboxes for > 50K people; we cannot afford lengthy downtimes.
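A minimal sketch of Vincent's cheaper second option, with placeholder device names standing in for the individually exported disks:

  # double-parity RAID-Z2 across six disks exported as individual LUNs
  zpool create mail raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0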
Bob Friesenhahn
2008-Jun-16 18:15 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Mon, 16 Jun 2008, Vincent Fox wrote:

> Also the array has SAN connectivity and caching and dual controllers
> that just don't exist in the JBOD world.

As a clarification, you can convince your StorageTek 2540 to appear as JBOD on the SAN. Then you obtain the SAN connectivity and caching and dual controllers; one does not exclude the other. The sparing and user interface might be nicer using CAM and RAID-5.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Robert Milkowski
2008-Jun-16 22:33 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Hello Erik,

Monday, June 16, 2008, 9:45:13 AM, you wrote:

ET> One thing I should mention on this is that I've had _very_ bad
ET> experiences with using single-LUN ZFS pools over FC.
ET> [...]
ET> So, for FC or iSCSI targets, I would HIGHLY recommend that ZFS _ALWAYS_
ET> be configured in a redundant setup.

Have you got more details, or at least bug IDs? Is it only (I doubt it) FC related?

--
Best regards,
Robert                            mailto:milek at task.gda.pl
                                  http://milek.blogspot.com
Jeff Bonwick
2008-Jul-01 01:28 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice. Because the data is mirrored at the ZFS level, you get all the benefits of self-healing. Moreover, you can survive a great variety of hardware failures: three or more disks can die (one in the first array, two or more in the second), failure of a cable, or failure of an entire array.

Jeff

On Sat, Jun 14, 2008 at 08:09:49AM -0700, zfsmonk wrote:

> based upon that, if we have LUNs already in RAID5 being served from
> intelligent storage arrays, is it any benefit to create the zpool in a
> mirror if zfs can't heal any corrupted blocks? Or would we just be
> wasting disk space?
Erik Trimble
2008-Jul-01 02:42 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
Jeff Bonwick wrote:

> Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice.
> Because the data is mirrored at the ZFS level, you get all the benefits
> of self-healing. Moreover, you can survive a great variety of hardware
> failures: three or more disks can die (one in the first array, two or
> more in the second), failure of a cable, or failure of an entire array.

As Jeff mentioned, use two HW RAID-5 LUNs in a zpool for a mirror (or even 3+ LUNs for a RAID-Z of RAID-5 :-)

The quote from the Best Practices Guide is applicable to single-LUN zpools (and applies to any single-vdev zpool). Indeed, there are some nasty problems with using single-LUN zpools, so DON'T DO IT.

ZFS is happiest (and you will be too) when you allow some redundancy inside ZFS, and not just at the hardware level.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
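A sketch of the "RAID-Z of RAID-5" layout Erik mentions, with placeholder device names for three LUNs presented by three different arrays:

  # single-parity raidz across three hardware RAID-5 LUNs, one per array
  zpool create tank raidz c4t0d0 c5t0d0 c6t0d0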
Mike Gerdts
2008-Jul-01 02:44 UTC
[zfs-discuss] zpool with RAID-5 from intelligent storage arrays
On Mon, Jun 16, 2008 at 5:33 PM, Robert Milkowski <milek at task.gda.pl> wrote:

> Have you got more details, or at least bug IDs?
> Is it only (I doubt it) FC related?

I ran into something that looks like

6594621 dangling dbufs (dn=ffffff056a5ad0a8, dbuf=ffffff0520303300) during stress

with LDoms 1.0. It seems as though data that ZFS in a guest LDom thought was committed was not really committed. Not FC related, but it is quite frustrating to deal with a panic loop in a file system (zpool) not required to boot the system to single-user mode. That one has since been fixed.

More recently I reported:

6709336 panic in mzap_open(): avl_find() succeeded inside avl_add()

If the file that triggered this panic were in a place that is read at boot, it would be a panic loop. I asked on the list[1] if anyone was interested in a dump to dig into it more, with no takers. Earlier today I noticed that Jeff Bonwick said that not getting dumps was criminal[2], so a special cc goes out to him. :)

1. http://mail.opensolaris.org/pipermail/zfs-discuss/2008-May/047869.html
2. http://mail.opensolaris.org/pipermail/caiman-discuss/2008-June/004405.html

I've run into many other problems with I/O errors when doing a stat() of a file. Repeated tries fail, but a reboot seems to clear it. A zpool scrub reports no errors, and the pool consists of a single mirror vdev. I haven't filed a bug on this yet.

--
Mike Gerdts
http://mgerdts.blogspot.com/