Does anyone have a document that describes ZFS in a pure SAN environment? What will and will not work?

From some of the information I have been gathering it doesn't appear that ZFS was intended to operate in a SAN environment.

Thanks,
Dave
Dave Burleson wrote:
> Does anyone have a document that describes ZFS in a pure SAN environment? What will and will not work?
>
> From some of the information I have been gathering it doesn't appear that ZFS was intended to operate in a SAN environment.

What information? ZFS works on a SAN just as well as it does in other environments.
Hello Dave,

Friday, December 15, 2006, 9:02:31 PM, you wrote:

DB> Does anyone have a document that describes ZFS in a pure
DB> SAN environment? What will and will not work?

ZFS is "just" a filesystem with "just" an integrated volume manager. Ok, it's more than that. The point is that if any other file system works in your SAN then ZFS should also work. There could be some issues with some arrays with flushing cache (I haven't been hit by that) but there's a workaround. Other than that it should just work, or should even work better due to end-to-end data integrity - generally with SANs you've got more things which can play with your data, and ZFS can take care of it or at least detect it.

DB> From some of the information I have been gathering
DB> it doesn't appear that ZFS was intended to operate
DB> in a SAN environment.

I don't know why people keep saying strange things about ZFS. Maybe it's due to the fact that ZFS is so different that they don't know what to do with it and get confused? Or maybe, as ZFS makes cheap storage solutions a really valuable option, people start to think it only belongs to that segment - which is of course not true.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
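(For reference, the cache-flush workaround usually cited for arrays with battery-backed write cache is a system tunable; a minimal sketch, assuming a Solaris/Nevada build recent enough to have the zfs_nocacheflush setting - older builds instead had to configure the array itself to ignore SCSI cache-flush requests:)

   * /etc/system - only appropriate when the array cache is non-volatile,
   * since this stops ZFS from issuing cache-flush requests entirely.
   set zfs:zfs_nocacheflush = 1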
I use ZFS in a SAN. I have two Sun V440s running Solaris 10 U2, which have LUNs assigned to them from my Sun SE 3511. So far, it has worked flawlessly.

Robert Milkowski wrote:
> Hello Dave,
>
> Friday, December 15, 2006, 9:02:31 PM, you wrote:
>
> DB> Does anyone have a document that describes ZFS in a pure
> DB> SAN environment? What will and will not work?
>
> ZFS is "just" a filesystem with "just" an integrated volume manager. Ok, it's more than that. The point is that if any other file system works in your SAN then ZFS should also work. There could be some issues with some arrays with flushing cache (I haven't been hit by that) but there's a workaround. Other than that it should just work, or should even work better due to end-to-end data integrity - generally with SANs you've got more things which can play with your data, and ZFS can take care of it or at least detect it.
>
> DB> From some of the information I have been gathering
> DB> it doesn't appear that ZFS was intended to operate
> DB> in a SAN environment.
>
> I don't know why people keep saying strange things about ZFS. Maybe it's due to the fact that ZFS is so different that they don't know what to do with it and get confused? Or maybe, as ZFS makes cheap storage solutions a really valuable option, people start to think it only belongs to that segment - which is of course not true.
On Friday 15 December 2006 20:02, Dave Burleson wrote:
> Does anyone have a document that describes ZFS in a pure SAN environment? What will and will not work?
>
> From some of the information I have been gathering it doesn't appear that ZFS was intended to operate in a SAN environment.

This might answer your question:
http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
On Sun, 17 Dec 2006, Ricardo Correia wrote:
> On Friday 15 December 2006 20:02, Dave Burleson wrote:
>> Does anyone have a document that describes ZFS in a pure SAN environment? What will and will not work?
>>
>> From some of the information I have been gathering it doesn't appear that ZFS was intended to operate in a SAN environment.
>
> This might answer your question:
> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid

The section entitled "Does ZFS work with SAN-attached devices?" does not make clear the (some would say) dire effects of not having pool redundancy. I think that FAQ should clearly spell out the downside; i.e., where ZFS will "say" (Sorry Charlie) "pool is corrupt".

A FAQ should always emphasize the real-world downsides to poor decisions made by the reader. Not delivering "bad news" does the reader a disservice IMHO.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
On Dec 17, 2006, at 6:57 PM, Al Hopper wrote:
> On Sun, 17 Dec 2006, Ricardo Correia wrote:
>> On Friday 15 December 2006 20:02, Dave Burleson wrote:
>>> Does anyone have a document that describes ZFS in a pure SAN environment? What will and will not work?
>>>
>>> From some of the information I have been gathering it doesn't appear that ZFS was intended to operate in a SAN environment.
>>
>> This might answer your question:
>> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>
> The section entitled "Does ZFS work with SAN-attached devices?" does not make clear the (some would say) dire effects of not having pool redundancy. I think that FAQ should clearly spell out the downside; i.e., where ZFS will "say" (Sorry Charlie) "pool is corrupt".
>
> A FAQ should always emphasize the real-world downsides to poor decisions made by the reader. Not delivering "bad news" does the reader a disservice IMHO.

Hmmm... A question. Are you referring to not using redundancy within the array, or not using a redundant pool configuration?

In the case of the former, I completely agree.

In the case of the latter, using intelligent arrays, I don't see how a 'pool corrupt' problem differs from any non-ZFS solution today. If you're using RAID-5 LUNs along with UFS/VxFS/SVM with no mirroring, you're in the same situation; corruption within the array will require a data restore.

Personally, I think data that requires more than RAID-5 redundancy should be mirrored between discrete storage arrays. This configuration allows ZFS to mirror the data, while using RAID-5 (or better) within the controllers for best performance. This solution isn't cheap, however. The justification for a dual-array solution really depends on the data value.

-----
Gregory Shaw, IT Architect
IT CTO Group, Sun Microsystems Inc.
Phone: (303)-272-8817
500 Eldorado Blvd, UBRM02-157          greg.shaw at sun.com (work)
Broomfield, CO 80021                   shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've won." - Linus Torvalds
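(A minimal sketch of the dual-array layout Greg describes - the pool name and device names below are made up; each c2/c3 device is assumed to be a RAID-5 LUN exported by a different array, so ZFS handles the mirroring while each controller handles its own RAID-5:)

   # Hypothetical LUNs: c2t0d0/c2t1d0 from array A, c3t0d0/c3t1d0 from array B
   zpool create tank \
       mirror c2t0d0 c3t0d0 \
       mirror c2t1d0 c3t1d0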
On Sun, Dec 17, 2006 at 07:57:20PM -0600, Al Hopper wrote:
> The section entitled "Does ZFS work with SAN-attached devices?" does not make clear the (some would say) dire effects of not having pool redundancy. I think that FAQ should clearly spell out the downside; i.e., where ZFS will "say" (Sorry Charlie) "pool is corrupt".

This is not entirely true, thanks to ditto blocks. All metadata is written multiple times (3 times for pool metadata) regardless of the underlying device layout. We did this precisely because ZFS has a tree-based layout - losing an entire pool due to a single corrupt block is not acceptable. If you have corruption in three distinct blocks across different devices, then you have some seriously busted hardware. I would be surprised if any filesystem were able to run sensibly in such an environment.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Al Hopper wrote:
> On Sun, 17 Dec 2006, Ricardo Correia wrote:
>> On Friday 15 December 2006 20:02, Dave Burleson wrote:
>>> Does anyone have a document that describes ZFS in a pure SAN environment? What will and will not work?
>>>
>>> From some of the information I have been gathering it doesn't appear that ZFS was intended to operate in a SAN environment.
>>
>> This might answer your question:
>> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>
> The section entitled "Does ZFS work with SAN-attached devices?" does not make clear the (some would say) dire effects of not having pool redundancy. I think that FAQ should clearly spell out the downside; i.e., where ZFS will "say" (Sorry Charlie) "pool is corrupt".
>
> A FAQ should always emphasize the real-world downsides to poor decisions made by the reader. Not delivering "bad news" does the reader a disservice IMHO.

I'd say that it's clearly described in the FAQ. If you push too hard, people will infer that SANs are broken if you use ZFS on top of them, or vice versa. The only bit that looks a little questionable to my eyes is ...

    Overall, ZFS functions as designed with SAN-attached devices, but if
    you expose simpler devices to ZFS, you can better leverage all
    available features.

What are "simpler devices"? (I could take a guess ... )
On Dec 18, 2006, at 16:13, Torrey McMahon wrote:
> Al Hopper wrote:
>> [...]
>> The section entitled "Does ZFS work with SAN-attached devices?" does not make clear the (some would say) dire effects of not having pool redundancy. I think that FAQ should clearly spell out the downside; i.e., where ZFS will "say" (Sorry Charlie) "pool is corrupt".
>>
>> A FAQ should always emphasize the real-world downsides to poor decisions made by the reader. Not delivering "bad news" does the reader a disservice IMHO.
>
> I'd say that it's clearly described in the FAQ. If you push too hard, people will infer that SANs are broken if you use ZFS on top of them, or vice versa. The only bit that looks a little questionable to my eyes is ...
>
>     Overall, ZFS functions as designed with SAN-attached devices, but if
>     you expose simpler devices to ZFS, you can better leverage all
>     available features.
>
> What are "simpler devices"? (I could take a guess ... )

stone tablets in a room full of monkeys with chisels?

The bottom line is ZFS ultimately wants to function as the controller cache and eventually eliminate the blind data algorithms that such arrays incorporate .. the problem is that we can't really say that explicitly since we sell, and much of the enterprise operates with, enterprise-class arrays and integrated data cache. The trick is in balancing who does what, since you've really got duplicate virtualization, RAID, and caching options open to you.

.je
comment far below...

Jonathan Edwards wrote:
> On Dec 18, 2006, at 16:13, Torrey McMahon wrote:
>> [...]
>> What are "simpler devices"? (I could take a guess ... )
>
> stone tablets in a room full of monkeys with chisels?
>
> The bottom line is ZFS ultimately wants to function as the controller cache and eventually eliminate the blind data algorithms that such arrays incorporate ..

I don't get this impression at all.

> the problem is that we can't really say that explicitly since we sell, and much of the enterprise operates with, enterprise-class arrays and integrated data cache. The trick is in balancing who does what, since you've really got duplicate virtualization, RAID, and caching options open to you.

In general, the closer to the user you can make policy decisions, the better decisions you can make. The fact that we've had 10 years of RAID arrays acting like dumb block devices doesn't mean that will continue for the next 10 years :-) In the interim, we will see more and more intelligence move closer to the user.
 -- richard
On Mon, 18 Dec 2006, Torrey McMahon wrote:
> Al Hopper wrote:
>> [...]
>> A FAQ should always emphasize the real-world downsides to poor decisions made by the reader. Not delivering "bad news" does the reader a disservice IMHO.
>
> I'd say that it's clearly described in the FAQ. If you push too hard, people will infer that SANs are broken if you use ZFS on top of them or vice versa.

[ .... re-formatted ... but no content changed .... ]

Fair enough - I'm also in receipt of pushback from the illustrious Eric Schrock - which usually indicates that I'm on the losing side of this argument ^H^H^H^H^H^H^H^H (sorry) discussion. :)

> The only bit that looks a little questionable to my eyes is ...
>
>     Overall, ZFS functions as designed with SAN-attached devices, but if
>     you expose simpler devices to ZFS, you can better leverage all
>     available features.
>
> What are "simpler devices"? (I could take a guess ... )

--- new comment ----

Let me look at a couple of possible user "bad" assumptions and see if the FAQ still reflects what a ZFS "convert" _might_ inadvertently do. And I'll try the scenarios I have in mind on Update 3. In the case that I don't come up with anything worthwhile, I'll still post a followup. I think it is always best to "fess up" to a mistake or a misleading post.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
It seems to me that the optimal scenario would be network filesystems on top of ZFS, so you can get the data portability of a SAN, but let ZFS make all of the decisions. Short of that, ZFS on SAN-attached JBODs would give a similar benefit. Having benefited tremendously from being able to easily detach and re-attach storage because of a SAN, it's difficult to give that capability up to get the maximum ZFS benefit.

Best Regards,
Jason

On 12/18/06, Richard Elling <Richard.Elling at sun.com> wrote:
> [...]
> In general, the closer to the user you can make policy decisions, the better decisions you can make. The fact that we've had 10 years of RAID arrays acting like dumb block devices doesn't mean that will continue for the next 10 years :-) In the interim, we will see more and more intelligence move closer to the user.
> -- richard
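(As a small illustration of the "network filesystems on top of ZFS" idea - the pool and dataset names below are hypothetical - ZFS can export a dataset over NFS with a single property, so clients get the data over the network while ZFS keeps end-to-end control of the disks:)

   # 'tank/home' is a made-up dataset name
   zfs create tank/home
   zfs set sharenfs=on tank/home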
Hi All,

Umm... I recently posted that I have successfully deployed ZFS in a SAN. Well, I just had a disk fail on the second day of production, and am currently in downtime waiting for a disk from Sun. I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5. This 2 TB logical drive is partitioned into 10 x 200 GB slices. I gave 4 of these slices to a Solaris 10 U2 machine and added each of them to a concat (non-raid) zpool as listed below:

[root at tsali ~]# zpool status zp1
  pool: zp1
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        zp1                        ONLINE       0     0     0
          c2t216000C0FF892AE3d0    ONLINE       0     0     0
          c2t216000C0FF892AE3d2    ONLINE       0     0     0
          c2t216000C0FF892AE3d3    ONLINE       0     0     0
          c2t216000C0FF892AE3d4    ONLINE       0     0     0

errors: No known data errors

Basically, is this a supported zfs configuration? You are gonna laugh, but do you think my zfs configuration caused the drive failure?

Cheers,
Mike
Mike Seda wrote:
>
> Basically, is this a supported zfs configuration?

Can't see why not, but support or not is something only Sun support can speak for, not this mailing list.

You say you lost access to the array though -- a full disk failure shouldn't cause this if you were using RAID-5 on the array. Perhaps you mean you've had to take it out of production because it couldn't keep up with the expected workload?

> You are gonna laugh, but do you think my zfs configuration caused the drive failure?

You mention this is a new array. As one Sun person (whose name I can't remember) mentioned to me, there's a high 'infant mortality' rate among semiconductors. Components that are going to fail will either do so in the first 120 days or so, or will run for many years.

I'm no expert in the area though and I have no data to prove it, but it has felt somewhat true as I've seen new systems set up over the years. A quick search for "semiconductor infant mortality" turned up some interesting results.

Chances are, it's something much more mundane that got your disk. ZFS is using the same underlying software as everything else to read/write to disks on a SAN (i.e. the ssd driver and friends) -- it's just smarter about it. :)

Regards,

- Matt
On 18-Dec-06, at 11:18 PM, Matt Ingenthron wrote:
> Mike Seda wrote:
>>
>> Basically, is this a supported zfs configuration?
>
> Can't see why not, but support or not is something only Sun support can speak for, not this mailing list.
>
> You say you lost access to the array though -- a full disk failure shouldn't cause this if you were using RAID-5 on the array. Perhaps you mean you've had to take it out of production because it couldn't keep up with the expected workload?
>
>> You are gonna laugh, but do you think my zfs configuration caused the drive failure?
>
> You mention this is a new array. As one Sun person (whose name I can't remember) mentioned to me, there's a high 'infant mortality' rate among semiconductors. Components that are going to fail will either do so in the first 120 days or so, or will run for many years.
>
> I'm no expert in the area though and I have no data to prove it, but it has felt somewhat true as I've seen new systems set up over the years. A quick search for "semiconductor infant mortality" turned up some interesting results.

You might have even more luck with "bathtub curve".

--Toby

> Chances are, it's something much more mundane that got your disk. ZFS is using the same underlying software as everything else to read/write to disks on a SAN (i.e. the ssd driver and friends) -- it's just smarter about it. :)
>
> Regards,
>
> - Matt
> I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5. This 2 TB logical drive is partitioned into 10 x 200 GB slices. I gave 4 of these slices to a Solaris 10 U2 machine and added each of them to a concat (non-raid) zpool as listed below:

This is certainly a supportable configuration. However, it's not an optimal one.

You think that you have a 'concat' structure, but it's actually striped/RAID-0, because ZFS implicitly stripes across all of its top-level structures (your slices, in this case). This means that ZFS will constantly be writing data to addresses around 0, 50 GB, 100 GB, and 150 GB of each disk (presuming the first four slices are those you used). This will keep the disk arms constantly in motion, which isn't good for performance.

> do you think my zfs configuration caused the drive failure?

I doubt it. I haven't investigated which disks ship in the 3511, but I would presume they are "enterprise-class" ATA drives, which can handle this type of head motion. (Standard ATA disks can overheat under a load which is heavy in seeks.) Then again, the 3511 is marketed as a "near-line" rather than "on-line" array ... that may be simply because the SATA drives don't perform as well as FC.

I do see this note in the 3511 documentation: "Note - Do not use a Sun StorEdge 3511 SATA array to store single instances of data. It is more suitable for use in configurations where the array has a backup or archival role."

(I too am curious -- why do you consider yourself down? You've got a RAID 5, one disk is down, are you just worried about your current lack of redundancy? [I would be.] Will you be adding a hot spare?)

Anton
Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?

-r

Al Hopper writes:
> On Sun, 17 Dec 2006, Ricardo Correia wrote:
>> [...]
>> This might answer your question:
>> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>
> The section entitled "Does ZFS work with SAN-attached devices?" does not make clear the (some would say) dire effects of not having pool redundancy. I think that FAQ should clearly spell out the downside; i.e., where ZFS will "say" (Sorry Charlie) "pool is corrupt".
>
> A FAQ should always emphasize the real-world downsides to poor decisions made by the reader. Not delivering "bad news" does the reader a disservice IMHO.
Anton B. Rang wrote:
>> I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5. This 2 TB logical drive is partitioned into 10 x 200 GB slices. I gave 4 of these slices to a Solaris 10 U2 machine and added each of them to a concat (non-raid) zpool as listed below:
>
> This is certainly a supportable configuration. However, it's not an optimal one.

What would be the optimal configuration that you recommend?

> You think that you have a 'concat' structure, but it's actually striped/RAID-0, because ZFS implicitly stripes across all of its top-level structures (your slices, in this case). This means that ZFS will constantly be writing data to addresses around 0, 50 GB, 100 GB, and 150 GB of each disk (presuming the first four slices are those you used). This will keep the disk arms constantly in motion, which isn't good for performance.
>
>> do you think my zfs configuration caused the drive failure?
>
> I doubt it. I haven't investigated which disks ship in the 3511, but I would presume they are "enterprise-class" ATA drives, which can handle this type of head motion. (Standard ATA disks can overheat under a load which is heavy in seeks.) Then again, the 3511 is marketed as a "near-line" rather than "on-line" array ... that may be simply because the SATA drives don't perform as well as FC.
>
> I do see this note in the 3511 documentation: "Note - Do not use a Sun StorEdge 3511 SATA array to store single instances of data. It is more suitable for use in configurations where the array has a backup or archival role."
>
> (I too am curious -- why do you consider yourself down? You've got a RAID 5, one disk is down, are you just worried about your current lack of redundancy? [I would be.] Will you be adding a hot spare?)

Yes, I am worried about the lack of redundancy. And, I have some new disks on order, at least one of which will be a hot spare.
On Dec 18, 2006, at 17:52, Richard Elling wrote:
> In general, the closer to the user you can make policy decisions, the better decisions you can make. The fact that we've had 10 years of RAID arrays acting like dumb block devices doesn't mean that will continue for the next 10 years :-) In the interim, we will see more and more intelligence move closer to the user.

I thought this is what the T10 OSD spec was set up to address. We've already got device manufacturers beginning to design and code to the spec.

---
.je

(ps .. actually it's closer to 20+ years of RAID and dumb block devices ..)
On Dec 19, 2006, at 07:17, Roch - PAE wrote:
> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?

why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?

---
.je
Jonathan Edwards writes:
> On Dec 19, 2006, at 07:17, Roch - PAE wrote:
>> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?
>
> why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?

I think so, while pointing to the associated downside of doing that.

-r
Jonathan Edwards wrote:
> On Dec 19, 2006, at 07:17, Roch - PAE wrote:
>> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?
>
> why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?

Yes, because if ZFS doesn't know about it then ZFS can't use it to do corrections when the checksums (which always work) detect problems.

--
Darren J Moffat
Darren J Moffat wrote:
> Jonathan Edwards wrote:
>> On Dec 19, 2006, at 07:17, Roch - PAE wrote:
>>> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?
>>
>> why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?
>
> Yes, because if ZFS doesn't know about it then ZFS can't use it to do corrections when the checksums (which always work) detect problems.

We do not have the intelligent end-to-end management to make these judgments. Trying to make one layer of the stack {stronger, smarter, faster, bigger} while ignoring the others doesn't help. Trying to make educated guesses as to what the user intends doesn't help either.

The first bug we'll get when adding a "ZFS is not going to be able to fix data inconsistency problems" error message to every pool creation or similar operation is going to be "Need a flag to turn off the warning message..."
Torrey McMahon wrote:
> Darren J Moffat wrote:
>> Jonathan Edwards wrote:
>>> On Dec 19, 2006, at 07:17, Roch - PAE wrote:
>>>> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?
>>>
>>> why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?
>>
>> Yes, because if ZFS doesn't know about it then ZFS can't use it to do corrections when the checksums (which always work) detect problems.
>
> We do not have the intelligent end-to-end management to make these judgments. Trying to make one layer of the stack {stronger, smarter, faster, bigger} while ignoring the others doesn't help. Trying to make educated guesses as to what the user intends doesn't help either.
>
> The first bug we'll get when adding a "ZFS is not going to be able to fix data inconsistency problems" error message to every pool creation or similar operation is going to be "Need a flag to turn off the warning message..."

said "flag" is 2>/dev/null ;-)

--
Darren J Moffat
On Dec 19, 2006, at 10:15, Torrey McMahon wrote:
> Darren J Moffat wrote:
>> [...]
>> Yes, because if ZFS doesn't know about it then ZFS can't use it to do corrections when the checksums (which always work) detect problems.
>
> We do not have the intelligent end-to-end management to make these judgments. Trying to make one layer of the stack {stronger, smarter, faster, bigger} while ignoring the others doesn't help. Trying to make educated guesses as to what the user intends doesn't help either.

"Hi! It looks like you're writing a block"
Would you like help?
  - Get help writing the block
  - Just write the block without help
  - (Don't show me this tip again)

somehow I think we all know on some level that letting a system attempt to guess your intent will get pretty annoying after a while ..
On Dec 19, 2006, at 7:14 AM, Mike Seda wrote:
> Anton B. Rang wrote:
>>> I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5. This 2 TB logical drive is partitioned into 10 x 200 GB slices. I gave 4 of these slices to a Solaris 10 U2 machine and added each of them to a concat (non-raid) zpool as listed below:
>>
>> This is certainly a supportable configuration. However, it's not an optimal one.
>
> What would be the optimal configuration that you recommend?

If you don't need ZFS redundancy, I would recommend taking a single "slice" for your ZFS file system (e.g. 6 x 200 GB for other file systems, and 1 x 800 GB for the ZFS pool). There would still be contention between the various file systems, but at least ZFS would be working with a single contiguous block of space on the array.

Because of the implicit striping in ZFS, what you have right now is analogous to taking a single disk, partitioning it into several partitions, then striping across those partitions -- it works, you can use all of the space, but there's a rearrangement which means that logically contiguous blocks on disk are no longer physically contiguous, hurting performance substantially.

> Yes, I am worried about the lack of redundancy. And, I have some new disks on order, at least one of which will be a hot spare.

Glad to hear it.

Anton
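(To make that concrete - a sketch only, and the d5 LUN name is invented - the recommendation amounts to handing ZFS one larger slice of the array rather than four slices carved from the same RAID-5 logical drive:)

   # One 800 GB slice from the 3511 becomes the whole pool, so ZFS
   # stripes within a single contiguous region of the logical drive
   # instead of seeking between four regions of the same spindles.
   zpool create zp1 c2t216000C0FF892AE3d5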
On 19-Dec-06, at 11:51 AM, Jonathan Edwards wrote:
> On Dec 19, 2006, at 10:15, Torrey McMahon wrote:
>> [...]
>> We do not have the intelligent end-to-end management to make these judgments. Trying to make one layer of the stack {stronger, smarter, faster, bigger} while ignoring the others doesn't help. Trying to make educated guesses as to what the user intends doesn't help either.
>
> "Hi! It looks like you're writing a block"
> Would you like help?
>   - Get help writing the block
>   - Just write the block without help
>   - (Don't show me this tip again)
>
> somehow I think we all know on some level that letting a system attempt to guess your intent will get pretty annoying after a while ..

I think what you (hilariously) describe above is a system that's *too stupid*, not a system that's *too smart*...

--Toby
> I thought this is what the T10 OSD spec was set up to address. We've already got device manufacturers beginning to design and code to the spec.

Precisely. The interface to block-based devices forces much of the knowledge that the file system and application have about access patterns to be thrown away before the device gets involved. The current OSD specification allows additional knowledge through ("Host X is accessing range Y of file Z."). I'm hopeful that future revisions will go even further, allowing knowledge such as "Process A on host X is accessing range Y of file Z," or even allowing processes/streams to be managed across multiple hosts. OSD allows attributes as well; individual files could be tagged for a redundancy level, for instance.

(To make this relevant to this ZFS discussion, perhaps it's worth pointing out that ZFS would make an interesting starting point for certain types of OSD implementation.)
sidetracking below...

Matt Ingenthron wrote:
> Mike Seda wrote:
>>
>> Basically, is this a supported zfs configuration?
>
> Can't see why not, but support or not is something only Sun support can speak for, not this mailing list.
>
> You say you lost access to the array though -- a full disk failure shouldn't cause this if you were using RAID-5 on the array. Perhaps you mean you've had to take it out of production because it couldn't keep up with the expected workload?
>
>> You are gonna laugh, but do you think my zfs configuration caused the drive failure?
>
> You mention this is a new array. As one Sun person (whose name I can't remember) mentioned to me, there's a high 'infant mortality' rate among semiconductors. Components that are going to fail will either do so in the first 120 days or so, or will run for many years.

We don't use the term "infant mortality" because it elicits the wrong emotion. We use "early life failures" instead.

> I'm no expert in the area though and I have no data to prove it, but it has felt somewhat true as I've seen new systems set up over the years. A quick search for "semiconductor infant mortality" turned up some interesting results.

We (Sun) do have the data and we track it rather closely. If a product shows a higher than expected early life failure rate then we investigate the issue and take corrective action. In general, semiconductor ELFs are discovered through the burn-in tests at the factory. However, there are some mechanical issues which can occur during shipping [1]. And, of course, you can just be unlucky. In any case, I hope that the replacements arrive soon and work well.

[1] FOB origin is common -- a manufacturer's best friend?
 -- richard
Torrey McMahon wrote:
> The first bug we'll get when adding a "ZFS is not going to be able to fix data inconsistency problems" error message to every pool creation or similar operation is going to be "Need a flag to turn off the warning message..."

Richard pines for ditto blocks for data...
 -- richard
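(Ditto blocks for user data did eventually appear as the per-dataset 'copies' property; a minimal sketch, assuming a build recent enough to have it - the dataset name is made up:)

   # Keep two copies of every data block in this dataset; ZFS spreads them
   # across different devices where it can. Metadata already gets ditto
   # copies automatically.
   zfs set copies=2 tank/important
   zfs get copies tank/important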
> I do see this note in the 3511 documentation: "Note - Do not use a Sun StorEdge 3511 SATA array to store single instances of data. It is more suitable for use in configurations where the array has a backup or archival role."

My understanding of this particular scare-tactic wording (it's also in the SANnet II OEM version manual almost verbatim) is that it has mostly to do with the relative unreliability of SATA firmware versus SCSI/FC firmware. It's possible that the disks are lower-quality SATA disks too, but that was not what was relayed to us when we looked at buying the 3511 from Sun or the DotHill version (SANnet II).

Best Regards,
Jason
>> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?
>
> why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?

Because if the host controller port goes flaky and starts introducing checksum errors at the block level (a lady a few weeks ago reported this), ZFS will kernel panic, and most users won't expect it. Users should be warned, it seems to me, of the real possibility of a kernel panic if they don't implement redundancy at the zpool level. Just my 2 cents.

Best Regards,
Jason
On 19-Dec-06, at 2:42 PM, Jason J. W. Williams wrote:
>> I do see this note in the 3511 documentation: "Note - Do not use a Sun StorEdge 3511 SATA array to store single instances of data. It is more suitable for use in configurations where the array has a backup or archival role."
>
> My understanding of this particular scare-tactic wording (it's also in the SANnet II OEM version manual almost verbatim) is that it has mostly to do with the relative unreliability of SATA firmware versus SCSI/FC firmware.

That's such a sad sentence to have to read.

Either prices are unrealistically low, or the revenues aren't being invested properly?

--Toby

> It's possible that the disks are lower-quality SATA disks too, but that was not what was relayed to us when we looked at buying the 3511 from Sun or the DotHill version (SANnet II).
>
> Best Regards,
> Jason
Hello Jason,

Tuesday, December 19, 2006, 8:54:09 PM, you wrote:

>>> Shouldn't there be a big warning when configuring a pool with no redundancy, and/or should that not require a -f flag?
>>
>> why? what if the redundancy is below the pool .. should we warn that ZFS isn't directly involved in redundancy decisions?

JJWW> Because if the host controller port goes flaky and starts introducing
JJWW> checksum errors at the block level (a lady a few weeks ago reported
JJWW> this), ZFS will kernel panic, and most users won't expect it. Users
JJWW> should be warned, it seems to me, of the real possibility of a kernel
JJWW> panic if they don't implement redundancy at the zpool level. Just my 2
JJWW> cents.

I don't agree - do not assume the sysadmin is a complete idiot. Sure, let's create a GUI and other 'intelligent' creators which are for very beginner users with no understanding at all.

Maybe we need something like vxassist (zfsassist?)?

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hi Robert,

I don't think it's about assuming the admin is an idiot. It happened to me in development and I didn't expect it... I hope I'm not an idiot. :-)

Just observing the list, a fair amount of people don't expect it. The likelihood you'll miss this one little bit of very important information in the manual or man page is pretty high. So it would be nice if an informational message appeared saying something like:

"INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data."

I definitely wouldn't require any sort of acknowledgment of this message, such as requiring a "-f" flag to continue.

Best Regards,
Jason

On 12/19/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> [...]
>
> I don't agree - do not assume the sysadmin is a complete idiot. Sure, let's create a GUI and other 'intelligent' creators which are for very beginner users with no understanding at all.
>
> Maybe we need something like vxassist (zfsassist?)?
>
> --
> Best regards,
> Robert                          mailto:rmilkowski at task.gda.pl
> http://milek.blogspot.com
Hello Jason,

Tuesday, December 19, 2006, 11:23:56 PM, you wrote:

JJWW> Hi Robert,

JJWW> I don't think it's about assuming the admin is an idiot. It happened to
JJWW> me in development and I didn't expect it... I hope I'm not an idiot.
JJWW> :-)

JJWW> Just observing the list, a fair amount of people don't expect it. The
JJWW> likelihood you'll miss this one little bit of very important
JJWW> information in the manual or man page is pretty high. So it would be
JJWW> nice if an informational message appeared saying something like:

JJWW> "INFORMATION: If a member of this striped zpool becomes unavailable or
JJWW> develops corruption, Solaris will kernel panic and reboot to protect
JJWW> your data."

JJWW> I definitely wouldn't require any sort of acknowledgment of this
JJWW> message, such as requiring a "-f" flag to continue.

First, sorry for my wording - no offense to anyone was meant.

I don't know, it's like changing every tool in the system so:

# rm file
INFORMATION: by removing file you won't be able to read it again

# mv fileA fileB
INFORMATION: by moving fileA to fileB you won't be able ....

# reboot
INFORMATION: by rebooting the server it won't be up for some time

I don't know that such behavior is desired. If someone doesn't understand basic RAID concepts then perhaps some assistant utility (GUI or CLI) is more appropriate for them, like Veritas did. But putting warning messages here and there to inform the user that he probably doesn't know what he is doing isn't a good option.

Perhaps zpool status should explicitly show stripe groups with the word stripe, like:

  home
    stripe
      c0t0d0
      c0t1d0

So it will be more clear to people what they actually configured. I would really hate a system informing me on every command that I possibly don't know what I'm doing.

Maybe just a wrapper:

  zfsassist redundant space-optimized disk0 disk1 disk2
  zfsassist redundant speed-optimized disk0 disk1 disk2
  zfsassist non-redundant disk0 disk1 disk2

you get the idea.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
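(Purely to illustrate the idea - no such tool exists; this is a hypothetical sketch with a simplified argument form, mapping intent onto plain zpool create invocations:)

   #!/bin/sh
   # zfsassist: hypothetical intent-based wrapper around zpool create.
   # Usage: zfsassist {space-optimized|speed-optimized|non-redundant} <pool> <disk> ...
   layout="$1"; pool="$2"; shift 2
   case "$layout" in
     space-optimized) exec zpool create "$pool" raidz "$@" ;;   # redundant, capacity-oriented
     speed-optimized) exec zpool create "$pool" mirror "$@" ;;  # redundant, N-way mirror
     non-redundant)   exec zpool create "$pool" "$@" ;;         # plain stripe
     *) echo "usage: zfsassist layout pool disks..." >&2; exit 1 ;;
   esac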
Hi Robert,

I didn't take any offense. :-) I completely agree with you that zpool striping leverages standard RAID-0 knowledge, in that if a device disappears your RAID group goes poof. That doesn't really require a notice... was just trying to be complete. :-)

The surprise to me was that detecting block corruption did the same thing... since most hardware RAID controllers and filesystems do a poor job of detecting block-level corruption, kernel panicking on corrupt blocks seems to be what folks like me aren't expecting until it happens.

Frankly, in about 5 years when ZFS and its concepts are common knowledge, warning folks about corrupt blocks rebooting your server would be like notifying them what rm and mv do. However, until then, warning them that corruption will cause a panic would definitely aid folks who think they understand because they have existing RAID and SAN knowledge, and then get bitten. Also, I think the zfsassist program is a great idea for newbies. I'm not sure how often it would be used by storage pros new to ZFS. Using the gal with the EMC DMX-3 again as an example (sorry! O:-) ), I'm sure she's pretty experienced and had no problems using ZFS correctly... she just was not expecting a kernel panic on corruption, and so was taken by surprise as to what caused the kernel panic when it happened. A warning message when creating a striped pool would, in my case, have stuck in my brain so that when the kernel panic happened, corruption of the zpool would have been on my top 10 things to expect as a cause. Anyway, this is probably an Emacs/vi argument to some degree. Now that I've experienced a panic from zpool corruption it's on the forefront of my mind when designing ZFS zpools, and the warning wouldn't do much for me now. Though I probably would have preferred to learn from a warning message instead of a panic. :-)

Best Regards,
Jason

On 12/19/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Hello Jason,
>
> [...]
>
> First, sorry for my wording - no offense to anyone was meant.
>
> I don't know, it's like changing every tool in the system so:
>
> # rm file
> INFORMATION: by removing file you won't be able to read it again
>
> # mv fileA fileB
> INFORMATION: by moving fileA to fileB you won't be able ....
>
> # reboot
> INFORMATION: by rebooting the server it won't be up for some time
>
> I don't know that such behavior is desired. If someone doesn't understand basic RAID concepts then perhaps some assistant utility (GUI or CLI) is more appropriate for them, like Veritas did. But putting warning messages here and there to inform the user that he probably doesn't know what he is doing isn't a good option.
>
> Perhaps zpool status should explicitly show stripe groups with the word stripe, like:
>
>   home
>     stripe
>       c0t0d0
>       c0t1d0
>
> So it will be more clear to people what they actually configured. I would really hate a system informing me on every command that I possibly don't know what I'm doing.
>
> Maybe just a wrapper:
>
>   zfsassist redundant space-optimized disk0 disk1 disk2
>   zfsassist redundant speed-optimized disk0 disk1 disk2
>   zfsassist non-redundant disk0 disk1 disk2
>
> you get the idea.
>
> --
> Best regards,
> Robert                          mailto:rmilkowski at task.gda.pl
> http://milek.blogspot.com
> "INFORMATION: If a member of this striped zpool becomes unavailable or > develops corruption, Solaris will kernel panic and reboot to protect your data."OK, I''m puzzled. Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug? This message posted from opensolaris.org
Anton B. Rang wrote:
>> "INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data."
>
> OK, I'm puzzled.
>
> Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug?

Nope. I'm with you.
> Anton B. Rang wrote:
>>> "INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data."
>>
>> OK, I'm puzzled.
>>
>> Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug?
>
> Nope. I'm with you.

no no .. it's a "feature". :-P

If it walks like a duck and quacks like a duck then it's a duck.

A kernel panic that brings down a system is a bug. Plain and simple.

Dennis
On Tue, 19 Dec 2006, Anton B. Rang wrote:
>> "INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data."
>
> OK, I'm puzzled.
>
> Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug?

I think any use of cmn_err(CE_PANIC, ...) should be seen as very critical ... because it's either trying to hide that we haven't bothered to create recoverability for a known-to-be-problematic situation, or because it's used where an ASSERT() should've been used.

FrankH.
Hello Jason,

Wednesday, December 20, 2006, 1:02:36 AM, you wrote:

JJWW> [...]

JJWW> The surprise to me was that detecting block corruption did the same
JJWW> thing... since most hardware RAID controllers and filesystems do a poor
JJWW> job of detecting block-level corruption, kernel panicking on corrupt
JJWW> blocks seems to be what folks like me aren't expecting until it
JJWW> happens.

JJWW> [...] Though I probably would have preferred to learn from a warning
JJWW> message instead of a panic. :-)

But with other file systems you basically get the same - in many cases a kernel crash - but in a more unpredictable way.

Now, not that I'm fond of the current ZFS behavior - I would really like to be able to specify, as with UFS, whether the system has to panic or just lock the filesystem (or the pool). As Eric posted some time ago (I think it was Eric), it's on the list to address.

However, I still agree that striped pools should be displayed (zpool status) with a stripe keyword, like mirror or raidz groups - that would be less confusing for beginners.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
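(For context: UFS exposes that policy as the onerror mount option, and later ZFS builds grew a rough equivalent as the pool-level failmode property; a sketch with made-up device, mount point, and pool names:)

   # UFS: panic, lock, or umount the filesystem on internal inconsistency
   mount -F ufs -o onerror=lock /dev/dsk/c0t0d0s6 /export

   # ZFS (later builds): wait, continue, or panic on catastrophic pool I/O failure
   zpool set failmode=continue tank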
On Dec 20, 2006, at 00:37, Anton B. Rang wrote:
>> "INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data."
>
> OK, I'm puzzled.
>
> Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug?

I agree as well - did you file a bug on this yet? Inducing kernel panics (like we also do on certain Sun Cluster failure types) to prevent corruption can often lead to more corruption elsewhere, and usually ripples to throw admins, managers, and users into a panic as well - typically resulting in more corrupted opinions and perceptions of reliability and usability. :)

---
.je
Dennis Clarke wrote:
>> Anton B. Rang wrote:
>>>> "INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data."

Is this the official, long-term stance? I don't think it is. I think this is an interpretation of the current state.

>>> OK, I'm puzzled.
>>>
>>> Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug?
>>
>> Nope. I'm with you.
>
> no no .. it's a "feature". :-P
>
> If it walks like a duck and quacks like a duck then it's a duck.
>
> A kernel panic that brings down a system is a bug. Plain and simple.

I disagree (nit). A hardware fault can also cause a panic. Faults != bugs. I do agree in principle, though. Panics should be avoided whenever possible.

Incidentally, we do track the panic rate and collect panic strings. The last detailed analysis I saw on the data showed that the vast majority were hardware induced. This was a bit of a bummer because we were hoping that the tracking data would lead to identifying software bugs.
 -- richard
Jason J. W. Williams wrote:
> "INFORMATION: If a member of this striped zpool becomes unavailable or
> develops corruption, Solaris will kernel panic and reboot to protect
> your data."

This is a bug, not a feature. We are currently working on fixing it.

--matt
>> no no .. its a "feature". :-P
>>
>> If it walks like a duck and quacks like a duck then its a duck.
>>
>> a kernel panic that brings down a system is a bug. Plain and simple.
>
> I disagree (nit). A hardware fault can also cause a panic. Faults != bugs.

ha ha .. yeah. If the sysadmin walks over to a machine and pours coffee
into it, then I guess it will fault all over the place. No appreciation
for coffee, I guess.

However ... when it comes to storage, I expect that a disk failure or a
hot swap will not cause a fault if and only if there still remains some
other storage device that holds the bits in a redundant fashion.

So .. disks can fail. That should be okay. Even memory and processors
can fail, within reason.

> I do agree in principle, though. Panics should be avoided whenever
> possible.

Coffee spillage too ..

> Incidentally, we do track the panic rate and collect panic strings. The
> last detailed analysis I saw on the data showed that the vast majority were
> hardware induced. This was a bit of a bummer because we were hoping that
> the tracking data would lead to identifying software bugs.

But it does imply that the software is way better than the hardware, eh?

-- 
Dennis Clarke
Hi Toby,

My understanding on the subject of SATA firmware reliability vs. FC/SCSI
is that it's mostly related to SATA firmware being a lot younger. The
FC/SCSI firmware that's out there has been debugged for 10 years or so,
so it has a lot fewer hiccoughs.

Pillar Data Systems told us once that they found most of their SATA
"failed disks" were just fine when examined, so their policy is to issue
a RESET to the drive when a SATA error is detected, then retry the
write/read and keep trucking. Only if they continue to get SATA errors do
they fail the drive. Looking at the latest Engenio SATA products, I
believe they do the same thing.

It's probably unfair to expect defect rates out of SATA firmware
equivalent to firmware that's been around for a long time...particularly
with the price pressures on SATA. SAS may suffer the same issue, though
SAS drives seem to have 1,000,000-hour MTBF ratings like their
traditional FC/SCSI counterparts.

On a side note, we experienced a path failure to a drive in our SATA
Engenio array (older model); simply popping the drive out and back in
fixed the issue...haven't had any notifications since. A RESET and RETRY
would have been nicer behavior to have, since popping and reinserting
triggered a rebuild of the drive.

Best Regards,
Jason

On 12/19/06, Toby Thain <toby at smartgames.ca> wrote:
>
> On 19-Dec-06, at 2:42 PM, Jason J. W. Williams wrote:
>
> >> I do see this note in the 3511 documentation: "Note - Do not use a
> >> Sun StorEdge 3511 SATA array to store single instances of data. It
> >> is more suitable for use in configurations where the array has a
> >> backup or archival role."
>
> > My understanding of this particular scare-tactic wording (it's also in
> > the SANnet II OEM version manual almost verbatim) is that it has
> > mostly to do with the relative unreliability of SATA firmware versus
> > SCSI/FC firmware.
>
> That's such a sad sentence to have to read.
>
> Either prices are unrealistically low, or the revenues aren't being
> invested properly?
>
> --Toby
>
> > Its possible that the disks are lower quality SATA
> > disks too, but that was not what was relayed to us when we looked at
> > buying the 3511 from Sun or the DotHill version (SANnet II).
> >
> > Best Regards,
> > Jason
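Purely as an illustration of the reset-and-retry policy described above -
the logic lives in array firmware, not in a shell script, and the helper
functions here are stand-ins - the idea amounts to something like:

    #!/bin/sh
    # Sketch of "reset, retry, only fail the drive on persistent errors".
    # The three functions are illustrative stand-ins, not real array code.
    do_io()       { dd if=/dev/rdsk/"$1" of=/dev/null bs=512 count=1 2>/dev/null; }
    reset_drive() { echo "RESET issued to $1"; }
    fail_drive()  { echo "$1 marked failed"; }

    lun=${1:?usage: $0 device}
    attempts=0
    until do_io "$lun"; do              # retry the I/O until it succeeds
        reset_drive "$lun"              # first response to an error: RESET
        attempts=$((attempts + 1))
        if [ "$attempts" -ge 3 ]; then  # errors persist: give up on the drive
            fail_drive "$lun"
            break
        fi
    done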
Hi Robert,

I agree with others here that the kernel panic is undesired behavior.
If ZFS would simply offline the zpool and not kernel panic, that would
obviate my request for an informational message. It'd be pretty darn
obvious what was going on.

Best Regards,
Jason

On 12/20/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Hello Jason,
>
> Wednesday, December 20, 2006, 1:02:36 AM, you wrote:
>
> JJWW> Hi Robert
>
> JJWW> I didn't take any offense. :-) I completely agree with you that zpool
> JJWW> striping leverages standard RAID-0 knowledge in that if a device
> JJWW> disappears your RAID group goes poof. That doesn't really require a
> JJWW> notice...was just trying to be complete. :-)
>
> JJWW> The surprise to me was that detecting block corruption did the same
> JJWW> thing...since most hardware RAID controllers and filesystems do a poor
> JJWW> job of detecting block-level corruption, kernel panicking on corrupt
> JJWW> blocks seems to be what folks like me aren't expecting until it
> JJWW> happens.
>
> JJWW> Frankly, in about 5 years when ZFS and its concepts are common
> JJWW> knowledge, warning folks about corrupt blocks re-booting your server
> JJWW> would be like notifying them what rm and mv do. However, until then
> JJWW> warning them that corruption will cause a panic would definitely aid
> JJWW> folks who think they understand because they have existing RAID and
> JJWW> SAN knowledge, and then get bitten. Also, I think the zfsassist
> JJWW> program is a great idea for newbies. I'm not sure how often it would
> JJWW> be used by storage pros new to ZFS. Using the gal with the EMC DMX-3
> JJWW> again as an example (sorry! O:-) ), I'm sure she's pretty experienced
> JJWW> and had no problems using ZFS correctly...just was not expecting a
> JJWW> kernel panic on corruption and so was taken by surprise as to what
> JJWW> caused the kernel panic when it happened. A warning message when
> JJWW> creating a striped pool, would in my case have stuck in my brain so
> JJWW> that when the kernel panic happened, corruption of the zpool would
> JJWW> have been on my top 10 things to expect as a cause. Anyway, this is
> JJWW> probably an Emacs/VI argument to some degree. Now that I've
> JJWW> experienced a panic from zpool corruption it's on the forefront of my
> JJWW> mind when designing ZFS zpools, and the warning wouldn't do much for
> JJWW> me now. Though I probably would have preferred to learn from a warning
> JJWW> message instead of a panic. :-)
>
> But with other file systems you basically get the same - in many cases a
> kernel crash - but in a more unpredictable way. Not that I'm fond of the
> current ZFS behavior - I would really like to be able to specify, as with
> UFS, whether the system has to panic or can just lock the filesystem (or
> the pool).
>
> As Eric posted some time ago (I think it was Eric), it's on the list to
> address.
>
> However, I still agree that striped pools should be displayed (zpool
> status) with a "stripe" keyword, like mirrors or raidz groups - that
> would be less confusing for beginners.
>
> --
> Best regards,
>  Robert                          mailto:rmilkowski at task.gda.pl
>                                        http://milek.blogspot.com
>
Jason J. W. Williams wrote:
> I agree with others here that the kernel panic is undesired behavior.
> If ZFS would simply offline the zpool and not kernel panic, that would
> obviate my request for an informational message. It'd be pretty darn
> obvious what was going on.

What about the root/boot pool?


James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
              http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
James C. McPherson wrote:
> Jason J. W. Williams wrote:
>> I agree with others here that the kernel panic is undesired behavior.
>> If ZFS would simply offline the zpool and not kernel panic, that would
>> obviate my request for an informational message. It'd be pretty darn
>> obvious what was going on.
>
> What about the root/boot pool?

The default with UFS today is onerror=panic, so having ZFS do likewise
is no backwards step.

What other mechanisms do people suggest be implemented to guarantee the
integrity of your data on ZFS?


James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
              http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
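For reference, the UFS knob James is referring to looks roughly like
this; the device name and mount point are just examples, and the same
option can also be set persistently in /etc/vfstab. (ZFS had no
equivalent pool-level setting at the time of this thread; later releases
added one, the failmode pool property.)

    # UFS lets you choose what happens on an internal inconsistency:
    # panic (the default), lock the filesystem, or unmount it.
    mount -F ufs -o onerror=lock /dev/dsk/c0t0d0s6 /export/data

    # or persistently, as a line in /etc/vfstab:
    /dev/dsk/c0t0d0s6  /dev/rdsk/c0t0d0s6  /export/data  ufs  2  yes  onerror=lock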
Not sure. I don't see an advantage to moving off UFS for boot pools. :-)

-J

On 12/20/06, James C. McPherson <James.C.McPherson at gmail.com> wrote:
> Jason J. W. Williams wrote:
>> I agree with others here that the kernel panic is undesired behavior.
>> If ZFS would simply offline the zpool and not kernel panic, that would
>> obviate my request for an informational message. It'd be pretty darn
>> obvious what was going on.
>
> What about the root/boot pool?
>
>
> James C. McPherson
> --
> Solaris kernel software engineer, system admin and troubleshooter
> http://www.jmcp.homeunix.com/blog
> Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
>
Jason J. W. Williams wrote:
> Not sure. I don't see an advantage to moving off UFS for boot pools. :-)
>
> -J

Except of course that snapshots & clones will surely be a nicer
way of recovering from "adverse administrative events"...

-= Bart

-- 
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com       http://blogs.sun.com/barts
On 20-Dec-06, at 3:05 PM, Jason J. W. Williams wrote:
> Hi Toby,
>
> My understanding on the subject of SATA firmware reliability vs.
> FC/SCSI is that it's mostly related to SATA firmware being a lot
> younger. ... It's probably unfair to expect defect rates out of SATA
> firmware equivalent to firmware that's been around for a long
> time...particularly with the price pressures on SATA. SAS may suffer
> the same issue, though SAS drives seem to have 1,000,000-hour MTBF
> ratings like their traditional FC/SCSI counterparts.

Jason,

Thanks for the great answer.

--Toby
Bart Smaalders wrote:
> Jason J. W. Williams wrote:
>> Not sure. I don't see an advantage to moving off UFS for boot pools. :-)
>>
>> -J
>
> Except of course that snapshots & clones will surely be a nicer
> way of recovering from "adverse administrative events"...

...and they make Live Upgrade and patching so much nicer. lucopy is often
one of the most time-consuming parts of doing a Live Upgrade.

The other HUGE advantage of ZFS root is that you don't need to prepare in
advance for Live Upgrade, because file systems are cheap and easily added
in ZFS - unlike with UFS root, where you need at least one VTOC slice per
Live Upgrade boot environment you want to keep around.

-- 
Darren J Moffat
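For what it's worth, the workflow Bart and Darren are describing would
look roughly like this once a ZFS root exists. The dataset names below
are invented for illustration, and ZFS boot was not yet shipping when
this thread was written:

    # take a cheap snapshot of the boot environment before patching
    zfs snapshot rpool/ROOT/sol10@pre-patch

    # if the patch run goes badly, roll the boot environment back
    zfs rollback rpool/ROOT/sol10@pre-patch

    # or clone the snapshot into a brand-new boot environment,
    # with no spare VTOC slice required
    zfs clone rpool/ROOT/sol10@pre-patch rpool/ROOT/sol10-patched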
Anton Rang wrote:
> On Dec 19, 2006, at 7:14 AM, Mike Seda wrote:
>
>> Anton B. Rang wrote:
>>>> I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5.
>>>> This 2 TB logical drive is partitioned into 10 x 200GB slices. I gave 4
>>>> of these slices to a Solaris 10 U2 machine and added each of them
>>>> to a concat (non-raid) zpool as listed below:
>>>
>>> This is certainly a supportable configuration. However, it's not an
>>> optimal one.
>>>
>> What would be the optimal configuration that you recommend?
>
> If you don't need ZFS redundancy, I would recommend taking a single
> "slice" for your ZFS file system (e.g. 6 x 200 GB for other file
> systems, and 1 x 800 GB for the ZFS pool). There would still be
> contention between the various file systems, but at least ZFS would be
> working with a single contiguous block of space on the array.
>
> Because of the implicit striping in ZFS, what you have right now is
> analogous to taking a single disk, partitioning it into several
> partitions, then striping across those partitions -- it works, you can
> use all of the space, but there's a rearrangement which means that
> logically contiguous blocks on disk are no longer physically
> contiguous, hurting performance substantially.

Hmm... But how is my current configuration (one striped zpool consisting
of 4 x 200 GB LUNs from a hardware RAID 5 logical drive) "analogous to
taking a single disk, partitioning it into several partitions, then
striping across those partitions" if each 200 GB LUN is presented to
Solaris as a whole disk:

Current partition table (original):
Total disk sectors available: 390479838 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34    186.20GB          390479838
  1 unassigned    wm                 0           0                  0
  2 unassigned    wm                 0           0                  0
  3 unassigned    wm                 0           0                  0
  4 unassigned    wm                 0           0                  0
  5 unassigned    wm                 0           0                  0
  6 unassigned    wm                 0           0                  0
  8   reserved    wm         390479839      8.00MB          390496222

Why is my current configuration not analogous to taking 4 disks and
striping across those 4 disks?

>> Yes, I am worried about the lack of redundancy. And, I have some new
>> disks on order, at least one of which will be a hot spare.
>
> Glad to hear it.
>
> Anton
> Hmm... But how is my current configuration (one striped zpool consisting
> of 4 x 200 GB LUNs from a hardware RAID 5 logical drive) "analogous to
> taking a single disk, partitioning it into several partitions, then
> striping across those partitions" if each 200 GB LUN is presented to
> Solaris as a whole disk:

The partitioning is happening within the array, when you create the LUNs.
When you have a single RAID 5 logical drive and you create multiple LUNs,
you're effectively partitioning each of the underlying disks.

Anton

This message posted from opensolaris.org
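To make Anton's point concrete, the difference comes down to something
like the following. The device names are invented stand-ins for the
3511's LUNs; the real controller and target numbers will differ:

    # Current layout: four 200 GB LUNs that are really slices of the same
    # five-disk RAID-5 set, striped together again by ZFS
    zpool create tank c4t0d0 c4t1d0 c4t2d0 c4t3d0

    # Anton's suggestion: present one larger LUN from the array instead,
    # so ZFS works with a single contiguous region of that RAID-5 set
    zpool create tank c4t4d0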